feat(ltx2): LTX-2.3 video generation (M1+M2+M3) — conversion, C API, parity, Bare JS addon by 64johnlee · Pull Request #7 · tetherto/qvac-ext-stable-diffusion.cpp

64johnlee · 2026-05-30T09:02:25Z

LTX-2.3 video generation for qvac-ext-stable-diffusion.cpp — M1 + M2 + M3

Implements the Tether LTX-2 bounty: LTX-2 T2V + I2V via ggml across CPU / Vulkan / Metal, plus a Bare runtime addon for JavaScript video generation in QVAC.

📖 The reviewable diff is 14 files / ~1,463 lines

The diff against this fork's master looks enormous only because tetherto:master is ~100 commits behind upstream leejet/stable-diffusion.cpp, so the PR necessarily carries that upstream sync. The actual contribution, viewed against current upstream:

➡️ leejet/stable-diffusion.cpp@master...64johnlee:qvac-ext-stable-diffusion.cpp:ltx2-video-generation

(A fork-sync to upstream on your side would collapse this PR down to exactly those files.)

M1 — conversion + API + CPU correctness

script/convert_ltx2.py — safetensors → GGUF for the LTX-2.3 stack (14B DiT, Gemma-3 text encoder, spatiotemporal Video-VAE); f16/q4_0/q5_1/q8_0.
include/ltx2.h — clean C façade (ltx2_generate_t2v, ltx2_generate_i2v) over stable-diffusion.h.
src/ggml_extend.hpp — CPU VAE im2col crash fix.

M2 — cross-backend + parity

script/ltx2_parity.py — two-tier parity harness: strict same-backend reproducibility + loose cross-backend PSNR vs a CPU golden reference.
test_m2.sh — portable T2V/I2V smoke test.
.github/workflows/ltx2-ci.yml — Linux Vulkan build job + parity-script validation.

M3 — Bare JS addon (`bare/`)

binding.c — js.h/bare.h native addon exposing createContext / generateT2V / generateI2V; sd_ctx_t wrapped as a finalized external; frames returned as a contiguous RGB ArrayBuffer; registered via BARE_MODULE.
index.js / binding.js — ergonomic API + require.addon() loader.
CMakeLists.txt / package.json — cmake-bare build ("addon": true), links the stable-diffusion library.
README.md / test.js.
✅ Verified building on linux-x64 (clang 18 + lld, CPU): compiles, links libstable-diffusion.a, installs to prebuilds/linux-x64/, and require() exposes the API; smoke test passes.

🤖 Generated with Claude Code

…sage (leejet#1349)

* feat: add support for the eta parameter to ancestral samplers
* feat: Euler Ancestral sampler implementation for flow models
* refine flow ancestral sampling and normalize eta defaults

---------

Co-authored-by: leejet <leejet714@gmail.com>

Co-authored-by: leejet <leejet714@gmail.com>


…ol crash

- script/convert_ltx2.py: safetensors → GGUF at Q4_0/Q5_1/Q8_0/F16 with selective F16 preservation for norms, biases, and embeddings
- include/ltx2.h: focused public C API for LTX-2 T2V and I2V inference, wrapping stable-diffusion.h with ltx2_new_ctx / ltx2_generate_t2v / ltx2_generate_i2v helpers
- fix(ggml_ext_conv_3d): fall back to explicit im2col+mul_mat when weight type is not F16/F32, fixing assertion crash in ggml_compute_forward_im2col_f16 on CPU with quantized VAE weights (upstream issue leejet#1577)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
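The "selective F16 preservation" rule can be pictured as a per-tensor type choice driven by name patterns. The sketch below is illustrative only: the name `KEEP_F16_PATTERNS` appears in a later commit message in this PR, but the pattern contents and the `choose_type` helper are assumptions, not the script's actual code.

```python
import re

# Hypothetical patterns: the real KEEP_F16_PATTERNS in convert_ltx2.py may differ.
KEEP_F16_PATTERNS = [r"norm", r"\.bias$", r"embed"]


def choose_type(name: str, requested: str) -> str:
    """Pick the storage type for one tensor (hypothetical helper).

    Tensors whose names match a keep-pattern stay in F16; everything
    else is stored in the requested quantized type.
    """
    if any(re.search(pattern, name) for pattern in KEEP_F16_PATTERNS):
        return "f16"
    return requested


for tensor_name in ("blocks.0.attn.to_q.weight", "blocks.0.norm1.weight"):
    print(tensor_name, "->", choose_type(tensor_name, "q4_0"))
```

Keeping small, precision-sensitive tensors in F16 costs little file size, since the large projection matrices dominate the parameter count.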

The previous element-wise conversion looped over every value in pure Python, which is too slow for 14B-parameter tensors. Replace it with a numpy byte-copy: write the two BF16 bytes into byte positions [2] and [3] of each little-endian uint32 word (BF16 is the upper 16 bits of a float32, so the low 16 bits stay zero), then reinterpret as float32.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
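A minimal sketch of the byte-copy idea described in this commit message, assuming little-endian BF16 input. The function name and signature here are illustrative and are not taken from convert_ltx2.py.

```python
import numpy as np


def bf16_bytes_to_fp32(raw: bytes, shape: tuple) -> np.ndarray:
    """Widen little-endian BF16 bytes to float32 without a Python loop."""
    # One row per element: [low byte, high byte] of the BF16 value.
    src = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 2)
    # Four bytes per float32; bytes 0 and 1 (the low mantissa bits) stay zero.
    out = np.zeros((src.shape[0], 4), dtype=np.uint8)
    out[:, 2:4] = src
    # Reinterpret each 4-byte row as one little-endian float32.
    return out.view("<f4").reshape(shape)


# 0x3F80 is 1.0 and 0xC020 is -2.5 in BF16 (stored low byte first).
values = bf16_bytes_to_fp32(bytes([0x80, 0x3F, 0x20, 0xC0]), (2,))
print(values)
```

Working on a flat byte view and reshaping only at the end sidesteps the multi-dimensional slicing pitfall, because the interleaving never sees the tensor's original shape.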

Three jobs on every push to ltx2-video-generation and on PRs to master:

- build-linux: cmake + Ninja on ubuntu-22.04, asserts vid_gen / embeddings-connectors / diffusion-fa flags present in sd-cli --help
- convert-script: syntax check + --help + two synthetic GGUF round-trips (F32→Q8_0 and BF16→F16 via KEEP_F16_PATTERNS)
- build-macos-arm64: cmake + Metal on macos-14 (ARM64), uploads sd-cli artifact for 7 days

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…f16_to_fp32

safe_open(framework="numpy") doesn't support BF16 tensors because numpy has no bfloat16 dtype. Replace with a hand-rolled parser (_iter_safetensors) that reads the safetensors binary format directly (8-byte LE header size + JSON metadata + raw tensor bytes), eliminating the torch/safetensors dep.

Also fix bf16_to_fp32: calling .view(uint8) on a multi-dimensional array gives a multi-dim byte array whose [0::2] slice has the wrong shape. Flatten to 1D first with .ravel() so the byte interleaving works correctly.

CI: drop safetensors from pip install since it is no longer imported. Both round-trips (F32→Q8_0 and BF16→F16) verified locally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
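For readers unfamiliar with the layout, the safetensors container named above (8-byte little-endian header length, JSON header, raw tensor bytes) can be walked with the standard library alone. This is a sketch of the format, not the PR's `_iter_safetensors`; the function names below are made up for illustration.

```python
import json
import struct


def read_header(blob: bytes):
    """Return (tensor table, offset where the tensor data begins)."""
    (header_len,) = struct.unpack("<Q", blob[:8])  # u64, little-endian
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form string metadata
    return header, 8 + header_len


def iter_tensors(blob: bytes):
    """Yield (name, dtype, shape, raw_bytes) for every tensor in the blob."""
    header, data_start = read_header(blob)
    for name, info in header.items():
        begin, end = info["data_offsets"]  # relative to the data section
        yield name, info["dtype"], tuple(info["shape"]), blob[data_start + begin:data_start + end]


# Build a tiny in-memory file with one 2-element BF16 tensor and read it back.
table = {"w": {"dtype": "BF16", "shape": [2], "data_offsets": [0, 4]}}
encoded = json.dumps(table).encode("utf-8")
demo = struct.pack("<Q", len(encoded)) + encoded + b"\x80\x3f\x00\x40"
print(list(iter_tensors(demo)))
```

Because the dtype comes back as a plain string such as "BF16", the caller decides how to decode the bytes, which is what lets a converter handle types numpy lacks.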

GitHub is forcing Node 24 as default on June 16; set FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 at workflow level to adopt it now. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sd-cli appends .avi to the -o path unconditionally; update the results ls check to match the actual filenames produced. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds **/*.sh plus explicit test_m2.sh, test_*.sh, and .github/test_*.sh to the on.push paths filter so test scripts (like the recently-added test_m2.sh that didn't trigger CI on commit 259b7ad) participate in the CI gating cycle. The wildcard alone would suffice; the explicit entries are kept as documentation of which scripts we specifically care about.

Previously, mismatched data_offsets silently produced wrong tensor data. The parser now raises ValueError with the tensor name, expected bytes, shape, dtype, and actual bytes for fast diagnosis.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
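The check described here amounts to comparing the byte span given by data_offsets with the size implied by shape and dtype. A hedged sketch follows; the helper name, the dtype table, and the message wording are assumptions rather than the script's actual code.

```python
import math

# Bytes per element for common safetensors dtype strings.
DTYPE_SIZES = {"F64": 8, "F32": 4, "F16": 2, "BF16": 2,
               "I64": 8, "I32": 4, "I16": 2, "I8": 1, "U8": 1, "BOOL": 1}


def check_tensor_bytes(name, dtype, shape, begin, end):
    """Raise ValueError when data_offsets disagree with shape * dtype size."""
    expected = math.prod(shape) * DTYPE_SIZES[dtype]  # prod(()) == 1 for scalars
    actual = end - begin
    if actual != expected:
        raise ValueError(
            f"tensor {name!r}: expected {expected} bytes "
            f"(shape={tuple(shape)}, dtype={dtype}), got {actual}"
        )
    return expected


print(check_tensor_bytes("w", "BF16", [2, 3], 0, 12))
```

Failing at parse time is much cheaper than discovering corrupted weights after a full conversion and inference run.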

- ltx2_ctx_params_set_defaults: remove schedule/sample_method/cfg_scale, which do not exist on sd_ctx_params_t (they live on sd_sample_params_t)
- Add ltx2_vid_params_set_defaults() to set LTX-2 sample defaults on sd_vid_gen_params_t.sample_params, where they actually belong
- Call ltx2_vid_params_set_defaults() in both generate_t2v and generate_i2v
- Fix typo: embeddings_connector_path -> embeddings_connectors_path

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Adds the M2 deliverables for cross-backend LTX-2 video generation:

- script/ltx2_parity.py: two-tier parity harness. Strict same-backend reproducibility (re-run same seed -> frames must match) plus loose cross-backend similarity (CPU golden vs Vulkan/Metal via per-frame PSNR), since exact pixel parity is not achievable for multi-step diffusion across FP16/kernel-order-differing ggml backends. Drives sd-cli, extracts frames with ffmpeg, supports --update-ref / --self-check.
- test_m2.sh: made portable (LTX2_BIN/LTX2_BUILD/LTX2_MODELS/LTX2_OUT/LTX2_INIT_IMAGE env vars, with file guards) instead of hardcoded paths.
- ltx2-ci.yml: add a Linux Vulkan build job (GGML_VULKAN=ON, compile-only as CI has no GPU) and a parity-script validation job (syntax/help/guard).

Full T2V/I2V parity runs on developer GPU/Metal hardware via ltx2_parity.py.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Adds bare/, a Holepunch Bare native addon exposing LTX-2 to JavaScript in the QVAC ecosystem, wrapping the header-only ltx2.h C API:

- binding.c: js.h/bare.h addon; createContext / generateT2V / generateI2V, sd_ctx_t wrapped as a finalized external, frames returned as a contiguous RGB ArrayBuffer. Registered via BARE_MODULE.
- index.js: ergonomic JS API (option objects, LTX2Context with destroy()).
- binding.js: require.addon() loader.
- CMakeLists.txt: cmake-bare build; links the repo's stable-diffusion library.
- package.json: "addon": true, bare-make generate/build scripts.
- README.md: build + usage + first-compile verify-points.
- test.js: smoke test (skips without $LTX2_MODELS).

Follows the holepunchto/bare-zlib addon conventions. Built with bare-make on developer hardware (no bare.h/js.h in CI); structural + JS validation pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
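A consumer of the "contiguous RGB" frame buffer needs to know how to slice it back into frames. The sketch below is written in Python to match the repository's other scripts and assumes 8-bit channels stored frame by frame, row by row, with interleaved RGB. That layout is an assumption for illustration, since the PR text does not spell it out, and `split_frames` is a made-up name.

```python
import numpy as np


def split_frames(buf: bytes, width: int, height: int, channels: int = 3) -> np.ndarray:
    """View a packed frame buffer as an array of shape (frames, height, width, channels)."""
    frame_bytes = width * height * channels
    if frame_bytes <= 0 or len(buf) % frame_bytes != 0:
        raise ValueError(
            f"buffer of {len(buf)} bytes is not a whole number of "
            f"{width}x{height}x{channels} frames")
    # frombuffer does not copy; the result is a read-only view over buf.
    return np.frombuffer(buf, dtype=np.uint8).reshape(-1, height, width, channels)


# Two 2x2 RGB frames packed back to back (24 bytes in total).
frames = split_frames(bytes(range(24)), width=2, height=2)
print(frames.shape, frames[1, 0, 0].tolist())
```

Returning one contiguous buffer instead of per-frame objects keeps the native-to-JS boundary to a single allocation, at the cost of the caller needing the frame dimensions to index into it.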

Verified end-to-end (clang 18 + lld, CPU): compiles, links libstable-diffusion.a, installs to prebuilds/linux-x64/, and require() exposes the API.

- CMakeLists.txt: enable CXX (project ... C CXX); this pulls in the C++ sd targets.
- binding.c: add <stdbool.h>; ltx2_new_ctx takes 8 args, so pass vae_decode_only=false (encoder needed for I2V).
- test.js: drop Node-only APIs unavailable in Bare: require('path') -> template strings, process.env -> bare-env, process.exit -> Bare.exit; require('./index.js').
- package.json: declare bare-env devDependency.
- README: document the bare-make install step + clang/lld prereqs + verified note.
- .gitignore: build/, node_modules/, prebuilds/, package-lock.json.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

64johnlee · 2026-06-04T15:09:58Z

👋 Status update — LTX-2 bounty, all three milestones are in this PR:

M1 — script/convert_ltx2.py (safetensors → GGUF), include/ltx2.h C façade (ltx2_generate_t2v/_i2v), and a CPU Video-VAE im2col fix.
M2 — cross-backend parity harness (script/ltx2_parity.py: strict same-backend reproducibility + loose cross-backend PSNR), portable test_m2.sh, and a Vulkan build + parity job in CI.
M3 — a Bare native addon (bare/) exposing createContext / generateT2V / generateI2V to JS. Verified building on linux-x64 (clang 18 + lld, CPU): it compiles, links libstable-diffusion.a, installs to prebuilds/, and require() loads the API; smoke test passes.

The diff looks huge only because this fork's master trails upstream leejet/stable-diffusion.cpp by ~100 commits, so the PR carries that sync. The real contribution is 14 files / ~1.4k lines against current upstream:
leejet/stable-diffusion.cpp@master...64johnlee:qvac-ext-stable-diffusion.cpp:ltx2-video-generation

Happy to rebase onto a synced master or split the upstream-sync from the feature commits — whatever's easiest to review. 🙏

rmatif and others added 30 commits March 10, 2026 00:35

feat: add spectrum caching method (leejet#1322)

dea4980

refactor: remove ununsed encode_video (leejet#1332)

d6dd6d7

docs: add Anima2 gguf download link to anima.md (leejet#1335)

6fa7ca9

feat: add generic DiT support to spectrum cache (leejet#1336)

adfef62

chore: remove SD_FAST_SOFTMAX build flag (leejet#1338)

f6968bc

refactor: move all cache parameter defaults to the library (leejet#1327)

630ee03

ci: add CUDA Dockerfile (leejet#1314)

83eabd7

refactor: optimize the VAE architecture (leejet#1345)

acc3bf1

ci: avoid cuda docker build timeout by using -j16

61d8331

feat: add embedded WebUI (leejet#1207)

862a658

fix: correct encoder channels for flux2 (leejet#1346)

997bb11

style: remove redundant struct qualifiers for consistent C/C++ type u…

84cbd88

…sage (leejet#1349)

perf(z-image): switch to fused SwiGLU kernel (leejet#1302)

5265a5e

refactor: simplify sample cache flow (leejet#1350)

545fac4

docs: update Spectrum info about DiT models (leejet#1360)

6293ab5

refactor: simplify f8_e5m2_to_f16 function a little bit (leejet#1358)

ed88e21

refactor: migrate generation pipeline to sd::Tensor (leejet#1373)

f16a110

sync: update ggml

8f2967c

refactor: move VAE tiling parameters to SDGenerationParams (leejet#1261)

02dd5e5

fix: disable extra T5 mask padding for Wan (leejet#1375)

8d87887

refactor(server): split server endpoint registration (leejet#1376)

83e8f6f

refactor: split and simplify sample_k_diffusion samplers (leejet#1377)

1d6cb0f

chore(server): link winsock2 for non-MSVC windows (leejet#1378)

4d52320

feat(server): add generation metadata to png images (leejet#1217)

4fe7a35

feat: show tensor loading progress in MB/s or GB/s (leejet#1380)

bf02167

fix: use resolved image size in embedded metadata (leejet#1382)

6dfe945

feat(cli): add metadata inspection mode (leejet#1381)

09b12d5

feat: add webp support (leejet#1384)

87ecb95

chore: make libwebp optional and support system libwebp (leejet#1387)

687a81f

Co-authored-by: leejet <leejet714@gmail.com>

leejet and others added 24 commits May 30, 2026 18:38

fix: split tokens before normalization (leejet#1582)

d3b2cb0

fix: correct Gemma3 rope settings and vram limit propagation (leejet#…

d2797b8

…1583)

feat: add PiD support (leejet#1585)

0982807

fix: remove kv padding from flash attention wrapper (leejet#1453)

20901f6

feat: add support for APG (adaptive projected guidance) + uncondition…

be65ac7

…nal SLG (leejet#593)

feat: support img-cfg for edit models (leejet#929)

f8935d6

Co-authored-by: leejet <leejet714@gmail.com>

refactor: call CPU backend functions dynamically (leejet#1591)

02f0637

Co-authored-by: leejet <leejet714@gmail.com>

fix(cmake): build HIP backend with PIC so the static-lib PIE link suc…

7948df8

…ceeds (leejet#1593)

feat: --stream-layers for streaming weights from CPU during generat…

ed74577

…ion (leejet#1576)

chore: embed server web UI in Docker images (leejet#1597)

9c7f9a2

feat: make Wan2.2 5B FLF2V work (leejet#1110)

2d40a8b

refactor: img-cond->img_uncond (leejet#1594)

4513e3f

* refactor: img-cond->img_uncond
* align APG and CFG++ with img-uncond CFG
* set default img_cfg to 1.f

---------

Co-authored-by: leejet <leejet714@gmail.com>

perf: keep chunk-K residency engaged with runtime LoRA (leejet#1598)

a7f2e03

fix: zero Wan2.2 TI2V timesteps for fixed frames (leejet#1604)

1f9ee88

ci: opt into Node.js 24 for actions to silence deprecation warning

99ab65f


test: fix M2 script to expect .avi suffix from sd-cli output

1ee82af


64johnlee force-pushed the ltx2-video-generation branch from d410ed3 to 97d9a6c Compare June 4, 2026 04:42

64johnlee changed the title ~~feat(ltx2): LTX-2.3 video generation — conversion script, ltx2.h C API, CPU VAE fix~~ feat(ltx2): LTX-2.3 video generation (M1 + M2) — conversion, C API, CPU VAE fix, cross-backend parity Jun 4, 2026

64johnlee and others added 2 commits June 4, 2026 12:52

64johnlee changed the title ~~feat(ltx2): LTX-2.3 video generation (M1 + M2) — conversion, C API, CPU VAE fix, cross-backend parity~~ feat(ltx2): LTX-2.3 video generation (M1+M2+M3) — conversion, C API, parity, Bare JS addon Jun 4, 2026

64johnlee wants to merge 161 commits into tetherto:master from 64johnlee:ltx2-video-generation

20 participants
