feat(ltx2): LTX-2.3 video generation (M1+M2+M3) — conversion, C API, parity, Bare JS addon #7

Open
64johnlee wants to merge 161 commits into tetherto:master from 64johnlee:ltx2-video-generation

64johnlee commented May 30, 2026

LTX-2.3 video generation for qvac-ext-stable-diffusion.cpp — M1 + M2 + M3

Implements the Tether LTX-2 bounty: LTX-2 T2V + I2V via ggml across CPU / Vulkan / Metal, plus a Bare runtime addon for JavaScript video generation in QVAC.

📖 The reviewable diff is 14 files / ~1,463 lines

The diff against this fork's master looks enormous only because tetherto:master is ~100 commits behind upstream leejet/stable-diffusion.cpp, so the PR necessarily carries that upstream sync. The actual contribution, viewed against current upstream:

➡️ leejet/stable-diffusion.cpp@master...64johnlee:qvac-ext-stable-diffusion.cpp:ltx2-video-generation

(A fork-sync to upstream on your side would collapse this PR down to exactly those files.)

M1 — conversion + API + CPU correctness

  • script/convert_ltx2.py — safetensors → GGUF for the LTX-2.3 stack (14B DiT, Gemma-3 text encoder, spatiotemporal Video-VAE); f16/q4_0/q5_1/q8_0.
  • include/ltx2.h — clean C façade (ltx2_generate_t2v, ltx2_generate_i2v) over stable-diffusion.h.
  • src/ggml_extend.hpp — CPU VAE im2col crash fix.

M2 — cross-backend + parity

  • script/ltx2_parity.py — two-tier parity harness: strict same-backend reproducibility + loose cross-backend PSNR vs a CPU golden reference.
  • test_m2.sh — portable T2V/I2V smoke test.
  • .github/workflows/ltx2-ci.yml — Linux Vulkan build job + parity-script validation.

M3 — Bare JS addon (bare/)

  • binding.c — js.h/bare.h native addon exposing createContext / generateT2V / generateI2V; sd_ctx_t wrapped as a finalized external; frames returned as a contiguous RGB ArrayBuffer; registered via BARE_MODULE.
  • index.js / binding.js — ergonomic API + require.addon() loader.
  • CMakeLists.txt / package.json — cmake-bare build ("addon": true), links the stable-diffusion library.
  • README.md / test.js.
  • Verified building on linux-x64 (clang 18 + lld, CPU): compiles, links libstable-diffusion.a, installs to prebuilds/linux-x64/, and require() exposes the API; smoke test passes.

🤖 Generated with Claude Code

rmatif and others added 30 commits March 10, 2026 00:35
* feat: add support for the eta parameter to ancestral samplers

* feat: Euler Ancestral sampler implementation for flow models

* refine flow ancestral sampling and normalize eta defaults

---------

Co-authored-by: leejet <leejet714@gmail.com>
leejet and others added 24 commits May 30, 2026 18:38
Co-authored-by: leejet <leejet714@gmail.com>
Co-authored-by: leejet <leejet714@gmail.com>
* refactor: img-cond->img_uncond

* align APG and CFG++ with img-uncond CFG

* set default img_cfg to 1.f

---------

Co-authored-by: leejet <leejet714@gmail.com>
…ol crash

- script/convert_ltx2.py: safetensors → GGUF at Q4_0/Q5_1/Q8_0/F16 with
  selective F16 preservation for norms, biases, and embeddings
- include/ltx2.h: focused public C API for LTX-2 T2V and I2V inference,
  wrapping stable-diffusion.h with ltx2_new_ctx / ltx2_generate_t2v /
  ltx2_generate_i2v helpers
- fix(ggml_ext_conv_3d): fall back to explicit im2col+mul_mat when weight
  type is not F16/F32, fixing assertion crash in ggml_compute_forward_im2col_f16
  on CPU with quantized VAE weights (upstream issue leejet#1577)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
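
For readers unfamiliar with the "explicit im2col+mul_mat" fallback named in the commit above, here is a minimal numpy sketch of the idea: unfold the input into columns, then compute the convolution as one matrix multiply. It is shown in 1-D for brevity, whereas the actual fix concerns ggml_ext_conv_3d; the function names `im2col_1d` and `conv1d_im2col` are hypothetical and are not part of the PR or of ggml.

```python
import numpy as np

def im2col_1d(x: np.ndarray, k: int) -> np.ndarray:
    """Unfold x of shape (C_in, L) into columns of shape (C_in * k, L - k + 1)."""
    c_in, length = x.shape
    out_len = length - k + 1
    cols = np.empty((c_in * k, out_len), dtype=x.dtype)
    for i in range(out_len):
        # Each column holds one receptive field, channel-major then tap.
        cols[:, i] = x[:, i:i + k].ravel()
    return cols

def conv1d_im2col(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Valid cross-correlation of x (C_in, L) with w (C_out, C_in, k) via a matmul."""
    c_out, c_in, k = w.shape
    # Flattening w the same way as the columns turns the convolution into a matmul.
    return w.reshape(c_out, c_in * k) @ im2col_1d(x, k)
```

Once the work is reduced to a plain matrix multiply, the weights can use whatever types the matmul kernel supports, which is presumably why the fallback avoids the F16-only im2col assertion.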
The previous element-wise conversion ran as a pure-Python loop, paying
interpreter overhead per element, which is too slow for 14B-parameter tensors.
Replace it with a vectorized numpy byte-copy: write the two BF16 bytes into
positions [2] and [3] of each little-endian uint32 word (BF16 is float32 with
the low 16 bits zeroed), then reinterpret as float32.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
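
The byte-copy described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the code in script/convert_ltx2.py: the input is assumed to be BF16 bit patterns held in a uint16 array, and the little-endian dtypes are spelled out explicitly so the byte positions match the description.

```python
import numpy as np

def bf16_to_fp32(bits: np.ndarray) -> np.ndarray:
    """Widen BF16 bit patterns (given as uint16) to float32, keeping the shape."""
    shape = bits.shape
    # Flatten first: a byte view of a multi-dimensional array would make the
    # strided slices below operate on the wrong axis.
    src = np.ascontiguousarray(bits, dtype="<u2").ravel().view(np.uint8)
    out = np.zeros(src.size // 2, dtype="<u4")   # low 16 bits stay zero
    dst = out.view(np.uint8)                     # 4 bytes per output word
    dst[2::4] = src[0::2]                        # BF16 low byte  -> word byte 2
    dst[3::4] = src[1::2]                        # BF16 high byte -> word byte 3
    return out.view("<f4").reshape(shape)
```

An equivalent formulation is `(bits.astype(np.uint32) << 16).view(np.float32)`, which avoids the explicit byte indexing at the cost of an intermediate array.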
Three jobs on every push to ltx2-video-generation and on PRs to master:
- build-linux: cmake + Ninja on ubuntu-22.04, asserts vid_gen /
  embeddings-connectors / diffusion-fa flags present in sd-cli --help
- convert-script: syntax check + --help + two synthetic GGUF round-trips
  (F32→Q8_0 and BF16→F16 via KEEP_F16_PATTERNS)
- build-macos-arm64: cmake + Metal on macos-14 (ARM64), uploads sd-cli
  artifact for 7 days

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…f16_to_fp32

safe_open(framework="numpy") doesn't support BF16 tensors because numpy
has no bfloat16 dtype. Replace with a hand-rolled parser (_iter_safetensors)
that reads the safetensors binary format directly (8-byte LE header size +
JSON metadata + raw tensor bytes), eliminating the torch/safetensors dep.

Also fix bf16_to_fp32: calling .view(uint8) on a multi-dimensional array
gives a multi-dim byte array whose [0::2] slice has the wrong shape. Flatten
to 1D first with .ravel() so the byte interleaving works correctly.

CI: drop safetensors from pip install since it is no longer imported.
Both round-trips (F32→Q8_0 and BF16→F16) verified locally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
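
The layout described in the commit above (8-byte little-endian header size, JSON header, raw tensor bytes) is simple enough to read with the standard library alone. The sketch below works on an in-memory blob for clarity; `iter_safetensors` is a hypothetical name and is not the PR's `_iter_safetensors`, whose exact signature is not shown here.

```python
import json
import struct

def iter_safetensors(blob: bytes):
    """Yield (name, dtype, shape, raw_bytes) for each tensor in a safetensors blob."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)   # 8-byte little-endian size
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    data_start = 8 + header_len                         # offsets are relative to here
    for name, info in header.items():
        if name == "__metadata__":                      # optional free-form metadata
            continue
        begin, end = info["data_offsets"]
        raw = blob[data_start + begin:data_start + end]
        yield name, info["dtype"], tuple(info["shape"]), raw
```

Because the tensor bytes are returned untouched, BF16 data can be passed straight to a widening routine without numpy needing a bfloat16 dtype.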
GitHub is forcing Node 24 as default on June 16; set
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 at workflow level to adopt it now.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sd-cli appends .avi to the -o path unconditionally; update the
results ls check to match the actual filenames produced.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds **/*.sh plus explicit test_m2.sh, test_*.sh, and .github/test_*.sh
to the on.push paths filter so test scripts (like the recently-added
test_m2.sh that didn't trigger CI on commit 259b7ad) participate in
the CI gating cycle. The wildcard alone would suffice; the explicit
entries are kept as documentation of which scripts we specifically
care about.
Mismatched data_offsets previously yielded wrong tensor data without any
error. The parser now raises ValueError with the tensor name, expected bytes,
shape, dtype, and actual bytes for fast diagnosis.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
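
A check of the kind described in the commit above might look like the sketch below. The helper name `check_extent` and the `DTYPE_SIZES` table are hypothetical, and the table covers only a subset of safetensors dtypes; the PR's actual error message format is not reproduced here.

```python
from math import prod

# Bytes per element for a few safetensors dtype tags (illustrative subset).
DTYPE_SIZES = {"F32": 4, "F16": 2, "BF16": 2}

def check_extent(name, dtype, shape, begin, end):
    """Raise ValueError if [begin, end) does not match shape * element size."""
    expected = prod(shape) * DTYPE_SIZES[dtype]   # prod(()) == 1 for scalars
    actual = end - begin
    if actual != expected:
        raise ValueError(
            f"{name}: expected {expected} bytes for shape {tuple(shape)} "
            f"dtype {dtype}, got {actual}"
        )
    return expected
```

Failing at parse time keeps a truncated or mislabelled tensor from being quantized and written into the GGUF file unnoticed.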
- ltx2_ctx_params_set_defaults: remove schedule/sample_method/cfg_scale
  which do not exist on sd_ctx_params_t (they live on sd_sample_params_t)
- Add ltx2_vid_params_set_defaults() to set LTX-2 sample defaults on
  sd_vid_gen_params_t.sample_params where they actually belong
- Call ltx2_vid_params_set_defaults() in both generate_t2v and generate_i2v
- Fix typo: embeddings_connector_path -> embeddings_connectors_path

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds the M2 deliverables for cross-backend LTX-2 video generation:

- script/ltx2_parity.py: two-tier parity harness. Strict same-backend
  reproducibility (re-run same seed -> frames must match) plus loose
  cross-backend similarity (CPU golden vs Vulkan/Metal via per-frame PSNR),
  since exact pixel parity is not achievable for multi-step diffusion across
  FP16/kernel-order-differing ggml backends. Drives sd-cli, extracts frames
  with ffmpeg, supports --update-ref / --self-check.
- test_m2.sh: made portable (LTX2_BIN/LTX2_BUILD/LTX2_MODELS/LTX2_OUT/
  LTX2_INIT_IMAGE env vars, with file guards) instead of hardcoded paths.
- ltx2-ci.yml: add a Linux Vulkan build job (GGML_VULKAN=ON, compile-only as
  CI has no GPU) and a parity-script validation job (syntax/help/guard).

Full T2V/I2V parity runs on developer GPU/Metal hardware via ltx2_parity.py.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
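
The loose cross-backend tier rests on per-frame PSNR, which can be sketched as below for 8-bit RGB frames. This is not the code in script/ltx2_parity.py: the function names are hypothetical, and the 30 dB default is an arbitrary illustrative threshold, not a value taken from the PR.

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped frames."""
    ref = np.asarray(ref, dtype=np.float64)     # avoid uint8 wrap-around
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - test) ** 2)
    if mse == 0.0:
        return float("inf")                     # identical frames
    return float(10.0 * np.log10(peak ** 2 / mse))

def frames_within_tolerance(ref_frames, test_frames, min_db=30.0):
    """Return (per-frame scores, True if every frame meets min_db)."""
    scores = [psnr(r, t) for r, t in zip(ref_frames, test_frames)]
    return scores, all(s >= min_db for s in scores)
```

The strict same-backend tier needs no metric at all: frames from two runs with the same seed are simply compared for equality.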
64johnlee force-pushed the ltx2-video-generation branch from d410ed3 to 97d9a6c June 4, 2026 04:42
64johnlee changed the title from "feat(ltx2): LTX-2.3 video generation — conversion script, ltx2.h C API, CPU VAE fix" to "feat(ltx2): LTX-2.3 video generation (M1 + M2) — conversion, C API, CPU VAE fix, cross-backend parity" Jun 4, 2026
64johnlee and others added 2 commits June 4, 2026 12:52
Adds bare/ — a Holepunch Bare native addon exposing LTX-2 to JavaScript in the
QVAC ecosystem, wrapping the header-only ltx2.h C API:

- binding.c    — js.h/bare.h addon; createContext / generateT2V / generateI2V,
                 sd_ctx_t wrapped as a finalized external, frames returned as a
                 contiguous RGB ArrayBuffer. Registered via BARE_MODULE.
- index.js     — ergonomic JS API (option objects, LTX2Context with destroy()).
- binding.js   — require.addon() loader.
- CMakeLists.txt — cmake-bare build; links the repo's stable-diffusion library.
- package.json — "addon": true, bare-make generate/build scripts.
- README.md    — build + usage + first-compile verify-points.
- test.js      — smoke test (skips without $LTX2_MODELS).

Follows the holepunchto/bare-zlib addon conventions. Built with bare-make on
developer hardware (no bare.h/js.h in CI); structural + JS validation pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
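
To make the "contiguous RGB ArrayBuffer" return value concrete, the sketch below shows how such a buffer could be split back into frames. It assumes a frame-major, row-major layout with interleaved 8-bit R, G, B samples; that layout is an assumption for illustration and should be checked against the addon's README. `split_frames` is a hypothetical helper, written in Python rather than JS to stay consistent with the repository's scripts.

```python
import numpy as np

def split_frames(buf: bytes, n_frames: int, height: int, width: int) -> np.ndarray:
    """View a packed RGB byte buffer as an array of shape (n_frames, height, width, 3)."""
    expected = n_frames * height * width * 3
    if len(buf) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(buf)}")
    # frombuffer does not copy; frame i starts at byte i * height * width * 3.
    return np.frombuffer(buf, dtype=np.uint8).reshape(n_frames, height, width, 3)
```

Returning one buffer instead of one object per frame keeps the native-to-JS hand-off to a single allocation.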
Verified end-to-end (clang 18 + lld, CPU): compiles, links libstable-diffusion.a,
installs to prebuilds/linux-x64/, and require() exposes the API.

- CMakeLists.txt: enable CXX (project ... C CXX) — pulls in the C++ sd targets.
- binding.c: add <stdbool.h>; ltx2_new_ctx takes 8 args — pass vae_decode_only=false
  (encoder needed for I2V).
- test.js: drop Node-only APIs unavailable in Bare — require('path') -> template
  strings, process.env -> bare-env, process.exit -> Bare.exit; require('./index.js').
- package.json: declare bare-env devDependency.
- README: document the bare-make install step + clang/lld prereqs + verified note.
- .gitignore: build/, node_modules/, prebuilds/, package-lock.json.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
64johnlee changed the title from "feat(ltx2): LTX-2.3 video generation (M1 + M2) — conversion, C API, CPU VAE fix, cross-backend parity" to "feat(ltx2): LTX-2.3 video generation (M1+M2+M3) — conversion, C API, parity, Bare JS addon" Jun 4, 2026
@64johnlee
Author

👋 Status update — LTX-2 bounty, all three milestones are in this PR:

  • M1 — script/convert_ltx2.py (safetensors → GGUF), include/ltx2.h C façade (ltx2_generate_t2v/_i2v), and a CPU Video-VAE im2col fix.
  • M2 — cross-backend parity harness (script/ltx2_parity.py: strict same-backend reproducibility + loose cross-backend PSNR), portable test_m2.sh, and a Vulkan build + parity job in CI.
  • M3 — a Bare native addon (bare/) exposing createContext / generateT2V / generateI2V to JS. Verified building on linux-x64 (clang 18 + lld, CPU): it compiles, links libstable-diffusion.a, installs to prebuilds/, and require() loads the API; smoke test passes.

The diff looks huge only because this fork's master trails upstream leejet/stable-diffusion.cpp by ~100 commits, so the PR carries that sync. The real contribution is 14 files / ~1.4k lines against current upstream:
leejet/stable-diffusion.cpp@master...64johnlee:qvac-ext-stable-diffusion.cpp:ltx2-video-generation

Happy to rebase onto a synced master or split the upstream-sync from the feature commits — whatever's easiest to review. 🙏
