feat(ltx2): LTX-2.3 video generation (M1+M2+M3) — conversion, C API, parity, Bare JS addon#7
Open
64johnlee wants to merge 161 commits into
Open
feat(ltx2): LTX-2.3 video generation (M1+M2+M3) — conversion, C API, parity, Bare JS addon#764johnlee wants to merge 161 commits into
64johnlee wants to merge 161 commits into
Conversation
* feat: add support for the eta parameter to ancestral samplers * feat: Euler Ancestral sampler implementation for flow models * refine flow ancestral sampling and normalize eta defaults --------- Co-authored-by: leejet <leejet714@gmail.com>
Co-authored-by: leejet <leejet714@gmail.com>
Co-authored-by: leejet <leejet714@gmail.com>
Co-authored-by: leejet <leejet714@gmail.com>
* refactor: img-cond->img_uncond * align APG and CFG++ with img-uncond CFG * set default img_cfg to 1.f --------- Co-authored-by: leejet <leejet714@gmail.com>
…ol crash - script/convert_ltx2.py: safetensors → GGUF at Q4_0/Q5_1/Q8_0/F16 with selective F16 preservation for norms, biases, and embeddings - include/ltx2.h: focused public C API for LTX-2 T2V and I2V inference, wrapping stable-diffusion.h with ltx2_new_ctx / ltx2_generate_t2v / ltx2_generate_i2v helpers - fix(ggml_ext_conv_3d): fall back to explicit im2col+mul_mat when weight type is not F16/F32, fixing assertion crash in ggml_compute_forward_im2col_f16 on CPU with quantized VAE weights (upstream issue leejet#1577) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous element-wise Python loop was O(n) in pure Python — too slow for 14B-parameter tensors. Replace with a numpy byte-copy: write the two BF16 bytes into positions [2] and [3] of each uint32 word (BF16 is float32 with the low 16 bits zeroed), then reinterpret as float32. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three jobs on every push to ltx2-video-generation and on PRs to master: - build-linux: cmake + Ninja on ubuntu-22.04, asserts vid_gen / embeddings-connectors / diffusion-fa flags present in sd-cli --help - convert-script: syntax check + --help + two synthetic GGUF round-trips (F32→Q8_0 and BF16→F16 via KEEP_F16_PATTERNS) - build-macos-arm64: cmake + Metal on macos-14 (ARM64), uploads sd-cli artifact for 7 days Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…f16_to_fp32 safe_open(framework="numpy") doesn't support BF16 tensors because numpy has no bfloat16 dtype. Replace with a hand-rolled parser (_iter_safetensors) that reads the safetensors binary format directly (8-byte LE header size + JSON metadata + raw tensor bytes), eliminating the torch/safetensors dep. Also fix bf16_to_fp32: calling .view(uint8) on a multi-dimensional array gives a multi-dim byte array whose [0::2] slice has the wrong shape. Flatten to 1D first with .ravel() so the byte interleaving works correctly. CI: drop safetensors from pip install since it is no longer imported. Both round-trips (F32→Q8_0 and BF16→F16) verified locally. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GitHub is forcing Node 24 as default on June 16; set FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 at workflow level to adopt it now. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sd-cli appends .avi to the -o path unconditionally; update the results ls check to match the actual filenames produced. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds **/*.sh plus explicit test_m2.sh, test_*.sh, and .github/test_*.sh to the on.push paths filter so test scripts (like the recently-added test_m2.sh that didn't trigger CI on commit 259b7ad) participate in the CI gating cycle. The wildcard alone would suffice; the explicit entries are kept as documentation of which scripts we specifically care about.
Silently mismatched data_offsets produced wrong tensor data without error. Now raises ValueError with tensor name, expected bytes, shape, dtype, and actual bytes for fast diagnosis. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- ltx2_ctx_params_set_defaults: remove schedule/sample_method/cfg_scale which do not exist on sd_ctx_params_t (they live on sd_sample_params_t) - Add ltx2_vid_params_set_defaults() to set LTX-2 sample defaults on sd_vid_gen_params_t.sample_params where they actually belong - Call ltx2_vid_params_set_defaults() in both generate_t2v and generate_i2v - Fix typo: embeddings_connector_path -> embeddings_connectors_path Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds the M2 deliverables for cross-backend LTX-2 video generation: - script/ltx2_parity.py: two-tier parity harness. Strict same-backend reproducibility (re-run same seed -> frames must match) plus loose cross-backend similarity (CPU golden vs Vulkan/Metal via per-frame PSNR), since exact pixel parity is not achievable for multi-step diffusion across FP16/kernel-order-differing ggml backends. Drives sd-cli, extracts frames with ffmpeg, supports --update-ref / --self-check. - test_m2.sh: made portable (LTX2_BIN/LTX2_BUILD/LTX2_MODELS/LTX2_OUT/ LTX2_INIT_IMAGE env vars, with file guards) instead of hardcoded paths. - ltx2-ci.yml: add a Linux Vulkan build job (GGML_VULKAN=ON, compile-only as CI has no GPU) and a parity-script validation job (syntax/help/guard). Full T2V/I2V parity runs on developer GPU/Metal hardware via ltx2_parity.py. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
d410ed3 to
97d9a6c
Compare
Adds bare/ — a Holepunch Bare native addon exposing LTX-2 to JavaScript in the
QVAC ecosystem, wrapping the header-only ltx2.h C API:
- binding.c — js.h/bare.h addon; createContext / generateT2V / generateI2V,
sd_ctx_t wrapped as a finalized external, frames returned as a
contiguous RGB ArrayBuffer. Registered via BARE_MODULE.
- index.js — ergonomic JS API (option objects, LTX2Context with destroy()).
- binding.js — require.addon() loader.
- CMakeLists.txt — cmake-bare build; links the repo's stable-diffusion library.
- package.json — "addon": true, bare-make generate/build scripts.
- README.md — build + usage + first-compile verify-points.
- test.js — smoke test (skips without $LTX2_MODELS).
Follows the holepunchto/bare-zlib addon conventions. Built with bare-make on
developer hardware (no bare.h/js.h in CI); structural + JS validation pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Verified end-to-end (clang 18 + lld, CPU): compiles, links libstable-diffusion.a,
installs to prebuilds/linux-x64/, and require() exposes the API.
- CMakeLists.txt: enable CXX (project ... C CXX) — pulls in the C++ sd targets.
- binding.c: add <stdbool.h>; ltx2_new_ctx takes 8 args — pass vae_decode_only=false
(encoder needed for I2V).
- test.js: drop Node-only APIs unavailable in Bare — require('path') -> template
strings, process.env -> bare-env, process.exit -> Bare.exit; require('./index.js').
- package.json: declare bare-env devDependency.
- README: document the bare-make install step + clang/lld prereqs + verified note.
- .gitignore: build/, node_modules/, prebuilds/, package-lock.json.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Author
|
👋 Status update — LTX-2 bounty, all three milestones are in this PR:
The diff looks huge only because this fork's Happy to rebase onto a synced |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
LTX-2.3 video generation for qvac-ext-stable-diffusion.cpp — M1 + M2 + M3
Implements the Tether LTX-2 bounty: LTX-2 T2V + I2V via ggml across CPU / Vulkan / Metal, plus a Bare runtime addon for JavaScript video generation in QVAC.
📖 The reviewable diff is 14 files / ~1,463 lines
The diff against this fork's
masterlooks enormous only becausetetherto:masteris ~100 commits behind upstreamleejet/stable-diffusion.cpp, so the PR necessarily carries that upstream sync. The actual contribution, viewed against current upstream:➡️ leejet/stable-diffusion.cpp@master...64johnlee:qvac-ext-stable-diffusion.cpp:ltx2-video-generation
(A fork-sync to upstream on your side would collapse this PR down to exactly those files.)
M1 — conversion + API + CPU correctness
script/convert_ltx2.py— safetensors → GGUF for the LTX-2.3 stack (14B DiT, Gemma-3 text encoder, spatiotemporal Video-VAE);f16/q4_0/q5_1/q8_0.include/ltx2.h— clean C façade (ltx2_generate_t2v,ltx2_generate_i2v) overstable-diffusion.h.src/ggml_extend.hpp— CPU VAE im2col crash fix.M2 — cross-backend + parity
script/ltx2_parity.py— two-tier parity harness: strict same-backend reproducibility + loose cross-backend PSNR vs a CPU golden reference.test_m2.sh— portable T2V/I2V smoke test..github/workflows/ltx2-ci.yml— Linux Vulkan build job + parity-script validation.M3 — Bare JS addon (
bare/)binding.c—js.h/bare.hnative addon exposingcreateContext/generateT2V/generateI2V;sd_ctx_twrapped as a finalized external; frames returned as a contiguous RGBArrayBuffer; registered viaBARE_MODULE.index.js/binding.js— ergonomic API +require.addon()loader.CMakeLists.txt/package.json—cmake-barebuild ("addon": true), links thestable-diffusionlibrary.README.md/test.js.libstable-diffusion.a, installs toprebuilds/linux-x64/, andrequire()exposes the API; smoke test passes.🤖 Generated with Claude Code