QVAC-19998 feat: Add LTX support with Metal kernels and unified GGML #12

Open
aegioscy wants to merge 6 commits into 2026-06-04 from feature/ltx

@aegioscy aegioscy commented Jun 5, 2026

Summary

Adds full LTX-2.3 video generation support to the stable-diffusion.cpp fork with complete Metal GPU acceleration.

Key changes:

  • Unified GGML: leejet/ggml v0.12 + Metal IM2COL_3D/PAD + ROPE_FLUX kernels
  • LTX engine: All qvac vcpkg patches reconciled on upstream LTX base
  • ESRGAN backend preference API preserved
  • Addon API migrated to LTX's 5-arg generate_video

Tested:

  • Generated 10-second LTX-2.3 video (768×512, 241 frames, full Metal GPU, no --vae-on-cpu)
  • vcpkg overlay build validated from GitHub sources
  • Backwards compatible with qvac-ext-stable-diffusion

Implementation

6 commits implementing:

  • qvac vcpkg port patches reconciliation
  • ESRGAN upscaler device API
  • Backend preference defaults
  • Metal IM2COL_3D + PAD kernels for LTX video VAE
  • Unified ggml submodule integration
  • System ggml compatibility

All based on upstream/master and ready for team review.

Made with Cursor

aegioscy and others added 6 commits June 3, 2026 14:12
Replays the downstream vcpkg overlay patches (originally b474457) on top of
upstream master, adapting to upstream's refactored backend system:

- preferred_gpu_backend (sd_backend_preference_t) is preserved as public API.
  Upstream's richer backend/params_backend string mechanism stays primary;
  when no explicit --backend is set, init_backend() now derives the spec from
  preferred_gpu_backend and the SD_CPU_ONLY env var (auto/cpu/gpu/opencl).
- abort-callback: sd_set_abort_callback() / sd_abort_requested() restored;
  the denoise step now returns an empty GuiderOutput when an abort is
  requested, so sample_k_diffusion bails through the normal cleanup path.
- ggml->sd log bridge restored (surfaces backend init failures via the host
  log callback; e.g. Android Vulkan diagnostics).
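The abort path described above can be sketched as follows. This is a minimal illustration, not the fork's code: every name ending in `_sketch` is hypothetical, and the real `sd_set_abort_callback()` / `sd_abort_requested()` signatures may differ. It only shows the idea that an empty step output lets the sampling loop leave through its normal exit, so cleanup still runs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the guider output; empty means "abort requested".
struct GuiderOutputSketch {
    std::vector<float> data;
    bool empty() const { return data.empty(); }
};

// Assumed callback shape: returns true when the host wants to stop.
using abort_cb_sketch_t = bool (*)(void* user_data);

static abort_cb_sketch_t g_abort_cb = nullptr;
static void* g_abort_user_data = nullptr;

void set_abort_callback_sketch(abort_cb_sketch_t cb, void* user_data) {
    g_abort_cb = cb;
    g_abort_user_data = user_data;
}

bool abort_requested_sketch() {
    return g_abort_cb != nullptr && g_abort_cb(g_abort_user_data);
}

// One denoise step: returns an empty output instead of throwing or exiting.
GuiderOutputSketch denoise_step_sketch(int step) {
    if (abort_requested_sketch()) {
        return {};
    }
    return GuiderOutputSketch{{static_cast<float>(step)}};
}

// Sampling loop: stops on the first empty output, then runs cleanup as usual.
// Returns the number of completed steps.
int sample_sketch(int steps, bool& cleaned_up) {
    int done = 0;
    for (int i = 0; i < steps; ++i) {
        GuiderOutputSketch out = denoise_step_sketch(i);
        if (out.empty()) {
            break;  // bail out through the normal exit below
        }
        ++done;
    }
    cleaned_up = true;  // stands in for freeing compute buffers
    return done;
}
```

A host would register a callback once and let it flip to `true` on user cancel; the loop then ends at the next step boundary rather than mid-step.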

Dropped as obsolete (already fixed/superseded upstream):
- generic #ifdef backend init (replaced by backend_manager).
- failure-path free_compute_buffer fix (upstream already frees
  work_diffusion_model on the failure path).

Builds clean (sd-cli, Release, Metal).

Co-authored-by: Cursor <cursoragent@cursor.com>
…nto upstream

Combines the net effect of the two downstream ESRGAN commits and adapts them
to upstream's refactored upscaler (backend_manager / SDBackendModule::UPSCALER):

- Public API preserved for the vcpkg/RuntimeStats consumer:
  - sd_upscaler_device_t { CPU=0, GPU=1 }
  - new_upscaler_ctx_with_device(..., device, gpu_backend_pref)
  - get_upscaler_backend_device() -> 0=CPU, 1=GPU, -1=error
- Matches the final shipped qvac enum: sd_backend_preference_t drops AUTO
  (CPU=0, GPU, OPENCL) and defaults to GPU; init_backend()/sd_ctx_params_init
  updated accordingly.
- new_upscaler_ctx_with_device maps the high-level device/preference onto
  upstream's backend spec string; UpscalerGGML tracks actual_backend_device
  (resolved post-init via sd_backend_is_cpu) instead of the old custom
  device-enumeration init (superseded by backend_manager).
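As a rough sketch of the mapping described above: the enum values mirror the commit message, but the function names and the spec strings (`"cpu"`, `"gpu"`, `"opencl"`) are assumptions made for illustration, not the fork's actual implementation.

```cpp
#include <cassert>
#include <string>

// Values as listed in the commit message; the type names are hypothetical.
enum upscaler_device_sketch { UPSCALER_DEV_CPU = 0, UPSCALER_DEV_GPU = 1 };
enum backend_pref_sketch { PREF_CPU = 0, PREF_GPU, PREF_OPENCL };

// Map the high-level device/preference pair onto a backend spec string.
// The concrete strings are assumed, not taken from upstream.
std::string upscaler_backend_spec_sketch(upscaler_device_sketch device,
                                         backend_pref_sketch pref) {
    if (device == UPSCALER_DEV_CPU || pref == PREF_CPU) {
        return "cpu";
    }
    return pref == PREF_OPENCL ? "opencl" : "gpu";
}

// Report the device actually in use after init: 0=CPU, 1=GPU, -1=error.
// `backend_is_cpu` stands in for the result of sd_backend_is_cpu().
int upscaler_backend_device_sketch(bool initialized, bool backend_is_cpu) {
    if (!initialized) {
        return -1;
    }
    return backend_is_cpu ? 0 : 1;
}
```

Reporting the resolved device rather than the requested one matters when a GPU request falls back to CPU: the consumer sees `0`, not the `1` it asked for.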

Builds clean (sd-cli, Release, Metal).

Co-authored-by: Cursor <cursoragent@cursor.com>
…n_cpu)

Follow-up to the b474457/ESRGAN reconcile:

- init_backend(): map SD_BACKEND_PREF_GPU (and the default) to an EMPTY
  backend spec rather than "gpu". Upstream auto-selection is already
  GPU-first, and an empty spec (unlike an explicit "gpu") leaves the
  keep_clip/vae/control_net_on_cpu overrides effective -- an explicit spec
  makes runtime_assignment_ non-empty and silently disables --vae-on-cpu.
- common.cpp: the CLI builds sd_ctx_params via aggregate init, which
  left the new preferred_gpu_backend field at 0 (== SD_BACKEND_PREF_CPU in
  the shipped qvac enum), forcing the whole pipeline onto CPU. Set it to
  SD_BACKEND_PREF_GPU explicitly.
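Both pitfalls can be reproduced in a few lines. The sketch below uses hypothetical `_sketch` types and a simplified model of the override logic (overrides are effective only while the spec is empty); it is meant to illustrate the reasoning in this commit, not to mirror `init_backend()` itself.

```cpp
#include <cassert>
#include <string>

// Shipped enum order per the commit message: CPU is 0.
enum backend_pref_sketch { PREF_CPU = 0, PREF_GPU, PREF_OPENCL };

// Hypothetical, trimmed-down params struct with the new field last.
struct ctx_params_sketch {
    const char* model_path;
    bool vae_on_cpu;
    backend_pref_sketch preferred_gpu_backend;
};

// Derive the backend spec. GPU maps to an EMPTY spec so that upstream's
// GPU-first auto-selection applies; "cpu"/"opencl" strings are assumed.
std::string backend_spec_sketch(const ctx_params_sketch& p, bool cpu_only_env) {
    if (cpu_only_env || p.preferred_gpu_backend == PREF_CPU) {
        return "cpu";
    }
    if (p.preferred_gpu_backend == PREF_OPENCL) {
        return "opencl";
    }
    return "";
}

// Simplified model: per-module CPU overrides only apply with an empty spec.
bool vae_on_cpu_effective_sketch(const ctx_params_sketch& p,
                                 const std::string& spec) {
    return p.vae_on_cpu && spec.empty();
}
```

Aggregate initialization that omits the trailing field, such as `ctx_params_sketch p = {"model.gguf", true};`, value-initializes it to `0`, which here is `PREF_CPU`. Naming the field explicitly, as the commit does in `common.cpp`, avoids that silent CPU fallback.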

Co-authored-by: Cursor <cursoragent@cursor.com>
Cherry-picks qvac-ext-ggml@bc053644 ("metal: add IM2COL_3D op and PAD
left-padding support for Wan video") onto leejet/ggml@0ce7ad3 (v0.12.0,
which the fork builds against to match upstream LTX). leejet/ggml
implements IM2COL_3D/PAD for CPU/CUDA only, so on Metal the LTX video VAE
(IM2COL_3D) and audio VAE (PAD) aborted and required --vae-on-cpu.

With this kernel port the entire LTX-2.3 pipeline (diffusion + video VAE +
audio VAE) runs on Metal: verified 512x320x25 T2V with audio on M3 Ultra,
~49s, no --vae-on-cpu.
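For readers unfamiliar with the two ops, here is a plain CPU reference of what they compute along a single axis. It is an illustrative sketch with hypothetical names, assuming zero padding; it is not the Metal kernel and does not use the ggml API. IM2COL_3D applies the same output-size rule independently to the depth (frames), height, and width axes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// PAD with separate left and right amounts along one axis (zero fill assumed).
std::vector<float> pad_1d_sketch(const std::vector<float>& src,
                                 std::size_t left, std::size_t right) {
    std::vector<float> dst(left + src.size() + right, 0.0f);
    std::copy(src.begin(), src.end(),
              dst.begin() + static_cast<std::ptrdiff_t>(left));
    return dst;
}

// Standard convolution output length along one axis, which is also the
// number of patches an im2col step extracts on that axis.
int conv_out_size_sketch(int in, int kernel, int stride, int pad, int dilation) {
    assert(stride > 0 && kernel > 0 && dilation > 0);
    return (in + 2 * pad - dilation * (kernel - 1) - 1) / stride + 1;
}
```

For example, `pad_1d_sketch({1, 2, 3}, 2, 1)` yields `{0, 0, 1, 2, 3, 0}`, and a 3-wide kernel with stride 1 and padding 1 keeps a 25-frame axis at 25.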

NOTE: the ggml submodule now points at a local branch commit. To make the
fork cloneable, this commit must be pushed to a ggml fork (e.g.
tetherto/qvac-ext-ggml) and .gitmodules updated to that URL.

Co-authored-by: Cursor <cursoragent@cursor.com>
…/ltx-metal-unified)

Pins the ggml submodule to the unified branch = leejet/ggml v0.12 +
Metal IM2COL_3D/PAD + GGML_OP_ROPE_FLUX + qvac packaging deltas. This
single ggml supports LTX on Metal and stays backwards-compatible with
qvac-ext-stable-diffusion (feature/ltx-support, which calls ggml_rope_flux)
and the qvac monorepo DL integration.

Co-authored-by: Cursor <cursoragent@cursor.com>
…l pin

- ggml_graph_cut.cpp: include "ggml-impl.h" by name instead of the submodule
  path "../ggml/src/ggml-impl.h" so it resolves under both the bundled-submodule
  build and SD_USE_SYSTEM_GGML=ON (where the ggml port installs ggml-impl.h)
- CMakeLists: add ggml/src to the include path when building the submodule
- bump ggml submodule to the unified-ggml commit that exports GGML_MAX_NAME and
  ggml-impl.h via the package config

Co-authored-by: Cursor <cursoragent@cursor.com>
@aegioscy aegioscy changed the title feat: Add LTX support with Metal kernels and unified GGML QVAC-19998 feat: Add LTX support with Metal kernels and unified GGML Jun 5, 2026