QVAC-19998 feat: Add LTX support with Metal kernels and unified GGML#12
Open
aegioscy wants to merge 6 commits into
Replays the downstream vcpkg overlay patches (originally b474457) on top of upstream master, adapting to upstream's refactored backend system:
- preferred_gpu_backend (sd_backend_preference_t) is preserved as public API. Upstream's richer backend/params_backend string mechanism stays primary; when no explicit --backend is set, init_backend() now derives the spec from preferred_gpu_backend and the SD_CPU_ONLY env var (auto/cpu/gpu/opencl).
- abort-callback: sd_set_abort_callback() / sd_abort_requested() restored; the denoise step now returns an empty GuiderOutput when an abort is requested, so sample_k_diffusion bails through the normal cleanup path.
- ggml->sd log bridge restored (surfaces backend init failures via the host log callback; e.g. Android Vulkan diagnostics).
Dropped as obsolete (already fixed/superseded upstream):
- generic #ifdef backend init (replaced by backend_manager).
- failure-path free_compute_buffer fix (upstream already frees work_diffusion_model on the failure path).
Builds clean (sd-cli, Release, Metal).
Co-authored-by: Cursor <cursoragent@cursor.com>
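The abort path described above can be pictured with a minimal sketch. Everything in the `abort_sketch` namespace is hypothetical: the real signatures of sd_set_abort_callback() and sd_abort_requested() are not shown in this PR, and the `GuiderOutput` here is only a stand-in for the real type. The point is the control flow: the denoise step returns an empty result, and the sampling loop exits normally so cleanup still runs.

```cpp
#include <cassert>
#include <vector>

namespace abort_sketch {

// Hypothetical callback shape; the real sd_set_abort_callback() signature may differ.
using abort_cb_t = bool (*)(void* user_data);

static abort_cb_t g_abort_cb = nullptr;
static void*      g_abort_ud = nullptr;

void set_abort_callback(abort_cb_t cb, void* user_data) {
    g_abort_cb = cb;
    g_abort_ud = user_data;
}

// True only when a callback is installed and it asks for an abort.
bool abort_requested() {
    return g_abort_cb != nullptr && g_abort_cb(g_abort_ud);
}

// Stand-in for GuiderOutput: an empty payload signals "aborted".
struct GuiderOutput {
    std::vector<float> data;
    bool empty() const { return data.empty(); }
};

GuiderOutput denoise_step(int step) {
    if (abort_requested()) {
        return {};  // empty output, no exception and no early process exit
    }
    return GuiderOutput{{static_cast<float>(step)}};
}

// Returns the number of completed steps; cleanup would follow the loop
// on both the normal and the aborted path.
int sample(int total_steps) {
    int done = 0;
    for (int i = 0; i < total_steps; ++i) {
        GuiderOutput out = denoise_step(i);
        if (out.empty()) {
            break;
        }
        ++done;
    }
    return done;
}

}  // namespace abort_sketch
```

A host would install a callback that polls its own cancellation flag; with no callback installed the loop runs to completion.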
…nto upstream
Combines the net effect of the two downstream ESRGAN commits and adapts them
to upstream's refactored upscaler (backend_manager / SDBackendModule::UPSCALER):
- Public API preserved for the vcpkg/RuntimeStats consumer:
  - sd_upscaler_device_t { CPU=0, GPU=1 }
  - new_upscaler_ctx_with_device(..., device, gpu_backend_pref)
  - get_upscaler_backend_device() -> 0=CPU, 1=GPU, -1=error
- Matches the final shipped qvac enum: sd_backend_preference_t drops AUTO
  (CPU=0, GPU, OPENCL) and defaults to GPU; init_backend()/sd_ctx_params_init
  updated accordingly.
- new_upscaler_ctx_with_device maps the high-level device/preference onto
  upstream's backend spec string; UpscalerGGML tracks actual_backend_device
  (resolved post-init via sd_backend_is_cpu) instead of the old custom
  device-enumeration init (superseded by backend_manager).
Builds clean (sd-cli, Release, Metal).
Co-authored-by: Cursor <cursoragent@cursor.com>
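As a rough illustration of the upscaler API described in this commit, the sketch below maps a requested device and preference onto a backend spec string and reports the device that was actually resolved. The enum values follow the commit message; the type names, the spec strings ("cpu", "opencl", empty), and the `Upscaler` struct are assumptions made for illustration, not the fork's code.

```cpp
#include <cassert>
#include <string>

namespace upscaler_sketch {

// Values follow the commit message; the names are hypothetical.
enum device_t { DEVICE_CPU = 0, DEVICE_GPU = 1 };
enum backend_pref_t { PREF_CPU = 0, PREF_GPU, PREF_OPENCL };

// Assumed mapping onto a backend spec string. An empty spec is taken to mean
// "let the backend manager auto-select".
std::string device_to_spec(device_t device, backend_pref_t pref) {
    if (device == DEVICE_CPU || pref == PREF_CPU) {
        return "cpu";
    }
    if (pref == PREF_OPENCL) {
        return "opencl";
    }
    return "";
}

// Tracks what the backend resolved to after init, not what was requested.
struct Upscaler {
    bool initialized    = false;
    bool backend_is_cpu = true;

    // Mirrors the documented return convention: 0=CPU, 1=GPU, -1=error.
    int backend_device() const {
        if (!initialized) {
            return -1;
        }
        return backend_is_cpu ? 0 : 1;
    }
};

}  // namespace upscaler_sketch
```

Reporting the resolved device rather than the requested one matters when a GPU request silently falls back to CPU: the consumer then sees 0, not 1.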
…n_cpu)
Follow-up to the b474457/ESRGAN reconcile:
- init_backend(): map SD_BACKEND_PREF_GPU (and the default) to an EMPTY backend spec rather than "gpu". Upstream auto-selection is already GPU-first, and an empty spec (unlike an explicit "gpu") leaves the keep_clip/vae/control_net_on_cpu overrides effective -- an explicit spec makes runtime_assignment_ non-empty and silently disables --vae-on-cpu.
- common.cpp: the CLI builds sd_ctx_params via aggregate init, which left the new preferred_gpu_backend field at 0 (== SD_BACKEND_PREF_CPU in the shipped qvac enum), forcing the whole pipeline onto CPU. Set it to SD_BACKEND_PREF_GPU explicitly.
Co-authored-by: Cursor <cursoragent@cursor.com>
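The aggregate-init pitfall fixed in common.cpp is easy to reproduce in isolation. The sketch below uses a cut-down, hypothetical `ctx_params` struct (the real sd_ctx_params has many more fields) and an assumed spec mapping in which the GPU preference yields an empty spec, as the commit describes. In C++, members omitted from an aggregate initializer are value-initialized, so a newly appended enum field ends up as 0, which here is the CPU preference.

```cpp
#include <cassert>
#include <string>

namespace params_sketch {

// Enum order follows the shipped qvac enum from the commit message (CPU=0).
enum backend_pref_t { PREF_CPU = 0, PREF_GPU, PREF_OPENCL };

// Hypothetical, cut-down params struct; the last field is the "new" one.
struct ctx_params {
    int            n_threads;
    bool           vae_on_cpu;
    backend_pref_t preferred_gpu_backend;
};

// Aggregate init written before the new field existed: the trailing member
// is value-initialized to 0, i.e. PREF_CPU.
ctx_params make_params_buggy() {
    ctx_params p = {4, false};
    return p;
}

// The fix: set the new field explicitly.
ctx_params make_params_fixed() {
    ctx_params p = {4, false};
    p.preferred_gpu_backend = PREF_GPU;
    return p;
}

// Assumed mapping: GPU becomes an empty spec so that auto-selection and the
// *_on_cpu overrides stay in effect.
std::string backend_spec(const ctx_params& p) {
    switch (p.preferred_gpu_backend) {
        case PREF_CPU:    return "cpu";
        case PREF_OPENCL: return "opencl";
        case PREF_GPU:    return "";
    }
    return "";
}

}  // namespace params_sketch
```

Making the safe default the zero value of the enum would avoid this class of bug, but the commit keeps CPU=0 to match the shipped qvac enum and fixes the call site instead.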
Cherry-picks qvac-ext-ggml@bc053644 ("metal: add IM2COL_3D op and PAD
left-padding support for Wan video") onto leejet/ggml@0ce7ad3 (v0.12.0,
which the fork builds against to match upstream LTX). leejet's Metal
backend implements IM2COL_3D/PAD on CPU/CUDA only, so the LTX video VAE
(IM2COL_3D) and audio VAE (PAD) aborted on Metal and required --vae-on-cpu.
With this kernel port the entire LTX-2.3 pipeline (diffusion + video VAE +
audio VAE) runs on Metal: verified 512x320x25 T2V with audio on M3 Ultra,
~49s, no --vae-on-cpu.
NOTE: the ggml submodule now points at a local branch commit. To make the
fork cloneable, this commit must be pushed to a ggml fork (e.g.
tetherto/qvac-ext-ggml) and .gitmodules updated to that URL.
Co-authored-by: Cursor <cursoragent@cursor.com>
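For readers unfamiliar with the PAD change, the following is a CPU reference sketch of zero padding on one dimension with independent left and right amounts. It is not the Metal kernel from this commit and does not use ggml's API; `pad_1d` and its parameters are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace pad_sketch {

// Zero-pads a 1D signal with `left` zeros before and `right` zeros after.
// Supporting only right padding is the special case left == 0.
std::vector<float> pad_1d(const std::vector<float>& src,
                          std::size_t left, std::size_t right) {
    std::vector<float> dst(left + src.size() + right, 0.0f);
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[left + i] = src[i];  // source is shifted by the left padding
    }
    return dst;
}

}  // namespace pad_sketch
```

A GPU kernel would typically compute the same thing per output element: read `src[i - left]` when the output index falls inside the source range, otherwise write zero.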
…/ltx-metal-unified)
Pins the ggml submodule to the unified branch = leejet/ggml v0.12 + Metal IM2COL_3D/PAD + GGML_OP_ROPE_FLUX + qvac packaging deltas. This single ggml supports LTX on Metal and stays backwards-compatible with qvac-ext-stable-diffusion (feature/ltx-support, which calls ggml_rope_flux) and the qvac monorepo DL integration.
Co-authored-by: Cursor <cursoragent@cursor.com>
…l pin
- ggml_graph_cut.cpp: include "ggml-impl.h" by name instead of the submodule path "../ggml/src/ggml-impl.h" so it resolves under both the bundled-submodule build and SD_USE_SYSTEM_GGML=ON (where the ggml port installs ggml-impl.h)
- CMakeLists: add ggml/src to the include path when building the submodule
- bump ggml submodule to the unified-ggml commit that exports GGML_MAX_NAME and ggml-impl.h via the package config
Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
Adds LTX-2.3 video generation support to the stable-diffusion.cpp fork, with the whole pipeline (diffusion, video VAE, audio VAE) running on Metal without --vae-on-cpu.
Key changes:
Tested:
Implementation
6 commits implementing:
All based on upstream/master and ready for team review.
Made with Cursor