Add Helios 14B streaming T2V integration #347
Register three flashdreams-run slugs wrapping HeliosPyramidPipeline, with smoke tests and model gallery docs. Closes NVIDIA#276.
Greptile Summary
This PR adds a first-party FlashDreams integration for the Helios 14B real-time streaming text-to-video model, wrapping
Confidence Score: 3/5
The core runner and smoke tests work, but the benchmark tool crashes immediately and the video-conditioning fallback silently corrupts the AR history for all remaining chunks. The benchmark's
Important Files Changed
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant Runner as HeliosT2VRunner
participant Pipeline as HeliosStreamingPipeline
participant Encoder as HeliosEncoder
participant Diffusers as HeliosPyramidPipeline
participant Cache as HeliosPipelineCache
Runner->>Pipeline: "initialize_cache(text=[prompt])"
Pipeline->>Encoder: encode(prompt, negative_prompt, guidance_scale)
Encoder->>Diffusers: encode_prompt(...)
Diffusers-->>Encoder: prompt_embeds, negative_prompt_embeds
Encoder-->>Pipeline: HeliosConditionings
Pipeline-->>Runner: "HeliosPipelineCache(cond=...)"
loop "AR step i = 0..total_blocks-1"
Runner->>Pipeline: generate(i, cache, width, height)
Pipeline->>Pipeline: _video_conditioning(cache)
Note over Pipeline: Returns prior decoded frames as [B,T,C,H,W] if >= 33 frames exist
Pipeline->>Diffusers: "pipe(prompt_embeds, video=video_cond, ...)"
alt Diffusers call fails and video_cond present
Pipeline->>Diffusers: "pipe(...) without video="
Note over Pipeline,Cache: Discontinuous frames still appended to decoded_chunks
end
Diffusers-->>Pipeline: result.frames [T,C,H,W]
Pipeline->>Pipeline: _normalize_frames_to_flashdreams(frames)
Pipeline->>Cache: decoded_chunks.append(normalized)
Pipeline->>Cache: "pending_history = normalized[-history_len:]"
Pipeline-->>Runner: normalized [T,C,H,W]
Runner->>Pipeline: finalize(i, cache)
Pipeline->>Cache: "history_frames = pending_history"
end
Runner->>Runner: torch.cat(chunks) then write MP4
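The two-phase history update in the diagram (generate() stages pending_history, finalize() commits it to history_frames) can be sketched with a toy cache. This is only an illustration under stated assumptions: ToyCache, CHUNK_LEN, HISTORY_LEN, and the integer "frames" are hypothetical stand-ins, not the PR's classes or values.

```python
from dataclasses import dataclass, field
from typing import List, Optional

HISTORY_LEN = 2  # hypothetical history length, in frames
CHUNK_LEN = 4    # hypothetical chunk length (the PR's native chunk is 33 frames)


@dataclass
class ToyCache:
    """Hypothetical stand-in for HeliosPipelineCache."""
    decoded_chunks: List[List[int]] = field(default_factory=list)
    pending_history: Optional[List[int]] = None
    history_frames: Optional[List[int]] = None


def generate(index: int, cache: ToyCache) -> List[int]:
    """Produce one chunk and stage its tail as pending history."""
    start = index * CHUNK_LEN
    frames = list(range(start, start + CHUNK_LEN))  # stand-in for decoded frames
    cache.decoded_chunks.append(frames)
    cache.pending_history = frames[-HISTORY_LEN:]
    return frames


def finalize(index: int, cache: ToyCache) -> None:
    """Commit the staged history so the next step conditions on it."""
    cache.history_frames = cache.pending_history


def run(total_blocks: int) -> ToyCache:
    """Drive the generate/finalize loop the way the diagram's AR loop does."""
    cache = ToyCache()
    for i in range(total_blocks):
        generate(i, cache)
        finalize(i, cache)
    return cache
```

Because every generated chunk is appended and staged unconditionally, a chunk produced by the fallback path would be committed in the same way, which is the behavior the diagram's note flags.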
    def step() -> None:
        pipe.generate(STEP_TO_MEASURE, cache, width=640, height=384)
Benchmark crashes on the second call to step()
pipe.generate() asserts that each call increments autoregressive_index by exactly 1 (expected = (prev + 1)). The warmup loop at lines 87–89 drives the cache up to STEP_TO_MEASURE - 1 = 5. The first invocation of step() (passing STEP_TO_MEASURE = 6) succeeds and sets cache.autoregressive_index = 6. Every subsequent call in measure() passes 6 again, so expected = 7 but the index passed is 6 → AssertionError. The benchmark crashes on the second warmup iteration and no timing data is collected.
A minimal fix is to maintain an incrementing counter inside step() and call pipe.finalize() after each generate so the cache stays consistent across the N_WARMUP + N_MEASURE = 13 repetitions.
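A rough sketch of that fix, run against a toy pipeline that enforces the same "index must advance by exactly 1" contract. ToyPipeline, ToyCache, make_step, and run_benchmark are hypothetical names, and the split N_WARMUP = 3 / N_MEASURE = 10 is an assumption (the comment only gives the sum, 13); STEP_TO_MEASURE = 6 and the 640x384 size come from the comment and the quoted code.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple

STEP_TO_MEASURE = 6  # from the review comment
N_WARMUP = 3         # assumed split; only the sum (13) is given
N_MEASURE = 10       # assumed split


@dataclass
class ToyCache:
    autoregressive_index: int = -1


class ToyPipeline:
    """Hypothetical stand-in enforcing the sequential-index assertion."""

    def generate(self, index: int, cache: ToyCache, width: int, height: int) -> None:
        expected = cache.autoregressive_index + 1
        assert index == expected, f"expected step {expected}, got {index}"
        cache.autoregressive_index = index

    def finalize(self, index: int, cache: ToyCache) -> None:
        pass  # the real finalize commits history; nothing to do in the toy


def make_step(pipe: ToyPipeline, cache: ToyCache, first_index: int) -> Callable[[], None]:
    """Return a step() that advances its own index on every call."""
    next_index = first_index

    def step() -> None:
        nonlocal next_index
        pipe.generate(next_index, cache, width=640, height=384)
        pipe.finalize(next_index, cache)
        next_index += 1

    return step


def run_benchmark() -> Tuple[List[float], ToyCache]:
    pipe, cache = ToyPipeline(), ToyCache()
    for i in range(STEP_TO_MEASURE):  # drive the cache through steps 0..5
        pipe.generate(i, cache, width=640, height=384)
        pipe.finalize(i, cache)
    step = make_step(pipe, cache, STEP_TO_MEASURE)
    for _ in range(N_WARMUP):  # untimed warmup repetitions
        step()
    timings: List[float] = []
    for _ in range(N_MEASURE):  # timed repetitions
        start = time.perf_counter()
        step()
        timings.append(time.perf_counter() - start)
    return timings, cache
```

One trade-off of this approach: the timed calls now cover consecutive step indices rather than the same step repeated, so if per-step cost depends on the amount of accumulated history, the measurements are not strictly comparable to a single-step benchmark.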
    def get_helios_vae_class() -> Type[Any]:
        try:
            from diffusers import AutoencoderKLWan

            return AutoencoderKLWan
        except ImportError:
            from diffusers.models import AutoencoderKLWan

            return AutoencoderKLWan
Redundant fallback import that cannot actually help
If from diffusers import AutoencoderKLWan raises ImportError, it means AutoencoderKLWan is not in the top-level diffusers namespace. The fallback from diffusers.models import AutoencoderKLWan imports from the same underlying module; if the symbol is absent at the top level due to a version mismatch, the submodule import may still succeed, but the caller will then silently use a different code path than expected. Conversely, if diffusers itself is not installed, both lines raise and the fallback provides no value. The redundant try/except should be collapsed.
    -def get_helios_vae_class() -> Type[Any]:
    -    try:
    -        from diffusers import AutoencoderKLWan
    -        return AutoencoderKLWan
    -    except ImportError:
    -        from diffusers.models import AutoencoderKLWan
    -        return AutoencoderKLWan
    +def get_helios_vae_class() -> Type[Any]:
    +    from diffusers import AutoencoderKLWan
    +    return AutoencoderKLWan
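If a clearer failure message is wanted once the fallback is gone, one option is a small generic helper that imports a module and looks up a single symbol, raising one ImportError with a hint in either failure case. This is a sketch only: require_symbol and its hint parameter are hypothetical and not part of the PR or of diffusers.

```python
import importlib
from typing import Any


def require_symbol(module_name: str, symbol: str, hint: str = "") -> Any:
    """Import module_name and return symbol, or raise ImportError with a hint."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # The package itself is missing.
        raise ImportError(f"{module_name} is not installed. {hint}".strip()) from exc
    try:
        return getattr(module, symbol)
    except AttributeError as exc:
        # The package is present but this build does not export the symbol.
        raise ImportError(
            f"{module_name} does not export {symbol}. {hint}".strip()
        ) from exc
```

Under these assumptions, get_helios_vae_class() could return require_symbol("diffusers", "AutoencoderKLWan", hint="Install a diffusers build that includes it."), keeping a single import path while still telling the user what to fix.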
Summary
Adds FlashDreams integration for Helios 14B real-time streaming text-to-video, addressing #276.
- integrations/helios/ wrapping HeliosPyramidPipeline from diffusers
- helios-distilled-t2v-14b: BestWishYsh/Helios-Distilled, pyramid [2,2,2]
- helios-base-t2v-14b: BestWishYsh/Helios-Base, pyramid [20,20,20], CFG 5.0
- helios-distilled-t2v-14b-2gpu: distilled checkpoint with Ulysses context parallelism (torchrun)
- Each generate() call yields one native 33-frame chunk through the FlashDreams streaming interface
- docs/source/models/helios.rst
Requirements note
HeliosPyramidPipeline requires a recent diffusers build. If the resolved PyPI release does not export it, install from source: