Add Helios 14B streaming T2V integration#347

Open. shy982 wants to merge 1 commit into NVIDIA:main from shy982:add-helios-integration

Conversation

@shy982

@shy982 shy982 commented Jun 24, 2026

Summary

Adds FlashDreams integration for Helios 14B real-time streaming text-to-video, addressing #276.

  • New plugin under integrations/helios/ wrapping HeliosPyramidPipeline from diffusers
  • Three registered runner slugs:
    • helios-distilled-t2v-14b: BestWishYsh/Helios-Distilled, pyramid [2,2,2]
    • helios-base-t2v-14b: BestWishYsh/Helios-Base, pyramid [20,20,20], CFG 5.0
    • helios-distilled-t2v-14b-2gpu: distilled checkpoint with Ulysses context parallelism (torchrun)
  • Each generate() call yields one native 33-frame chunk through the FlashDreams streaming interface
  • CPU smoke tests for runner registration and entry-point discovery
  • Model gallery docs at docs/source/models/helios.rst
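
As an illustration of the entry-point discovery check mentioned in the list above, here is a minimal sketch using only the standard library. The group name `flashdreams.runners` is an assumption made for this example; the actual group is whatever integrations/helios/pyproject.toml declares. The three slugs are the ones listed in the summary.

```python
from importlib.metadata import entry_points

# Hypothetical group name, used only for illustration.
HYPOTHETICAL_GROUP = "flashdreams.runners"

# Runner slugs listed in the PR summary.
EXPECTED_SLUGS = {
    "helios-distilled-t2v-14b",
    "helios-base-t2v-14b",
    "helios-distilled-t2v-14b-2gpu",
}


def discover_runner_slugs(group: str) -> list:
    """Return the sorted entry-point names registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry-point collection.
        selected = eps.select(group=group)
    else:
        # Python 3.9 and earlier: plain dict keyed by group name.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


def missing_slugs(group: str = HYPOTHETICAL_GROUP) -> set:
    """Slugs from the summary that are not discoverable in this environment."""
    return EXPECTED_SLUGS - set(discover_runner_slugs(group))
```

With the plugin installed, `missing_slugs()` would be expected to return an empty set once the group name matches the real one.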

Requirements note

HeliosPyramidPipeline requires a recent diffusers build. If the resolved PyPI release does not export it, install from source:

pip install git+https://github.com/huggingface/diffusers.git
uv sync --project integrations/helios
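
To decide whether the source install is needed, one option is to probe the installed package for the symbol before running anything heavy. The helper below is a sketch written for this note, not part of the PR; `exports_symbol` is a hypothetical name.

```python
import importlib


def exports_symbol(module_name: str, symbol: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `symbol`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package itself is missing or fails to import.
        return False
    return hasattr(module, symbol)
```

Calling `exports_symbol("diffusers", "HeliosPyramidPipeline")` and getting `False` would suggest that the resolved release predates the pipeline and the install from source above applies.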

Register three flashdreams-run slugs wrapping HeliosPyramidPipeline,
with smoke tests and model gallery docs. Closes NVIDIA#276.
@copy-pr-bot

copy-pr-bot Bot commented Jun 24, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps

greptile-apps Bot commented Jun 24, 2026

Greptile Summary

This PR adds a first-party FlashDreams integration for the Helios 14B real-time streaming text-to-video model, wrapping HeliosPyramidPipeline from diffusers and exposing three runner slugs (helios-distilled-t2v-14b, helios-base-t2v-14b, helios-distilled-t2v-14b-2gpu) through the standard flashdreams-run entry point. The core AR loop, runner, and smoke tests follow existing integration conventions correctly.

  • pipeline.py: implements HeliosStreamingPipeline with a 33-frame chunk AR loop, video-conditioning carry-over between steps, compile/FlashAttention toggles, and a silent fallback when video conditioning fails — the fallback path has a continuity bug where discontinuous frames are still appended to decoded_chunks.
  • run_benchmark.py: the FlashDreams timed path crashes immediately because step() repeatedly passes the fixed STEP_TO_MEASURE index to generate(), which enforces strictly-sequential AR indices.
  • config.py: PIPELINE_HELIOS_DISTILLED_T2V_14B_OPTIMIZED is defined but never registered as an entry point or included in RUNNER_CONFIGS, leaving it as dead code.

Confidence Score: 3/5

The core runner and smoke tests work, but the benchmark tool crashes immediately and the video-conditioning fallback silently corrupts the AR history for all remaining chunks.

The benchmark's step() function hard-codes the same AR index on every call; because generate() enforces strictly-sequential indices the benchmark crashes on the second call, producing no timing data. Separately, when video conditioning raises an exception, the fallback silently generates a frame without temporal context and still appends it to decoded_chunks, so every AR step thereafter is conditioned on a discontinuous frame — degrading quality for the rest of the generation. Both are defects in the changed code that manifest during normal use.

integrations/helios/tests/benchmark/run_benchmark.py (benchmark crashes on second call), integrations/helios/helios/pipeline.py (fallback path corrupts history, fragile pixel-range heuristic).

Important Files Changed

| Filename | Overview |
| --- | --- |
| integrations/helios/tests/benchmark/run_benchmark.py | FlashDreams benchmark is broken: step() passes the fixed STEP_TO_MEASURE index on every call, triggering an AssertionError on the second invocation because the pipeline enforces strictly-sequential AR indices. |
| integrations/helios/helios/pipeline.py | Core streaming pipeline; two issues found: the pixel-range heuristic in _normalize_frames_to_flashdreams can silently double-remap bright frames, and the video-conditioning fallback poisons decoded_chunks with discontinuous frames. |
| integrations/helios/helios/config.py | Three runner configs correctly registered; PIPELINE_HELIOS_DISTILLED_T2V_14B_OPTIMIZED is defined but not wired to any entry point or runner slug. |
| integrations/helios/helios/helios_loader.py | Model loading logic is sound; redundant try/except fallback in get_helios_vae_class can be removed. |
| integrations/helios/helios/runner.py | Runner drives AR loop correctly with generate + finalize pairing; output video writing and stats serialization look correct. |
| integrations/helios/helios/encoder.py | Thin T5 encoder wrapper; CFG gating logic is clear and correct. |
| integrations/helios/helios/cache.py | Simple dataclass for AR state; no issues. |
| integrations/helios/helios/compiler.py | FlashAttention and torch.compile helpers look correct; dynamic=True is appropriate for varying AR chunk sizes. |
| integrations/helios/tests/test_smoke.py | CPU smoke tests thoroughly validate runner registration, entry-point discovery, and the chunk-frame constant. |
| integrations/helios/pyproject.toml | Package metadata and entry points are consistent with the three registered runner slugs. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Runner as HeliosT2VRunner
    participant Pipeline as HeliosStreamingPipeline
    participant Encoder as HeliosEncoder
    participant Diffusers as HeliosPyramidPipeline
    participant Cache as HeliosPipelineCache

    Runner->>Pipeline: "initialize_cache(text=[prompt])"
    Pipeline->>Encoder: encode(prompt, negative_prompt, guidance_scale)
    Encoder->>Diffusers: encode_prompt(...)
    Diffusers-->>Encoder: prompt_embeds, negative_prompt_embeds
    Encoder-->>Pipeline: HeliosConditionings
    Pipeline-->>Runner: "HeliosPipelineCache(cond=...)"

    loop "AR step i = 0..total_blocks-1"
        Runner->>Pipeline: generate(i, cache, width, height)
        Pipeline->>Pipeline: _video_conditioning(cache)
        Note over Pipeline: Returns prior decoded frames as [B,T,C,H,W] if >= 33 frames exist
        Pipeline->>Diffusers: "pipe(prompt_embeds, video=video_cond, ...)"
        alt Diffusers call fails and video_cond present
            Pipeline->>Diffusers: "pipe(...) without video="
            Note over Pipeline,Cache: Discontinuous frames still appended to decoded_chunks
        end
        Diffusers-->>Pipeline: result.frames [T,C,H,W]
        Pipeline->>Pipeline: _normalize_frames_to_flashdreams(frames)
        Pipeline->>Cache: decoded_chunks.append(normalized)
        Pipeline->>Cache: "pending_history = normalized[-history_len:]"
        Pipeline-->>Runner: normalized [T,C,H,W]
        Runner->>Pipeline: finalize(i, cache)
        Pipeline->>Cache: "history_frames = pending_history"
    end

    Runner->>Runner: torch.cat(chunks) then write MP4

Comments Outside Diff (3)

  1. integrations/helios/helios/pipeline.py, lines 983-1003

    P1 Video-conditioning fallback silently poisons future AR steps

    When self.pipe(**pipe_kwargs) fails and video conditioning is to blame, the code retries without "video" and proceeds. The resulting frames — generated without temporal context — are then appended to cache.decoded_chunks and stored in cache.pending_history (lines 1000–1001). Every subsequent AR step will call _video_conditioning(cache) and feed those discontinuous frames as context, compounding the discontinuity. At minimum, cache.decoded_chunks should not receive a frame generated without its expected history, or the history should be explicitly invalidated so future steps know the continuity chain is broken.

  2. integrations/helios/helios/pipeline.py, lines 1006-1014

    P2 Pixel-range heuristic can silently double-remap frames

    The condition frames.max() <= 1.0 and frames.min() >= 0.0 is used to detect [0, 1]-range output and remap to [-1, 1]. For a uniformly bright scene (e.g., snow, sky) every pixel in the native [-1, 1] representation can be positive, causing min() >= 0.0 to be True and incorrectly triggering the 2× scale-shift. The resulting tensor would have values outside the expected range after the _video_conditioning round-trip, producing colour artifacts on the first video-conditioned AR step. A more reliable check is to use a known output_type contract from the diffusers pipeline rather than inferring it from tensor values.

  3. integrations/helios/helios/config.py, lines 492-502

    P2 PIPELINE_HELIOS_DISTILLED_T2V_14B_OPTIMIZED is unreachable dead code

    This config object is defined and has a comment referencing "Panel C," but it is not included in RUNNER_CONFIGS, not listed as an entry point in pyproject.toml, and is not exported from __init__.py. It can never be loaded by flashdreams-run or discovered by the plugin system. If it's intentionally deferred, a # TODO: note would make that clear; otherwise it should be either wired up or removed.
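
One way to read the "explicitly invalidated" option from comment 1 is sketched below. It is a toy model, not the PR's code: `ToyCache`, `continuity_broken`, and `run_pipe` are hypothetical names, frames are plain lists, and a `RuntimeError` stands in for whatever the real pipeline raises. On a conditioning failure the sketch retries without conditioning, drops the old history, and records that the chain was broken.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyCache:
    """Hypothetical stand-in for the AR cache; only the fields used here."""
    decoded_chunks: List[list] = field(default_factory=list)
    continuity_broken: bool = False


def generate_chunk(
    cache: ToyCache,
    run_pipe: Callable[[Optional[list]], list],
) -> list:
    """Generate one chunk; on a conditioning failure, reset history and retry."""
    video_cond = cache.decoded_chunks[-1] if cache.decoded_chunks else None
    try:
        frames = run_pipe(video_cond)
    except RuntimeError:
        if video_cond is None:
            raise  # Nothing to fall back from; surface the error.
        # Retry without conditioning, then drop the old history so later
        # steps do not treat the new chunk as a continuation of it.
        frames = run_pipe(None)
        cache.decoded_chunks.clear()
        cache.continuity_broken = True
    cache.decoded_chunks.append(frames)
    return frames
```

Whether to restart the history from the unconditioned chunk, as here, or to abort the generation outright is a design choice for the author; the flag at least makes the break visible to the runner instead of silent.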

Reviews (1): Last reviewed commit: "Add Helios 14B streaming T2V integration"
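
The pixel-range concern in comment 2 can be reproduced with plain numbers. The sketch below contrasts a value-based check with an explicitly declared range; `PixelRange`, `to_signed`, and `heuristic_to_signed` are hypothetical names for this illustration, and frames are flat lists of floats rather than tensors.

```python
from enum import Enum
from typing import List


class PixelRange(Enum):
    """Declared value range of a frame buffer."""
    UNIT = "[0, 1]"
    SIGNED = "[-1, 1]"


def to_signed(frames: List[float], source: PixelRange) -> List[float]:
    """Map pixel values to [-1, 1] based on a declared range, not on min/max."""
    if source is PixelRange.SIGNED:
        return list(frames)
    return [2.0 * v - 1.0 for v in frames]


def heuristic_to_signed(frames: List[float]) -> List[float]:
    """Value-based detection as described in the review, kept for comparison."""
    if max(frames) <= 1.0 and min(frames) >= 0.0:
        return [2.0 * v - 1.0 for v in frames]
    return list(frames)


# A bright frame already in [-1, 1]: every value happens to be non-negative.
bright = [0.25, 0.5, 1.0]

remapped_by_heuristic = heuristic_to_signed(bright)        # wrongly remapped
kept_by_contract = to_signed(bright, PixelRange.SIGNED)    # left unchanged
```

Here the heuristic shifts the bright frame even though it was already in the signed range, while the declared-range version leaves it alone. In the real pipeline the declared range would come from the output contract of the diffusers call rather than from an enum passed by hand.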

Comment on lines +91 to +92
def step() -> None:
    pipe.generate(STEP_TO_MEASURE, cache, width=640, height=384)

P1 Benchmark crashes on the second call to step()

pipe.generate() asserts that each call increments autoregressive_index by exactly 1 (expected = prev + 1). The warmup loop at lines 87–89 drives the cache up to STEP_TO_MEASURE - 1 = 5. The first invocation of step() (passing STEP_TO_MEASURE = 6) succeeds and sets cache.autoregressive_index = 6. Every subsequent call in measure() passes 6 again, but the pipeline now expects 7, so the assertion fails with an AssertionError. The benchmark crashes on the second warmup iteration and no timing data is collected.

A minimal fix is to maintain an incrementing counter inside step() and call pipe.finalize() after each generate so the cache stays consistent across the N_WARMUP + N_MEASURE = 13 timed repetitions.
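
A sketch of that fix against a toy pipeline follows. `ToyPipeline`, `BenchCache`, and `make_step` are hypothetical names; the toy only mirrors the sequential-index assertion described above and omits the width and height arguments.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BenchCache:
    """Hypothetical stand-in holding only the AR index."""
    autoregressive_index: int = -1


class ToyPipeline:
    """Toy pipeline that enforces strictly sequential AR indices."""

    def generate(self, index: int, cache: BenchCache) -> None:
        expected = cache.autoregressive_index + 1
        assert index == expected, f"expected {expected}, got {index}"
        cache.autoregressive_index = index

    def finalize(self, index: int, cache: BenchCache) -> None:
        pass  # The real pipeline commits pending history here.


def make_step(pipe: ToyPipeline, cache: BenchCache, start: int) -> Callable[[], None]:
    """Build a step() that advances the AR index on every call."""
    next_index = start

    def step() -> None:
        nonlocal next_index
        pipe.generate(next_index, cache)
        pipe.finalize(next_index, cache)
        next_index += 1

    return step
```

One consequence of this approach is that the timed repetitions cover consecutive AR steps rather than a single fixed step, so the reported latency is an average over a window of steps starting at STEP_TO_MEASURE.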

Comment on lines +23 to +31
def get_helios_vae_class() -> Type[Any]:
    try:
        from diffusers import AutoencoderKLWan

        return AutoencoderKLWan
    except ImportError:
        from diffusers.models import AutoencoderKLWan

        return AutoencoderKLWan

P2 Redundant fallback import that cannot actually help

If from diffusers import AutoencoderKLWan raises ImportError, it means AutoencoderKLWan is not in the top-level diffusers namespace. The fallback from diffusers.models import AutoencoderKLWan imports from the same underlying module; if the symbol is absent at the top level due to a version mismatch, the submodule import will still succeed, but the caller will silently use a different code path than expected. Conversely, if diffusers itself is not installed, both lines raise and the fallback provides no value. The redundant try/except should be collapsed.

Suggested change
- def get_helios_vae_class() -> Type[Any]:
-     try:
-         from diffusers import AutoencoderKLWan
-         return AutoencoderKLWan
-     except ImportError:
-         from diffusers.models import AutoencoderKLWan
-         return AutoencoderKLWan
+ def get_helios_vae_class() -> Type[Any]:
+     from diffusers import AutoencoderKLWan
+     return AutoencoderKLWan
