
Add VSR video postprocessing pipeline #316

Open
gtong-nv wants to merge 7 commits into main from dev/gtong/vsr

gtong-nv (Collaborator) commented Jun 9, 2026

Add VSR Video Postprocessing Pipeline

Summary

Adds a generic video postprocessing layer that runners can apply after video generation and before writing outputs. The first concrete implementation is FlashVSR, exposed as a lazy, swappable postprocessor that can be enabled from flashdreams-run.

Changes

  • Added reusable postprocessing contracts and tensor utilities under flashdreams.infra.postprocess, including VideoSpec, VideoChunk, processor/session interfaces, postprocess chaining, layout conversion, value-range conversion, and chunk concatenation.
  • Added RunnerConfig.postprocess plus a Runner.postprocess_video_tensor() helper, then wired it into common single-stream offline runners before MP4 output.
  • Added top-level flashdreams-run --postprocess.* options for enabling FlashVSR presets without constructing VSR models during --no-instantiate config inspection.
  • Added flashvsr.postprocess.FlashVSRPostProcessorConfig, which lazily builds FlashVSR from the first stream dimensions, coalesces arbitrary runner chunks into FlashVSR-compatible chunk sizes, and supports replicate-pad or drop tail handling.
  • Added CPU-safe tests for generic postprocess behavior and mocked FlashVSR orchestration.
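The layout and value-range conversions listed above reduce to two small pieces of arithmetic. The sketch below is illustrative only: the layout strings (such as "BCTHW") and the range names are assumptions, not the identifiers used in flashdreams.infra.postprocess. A layout conversion is an axis permutation, which with PyTorch would be applied as tensor.permute(*perm); a value-range conversion is an affine map through [0, 1].

```python
from typing import Tuple

# Hypothetical value-range names; the PR's actual identifiers may differ.
_RANGES = {
    "minus_one_one": (-1.0, 1.0),
    "zero_one": (0.0, 1.0),
    "zero_255": (0.0, 255.0),
}


def layout_permutation(src: str, dst: str) -> Tuple[int, ...]:
    """Return the axis order that turns a `src`-layout tensor into `dst` layout.

    With PyTorch this would be applied as `tensor.permute(*perm)`.
    """
    if sorted(src) != sorted(dst) or len(set(src)) != len(src):
        raise ValueError(f"incompatible layouts: {src!r} -> {dst!r}")
    # Output axis i is taken from the source axis carrying the same letter.
    return tuple(src.index(axis) for axis in dst)


def convert_value_range(x: float, src: str, dst: str) -> float:
    """Affinely map a value from the `src` range to the `dst` range."""
    lo_s, hi_s = _RANGES[src]
    lo_d, hi_d = _RANGES[dst]
    unit = (x - lo_s) / (hi_s - lo_s)  # normalize to [0, 1]
    return lo_d + unit * (hi_d - lo_d)
```

For example, layout_permutation("BCTHW", "BTCHW") gives (0, 2, 1, 3, 4), and mapping 0.0 from [-1, 1] to [0, 255] gives 127.5. For uint8 output, the converted values would still need rounding and clamping before casting.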

CLI options

Top-level postprocess flags are available on every runner. When --postprocess.mode is not none, they append a FlashVSR processor to the runner's configured chain:

| Flag | Choices / type | Default | Description |
|---|---|---|---|
| --postprocess.mode | none, flashvsr-v1.1-sparse-2.0, flashvsr-v1.1-sparse-1.5, flashvsr-v1.1-full-attn | none | Post-processing preset appended to the runner's configured chain |
| --postprocess.scale | 2, 4 | 2 | Spatial upsample factor for VSR post-processing |
| --postprocess.chunk-size | 8, 16 | 16 | Steady-state VSR chunk size |
| --postprocess.device | string | cuda | Device used by the post-processing model |
| --postprocess.tail-policy | replicate_pad, drop | replicate_pad | How VSR handles the final partial chunk |
| --postprocess.compile-network / --postprocess.no-compile-network | bool | False | Enable torch.compile in the VSR post-processor |
| --postprocess.use-cuda-graph / --postprocess.no-use-cuda-graph | bool | False | Enable CUDA graph replay in the VSR post-processor |
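The flag semantics can be modelled as a small validation-and-append step. This is a hypothetical sketch: PostprocessOptions and effective_chain are illustrative names, and a plain dict stands in for the real FlashVSRPostProcessorConfig. Per the later commit in this PR, the CLI resolves presets through VideoPostprocessChainConfig preset discovery, so the actual wiring may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence

PRESETS = ("flashvsr-v1.1-sparse-2.0", "flashvsr-v1.1-sparse-1.5", "flashvsr-v1.1-full-attn")


@dataclass(frozen=True)
class PostprocessOptions:
    """Hypothetical mirror of the --postprocess.* flags and their defaults."""

    mode: str = "none"
    scale: int = 2
    chunk_size: int = 16
    device: str = "cuda"
    tail_policy: str = "replicate_pad"
    compile_network: bool = False
    use_cuda_graph: bool = False

    def validate(self) -> None:
        if self.mode != "none" and self.mode not in PRESETS:
            raise ValueError(f"unknown mode {self.mode!r}")
        if self.scale not in (2, 4):
            raise ValueError("scale must be 2 or 4")
        if self.chunk_size not in (8, 16):
            raise ValueError("chunk_size must be 8 or 16")
        if self.tail_policy not in ("replicate_pad", "drop"):
            raise ValueError("tail_policy must be replicate_pad or drop")


def effective_chain(runner_chain: Sequence[object], opts: PostprocessOptions) -> List[object]:
    """Append one preset entry to the runner's chain unless mode is 'none'."""
    opts.validate()
    chain = list(runner_chain)  # copy: never mutate the runner's own config
    if opts.mode != "none":
        # A plain dict stands in for the real FlashVSR config object here.
        chain.append({
            "preset": opts.mode,
            "scale": opts.scale,
            "chunk_size": opts.chunk_size,
            "device": opts.device,
            "tail_policy": opts.tail_policy,
            "compile_network": opts.compile_network,
            "use_cuda_graph": opts.use_cuda_graph,
        })
    return chain
```

Copying the runner's chain before appending keeps the CLI override from leaking into the runner's stored configuration.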

Example:

flashdreams-run \
  --postprocess.mode flashvsr-v1.1-sparse-2.0 \
  --postprocess.scale 2 \
  --postprocess.chunk-size 8 \
  --postprocess.tail-policy replicate_pad \
  wan21-t2v-1.3b-480p

Inspect the resolved config without loading models:

flashdreams-run --no-instantiate \
  --postprocess.mode flashvsr-v1.1-sparse-2.0 \
  --postprocess.scale 2 \
  wan21-t2v-1.3b-480p
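One way to keep --no-instantiate inspection free of model construction is to make the config a plain value and defer the expensive build to the first chunk a session sees, which is also the point where the stream dimensions become known. In this sketch LazyVSRConfig, LazyVSRSession, and the build_pipeline factory are hypothetical names, and frames are nested lists rather than tensors.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Frame = List[List[float]]  # hypothetical: one H x W frame


@dataclass
class LazyVSRConfig:
    """Cheap config: creating or printing it never builds a model."""

    scale: int = 2
    # Hypothetical factory standing in for the real FlashVSR pipeline builder.
    build_pipeline: Callable[[int, int, int], object] = lambda h, w, s: (h, w, s)

    def start(self) -> "LazyVSRSession":
        return LazyVSRSession(self)


@dataclass
class LazyVSRSession:
    config: LazyVSRConfig
    pipeline: Optional[object] = None
    input_hw: Optional[Tuple[int, int]] = None

    def _ensure_pipeline(self, frame: Frame) -> object:
        h, w = len(frame), len(frame[0])
        if self.pipeline is None:
            # The first chunk fixes the stream resolution and triggers the build.
            self.input_hw = (h, w)
            self.pipeline = self.config.build_pipeline(h, w, self.config.scale)
        elif (h, w) != self.input_hw:
            raise ValueError(f"resolution changed mid-stream: {(h, w)} != {self.input_hw}")
        return self.pipeline

    def process(self, chunk: List[Frame]) -> List[Frame]:
        if not chunk:
            return []
        self._ensure_pipeline(chunk[0])
        return chunk  # a real session would run the upscaler here
```

Constructing LazyVSRConfig or calling start() does not invoke the factory; only the first non-empty process() call does, and it does so exactly once.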

copy-pr-bot (Bot) commented Jun 9, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: Gangzheng Tong <gtong@nvidia.com>
gtong-nv marked this pull request as ready for review June 23, 2026 18:49
greptile-apps (Bot, Contributor) commented Jun 23, 2026

Greptile Summary

This PR introduces a generic video post-processing pipeline — contracts, tensor utilities, a chain executor, and a FlashVSR-backed implementation — and wires it into all single-stream offline runners via a new RunnerConfig.postprocess field and Runner.postprocess_video_tensor() helper.

  • Core infrastructure (flashdreams.infra.postprocess): VideoSpec, VideoChunk, session/processor ABCs, chain config/session, and layout/value-range conversion utilities are cleanly abstracted. The chain flush() correctly threads flushed output from each session through all downstream sessions before calling those sessions' own flush.
  • FlashVSR integration (flashvsr/postprocess.py): The _FlashVSRPostProcessorSession coalesces arbitrary runner chunks to FlashVSR's first/steady-state sizes, uses try/finally to guarantee finalize() is always called even when generate() raises, and supports replicate_pad or drop tail policies. Module-level preset singletons are non-frozen dataclasses returned directly by entry-point loading, so any mutation of a preset field would silently corrupt subsequent callers in the same process.
  • Runner wiring: The six single-stream runners uniformly apply postprocess_video_tensor().cpu() before MP4 write; the omnidreams runner additionally frees generation-pipeline VRAM before starting VSR.
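The flush ordering described in the first bullet matters whenever more than one stage buffers frames: an upstream stage's tail must pass through every downstream stage's process() before that downstream stage flushes, or the tail is lost or emitted out of order. A minimal sketch, assuming each session exposes process(chunk) and flush(), both returning a list of chunks; the Session protocol, the helper functions, and the Batcher example stage are illustrative names, not the PR's classes.

```python
from typing import List, Protocol, Sequence

Chunk = list  # frames are opaque list items in this sketch


class Session(Protocol):
    def process(self, chunk: Chunk) -> List[Chunk]: ...
    def flush(self) -> List[Chunk]: ...


def chain_process(sessions: Sequence[Session], chunk: Chunk) -> List[Chunk]:
    """Feed one chunk through every stage; each stage may emit 0..n chunks."""
    pending = [chunk]
    for session in sessions:
        nxt: List[Chunk] = []
        for c in pending:
            nxt.extend(session.process(c))
        pending = nxt
    return pending


def chain_flush(sessions: Sequence[Session]) -> List[Chunk]:
    """Flush in order, so upstream tail output is processed downstream first."""
    pending: List[Chunk] = []
    for session in sessions:
        nxt: List[Chunk] = []
        for c in pending:  # tail chunks flushed by upstream stages
            nxt.extend(session.process(c))
        nxt.extend(session.flush())  # only then drain this stage's own buffer
        pending = nxt
    return pending


class Batcher:
    """Example stage: re-chunks its input into groups of exactly `size` items."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.buffer: Chunk = []

    def process(self, chunk: Chunk) -> List[Chunk]:
        self.buffer.extend(chunk)
        out = []
        while len(self.buffer) >= self.size:
            out.append(self.buffer[: self.size])
            self.buffer = self.buffer[self.size :]
        return out

    def flush(self) -> List[Chunk]:
        out = [self.buffer] if self.buffer else []
        self.buffer = []
        return out
```

Chaining Batcher(3) into Batcher(2) and feeding [1, 2, 3, 4] yields [[1, 2]] from chain_process and [[3, 4]] from chain_flush, so every frame comes out exactly once and in order.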

Confidence Score: 5/5

The change is safe to merge; the only open items are efficiency and defensive-coding observations that do not affect correctness.

The core logic — chain execution, layout/value-range conversions, chunk coalescing, and the try/finally fix for finalize — is correct and well-tested by the new CPU test suite. The del self.pipeline concern in the omnidreams runner is not reachable from _num_views() in practice: all _num_views() calls complete before _rollout_and_save enters its post-generation cleanup branch. No new correctness-level defects were found.

integrations/flashvsr/flashvsr/postprocess.py deserves a second look for the mutable-singleton preset objects and the eager model-load behaviour when tail_policy=drop produces no output.
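A common remedy for the mutable-singleton concern is to freeze the preset dataclasses and derive per-run variants with dataclasses.replace, so a caller that overrides a field gets a new object instead of editing the shared one. The class and field names below are hypothetical stand-ins for the preset configs, not the PR's definitions; returning a copy.deepcopy from the loader would be an alternative if the presets must stay mutable.

```python
import dataclasses
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlashVSRPresetConfig:
    """Hypothetical preset shape; field names are illustrative only."""

    sparse_ratio: Optional[float]
    scale: int = 2
    chunk_size: int = 16
    tail_policy: str = "replicate_pad"


# Module-level presets, as entry-point loading would hand them out.
SPARSE_2_0 = FlashVSRPresetConfig(sparse_ratio=2.0)
FULL_ATTN = FlashVSRPresetConfig(sparse_ratio=None)


def customize(preset: FlashVSRPresetConfig, **overrides) -> FlashVSRPresetConfig:
    """Return a modified copy; the shared preset object is never touched."""
    return dataclasses.replace(preset, **overrides)
```

With frozen=True, an accidental assignment such as SPARSE_2_0.scale = 4 raises dataclasses.FrozenInstanceError instead of silently changing the preset for later callers in the same process.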

Important Files Changed

| Filename | Overview |
|---|---|
| `flashdreams/flashdreams/infra/postprocess/base.py` | New file: defines VideoSpec, VideoChunk, VideoPostProcessorConfig/Session/Processor ABCs, VideoPostprocessChainConfig/Session, and layout/value-range tensor utilities. Logic is sound; chain flush correctly threads flushed output through downstream sessions. |
| `flashdreams/flashdreams/infra/postprocess/__init__.py` | New file: re-exports all public symbols from base.py; `__all__` is complete and consistent with the imports. |
| `integrations/flashvsr/flashvsr/postprocess.py` | New FlashVSR postprocessor session with chunk coalescing, try/finally for finalize, and tail handling. Minor concerns: mutable module-level preset singletons, double call to _chunk_modes() in the error path, and eager model load when tail_policy=drop will discard all frames. |
| `flashdreams/flashdreams/infra/runner.py` | Adds VideoPostprocessChainConfig to RunnerConfig and a postprocess_video_tensor() helper on Runner; changes are minimal and correct. |
| `flashdreams/flashdreams/plugins/registry.py` | Adds discover_postprocess_presets() and resolve_postprocess_preset() using the same defensive entry-point pattern as discover_runners(). resolve_postprocess_preset() re-scans all entry points on every call (P2 efficiency concern). |
| `integrations/omnidreams/omnidreams/runner.py` | Adds a postprocessing path with VRAM reclamation (del self.pipeline + empty_cache) before VSR; splits canvas preparation into _prepare_canvas_for_write and _postprocess_generated_views. The video tensor is already on CPU (chunks collected via .cpu()), so device consistency is maintained. |
| `integrations/causal_forcing/causal_forcing/runner.py` | Inserts postprocess_video_tensor().cpu() before MP4 write; the generation pipeline is not freed before VSR (previously flagged concern). |
| `integrations/wan21/wan21/runner.py` | Replaces bare generated.cpu() with postprocess_video_tensor().cpu(); straightforward and correct. |
| `integrations/flashvsr/tests/test_postprocess.py` | New CPU tests with a fake FlashVSR pipeline covering chunk coalescing, tail drop, generate-raises/finalize-still-called, and multi-view rejection. Coverage is thorough for the new session logic. |
| `integrations/flashvsr/pyproject.toml` | Adds three flashdreams.postprocess_presets entry points mapping preset slugs to module-level config constants; correctly mirrors the flashdreams.runner_configs pattern. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Runner
    participant postprocess_video_tensor
    participant VideoPostprocessChainConfig
    participant VideoPostprocessChainSession
    participant FlashVSRPostProcessorSession
    participant FlashVSRPipeline

    Runner->>postprocess_video_tensor: tensor, layout, value_range, fps
    postprocess_video_tensor->>VideoPostprocessChainConfig: setup(VideoSpec)
    VideoPostprocessChainConfig->>VideoPostprocessChainConfig: resolved_processors() → discover_postprocess_presets()
    VideoPostprocessChainConfig->>FlashVSRPostProcessorSession: start(spec)
    VideoPostprocessChainConfig-->>postprocess_video_tensor: VideoPostprocessChainSession

    postprocess_video_tensor->>VideoPostprocessChainSession: process(VideoChunk)
    VideoPostprocessChainSession->>FlashVSRPostProcessorSession: process(chunk)
    FlashVSRPostProcessorSession->>FlashVSRPostProcessorSession: _ensure_pipeline() [lazy build]
    FlashVSRPostProcessorSession->>FlashVSRPostProcessorSession: _append_to_buffer()
    FlashVSRPostProcessorSession->>FlashVSRPostProcessorSession: _drain_ready_chunks()
    opt "buffer >= next_target_size"
        FlashVSRPostProcessorSession->>FlashVSRPipeline: generate(ar_idx, cache, clip)
        FlashVSRPipeline-->>FlashVSRPostProcessorSession: upscaled clip
        FlashVSRPostProcessorSession->>FlashVSRPipeline: finalize(ar_idx, cache) [always, via finally]
    end
    FlashVSRPostProcessorSession-->>VideoPostprocessChainSession: list[VideoChunk]
    VideoPostprocessChainSession-->>postprocess_video_tensor: list[VideoChunk]

    postprocess_video_tensor->>VideoPostprocessChainSession: flush()
    opt "tail_policy == replicate_pad"
        FlashVSRPostProcessorSession->>FlashVSRPostProcessorSession: pad tail, run, trim
        FlashVSRPostProcessorSession->>FlashVSRPipeline: generate + finalize
    end
    VideoPostprocessChainSession-->>postprocess_video_tensor: list[VideoChunk]
    postprocess_video_tensor->>postprocess_video_tensor: concatenate_video_chunks(layout, value_range)
    postprocess_video_tensor-->>Runner: Tensor (same layout, upscaled)
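The coalescing step in the diagram (buffer incoming frames, drain while the buffer reaches the next target size, then pad or drop the tail on flush) can be sketched independently of FlashVSR. Everything here is an assumption for illustration: generate and finalize are injected callables rather than the real FlashVSRPipeline methods (which also take a cache), the first and steady-state sizes are parameters, the session returns frames instead of VideoChunk objects, and the trim after replicate-padding assumes one output frame per input frame.

```python
from typing import Callable, List

Frame = object  # frames are opaque here; real code would use tensors


class CoalescingVSRSession:
    """Sketch of first/steady-state chunk coalescing with a tail policy."""

    def __init__(self, generate: Callable[[int, List[Frame]], List[Frame]],
                 finalize: Callable[[int], None], first_size: int, steady_size: int,
                 tail_policy: str = "replicate_pad") -> None:
        if tail_policy not in ("replicate_pad", "drop"):
            raise ValueError(f"unknown tail policy {tail_policy!r}")
        if first_size < 1 or steady_size < 1:
            raise ValueError("chunk sizes must be positive")
        self._generate, self._finalize = generate, finalize
        self.first_size, self.steady_size = first_size, steady_size
        self.tail_policy = tail_policy
        self.buffer: List[Frame] = []
        self.ar_idx = 0  # autoregressive step counter

    def _next_target(self) -> int:
        return self.first_size if self.ar_idx == 0 else self.steady_size

    def _run(self, clip: List[Frame]) -> List[Frame]:
        try:
            return self._generate(self.ar_idx, clip)
        finally:
            # Release per-step state even if generate() raises.
            self._finalize(self.ar_idx)
            self.ar_idx += 1

    def process(self, chunk: List[Frame]) -> List[Frame]:
        self.buffer.extend(chunk)
        out: List[Frame] = []
        while len(self.buffer) >= self._next_target():
            n = self._next_target()
            clip, self.buffer = self.buffer[:n], self.buffer[n:]
            out.extend(self._run(clip))
        return out

    def flush(self) -> List[Frame]:
        tail, self.buffer = self.buffer, []
        if not tail or self.tail_policy == "drop":
            return []
        real = len(tail)
        padded = tail + [tail[-1]] * (self._next_target() - real)  # repeat last frame
        return self._run(padded)[:real]  # trim the padded frames back off
```

With first_size=8, steady_size=8, and three incoming chunks of five frames, the session emits eight frames from the second process() call and the remaining seven from flush(): the tail is padded to eight by repeating its last frame, and the output is trimmed back to seven. With tail_policy="drop", those seven frames are discarded instead.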

Reviews (6): Last reviewed commit: "Remove unnecessary test"

Comment thread integrations/omnidreams/omnidreams/runner.py Outdated
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Gangzheng Tong <tonggangzheng@gmail.com>
gtong-nv (Collaborator, Author)

/ok to test 91f0c21

Signed-off-by: Gangzheng Tong <gtong@nvidia.com>
Comment thread integrations/lingbot/lingbot/runner.py
Comment thread flashdreams/flashdreams/scripts/cli.py Outdated
Comment thread flashdreams/tests/test_video_postprocess.py Outdated
Comment thread flashdreams/flashdreams/scripts/cli.py Outdated
Comment thread integrations/flashvsr/flashvsr/postprocess.py Outdated
Comment thread integrations/flashvsr/flashvsr/postprocess.py Outdated
Comment thread integrations/flashvsr/flashvsr/postprocess.py Outdated
gtong-nv added 2 commits June 26, 2026 16:05
Replace the bespoke CLI postprocess mode selector with
VideoPostprocessChainConfig.preset discovery, lazy FlashVSR pipeline
building, finalize-on-failure handling, and runner-aligned defaults.
gtong-nv (Collaborator, Author)

/ok to test 89aab9c
