Add VSR video postprocessing pipeline by gtong-nv · Pull Request #316 · NVIDIA/flashdreams

gtong-nv · 2026-06-09T20:28:24Z

Add VSR Video Postprocessing Pipeline

Summary

Adds a generic video postprocessing layer that runners can apply after video generation and before writing outputs. The first concrete implementation is FlashVSR, exposed as a lazy, swappable postprocessor that can be enabled from flashdreams-run.

Changes

Added reusable postprocessing contracts and tensor utilities under flashdreams.infra.postprocess, including VideoSpec, VideoChunk, processor/session interfaces, postprocess chaining, layout conversion, value-range conversion, and chunk concatenation.
Added RunnerConfig.postprocess plus a Runner.postprocess_video_tensor() helper, then wired it into common single-stream offline runners before MP4 output.
Added top-level flashdreams-run --postprocess.* options for enabling FlashVSR presets without constructing VSR models during --no-instantiate config inspection.
Added flashvsr.postprocess.FlashVSRPostProcessorConfig, which lazily builds FlashVSR from the first stream dimensions, coalesces arbitrary runner chunks into FlashVSR-compatible chunk sizes, and supports replicate-pad or drop tail handling.
Added CPU-safe tests for generic postprocess behavior and mocked FlashVSR orchestration.
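
The coalescing and tail handling described in the list above can be sketched in plain Python. This is a minimal illustration, not the FlashVSR code: the `ChunkCoalescer` class, its `first_size` and `steady_size` parameters, and the use of list items as stand-ins for frames are all assumptions made for the example. The real session operates on tensors and may size its chunks differently.

```python
from typing import List, Optional, Sequence, Tuple


class ChunkCoalescer:
    """Hypothetical helper that regroups arbitrary frame batches into fixed-size chunks."""

    def __init__(self, first_size: int, steady_size: int,
                 tail_policy: str = "replicate_pad") -> None:
        if tail_policy not in ("replicate_pad", "drop"):
            raise ValueError(f"unknown tail_policy: {tail_policy!r}")
        self.first_size = first_size
        self.steady_size = steady_size
        self.tail_policy = tail_policy
        self._buffer: List[object] = []
        self._emitted_first = False

    def _target_size(self) -> int:
        # The first chunk may use a different size than steady-state chunks.
        return self.steady_size if self._emitted_first else self.first_size

    def push(self, frames: Sequence[object]) -> List[List[object]]:
        """Buffer incoming frames and return every chunk that is now complete."""
        self._buffer.extend(frames)
        ready: List[List[object]] = []
        while len(self._buffer) >= self._target_size():
            size = self._target_size()
            ready.append(self._buffer[:size])
            del self._buffer[:size]
            self._emitted_first = True
        return ready

    def flush(self) -> Optional[Tuple[List[object], int]]:
        """Return (padded_tail, valid_count), or None if the tail is empty or dropped."""
        tail, self._buffer = self._buffer, []
        if not tail or self.tail_policy == "drop":
            return None
        valid = len(tail)
        # Replicate the last frame until the tail reaches the target size.
        padded = tail + [tail[-1]] * (self._target_size() - valid)
        return padded, valid
```

With `replicate_pad`, the caller would run the model on the padded tail and keep only the first `valid_count` output frames; with `drop`, the partial tail is discarded.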

CLI options

Top-level postprocess flags are available on every runner and append a FlashVSR processor to the runner's configured chain when mode is not none:

Flag	Choices / type	Default	Description
`--postprocess.mode`	`none`, `flashvsr-v1.1-sparse-2.0`, `flashvsr-v1.1-sparse-1.5`, `flashvsr-v1.1-full-attn`	`none`	Post-processing preset appended to the runner's configured chain
`--postprocess.scale`	`2`, `4`	`2`	Spatial upsample factor for VSR post-processing
`--postprocess.chunk-size`	`8`, `16`	`16`	Steady-state VSR chunk size
`--postprocess.device`	string	`cuda`	Device used by the post-processing model
`--postprocess.tail-policy`	`replicate_pad`, `drop`	`replicate_pad`	How VSR handles the final partial chunk
`--postprocess.compile-network` / `--postprocess.no-compile-network`	bool	`False`	Enable `torch.compile` in the VSR post-processor
`--postprocess.use-cuda-graph` / `--postprocess.no-use-cuda-graph`	bool	`False`	Enable CUDA graph replay in the VSR post-processor

Example:

flashdreams-run \
  --postprocess.mode flashvsr-v1.1-sparse-2.0 \
  --postprocess.scale 2 \
  --postprocess.chunk-size 8 \
  --postprocess.tail-policy replicate_pad \
  wan21-t2v-1.3b-480p

Inspect the resolved config without loading models:

flashdreams-run --no-instantiate \
  --postprocess.mode flashvsr-v1.1-sparse-2.0 \
  --postprocess.scale 2 \
  wan21-t2v-1.3b-480p
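
For readers who want to experiment with the flag semantics outside the repository, the table above can be mirrored with `argparse`. This is only a sketch under stated assumptions: the actual `flashdreams-run` CLI may be built with a different parser, and the `build_parser` function, the `dest` names, and the `flashdreams-run-sketch` program name are hypothetical. Only the flag names, choices, and defaults are taken from the table.

```python
import argparse

MODES = [
    "none",
    "flashvsr-v1.1-sparse-2.0",
    "flashvsr-v1.1-sparse-1.5",
    "flashvsr-v1.1-full-attn",
]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the documented --postprocess.* flags."""
    parser = argparse.ArgumentParser(prog="flashdreams-run-sketch", allow_abbrev=False)
    parser.add_argument("--postprocess.mode", dest="mode", choices=MODES, default="none")
    parser.add_argument("--postprocess.scale", dest="scale", type=int,
                        choices=[2, 4], default=2)
    parser.add_argument("--postprocess.chunk-size", dest="chunk_size", type=int,
                        choices=[8, 16], default=16)
    parser.add_argument("--postprocess.device", dest="device", default="cuda")
    parser.add_argument("--postprocess.tail-policy", dest="tail_policy",
                        choices=["replicate_pad", "drop"], default="replicate_pad")
    # Each boolean option is a pair of flags writing to the same destination.
    for flag, dest in (("compile-network", "compile_network"),
                       ("use-cuda-graph", "use_cuda_graph")):
        parser.add_argument(f"--postprocess.{flag}", dest=dest,
                            action="store_true", default=False)
        parser.add_argument(f"--postprocess.no-{flag}", dest=dest,
                            action="store_false", default=False)
    parser.add_argument("runner", help="runner slug, e.g. wan21-t2v-1.3b-480p")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```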

copy-pr-bot · 2026-06-09T20:28:27Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-06-23T18:58:15Z

Greptile Summary

This PR introduces a generic video post-processing pipeline — contracts, tensor utilities, a chain executor, and a FlashVSR-backed implementation — and wires it into all single-stream offline runners via a new RunnerConfig.postprocess field and Runner.postprocess_video_tensor() helper.

Core infrastructure (flashdreams.infra.postprocess): VideoSpec, VideoChunk, session/processor ABCs, chain config/session, and layout/value-range conversion utilities are cleanly abstracted. The chain flush() correctly threads flushed output from each session through all downstream sessions before calling those sessions' own flush.
FlashVSR integration (flashvsr/postprocess.py): The _FlashVSRPostProcessorSession coalesces arbitrary runner chunks to FlashVSR's first/steady-state sizes, uses try/finally to guarantee finalize() is always called even when generate() raises, and supports replicate_pad or drop tail policies. Module-level preset singletons are non-frozen dataclasses returned directly by entry-point loading, so any mutation of a preset field would silently corrupt subsequent callers in the same process.
Runner wiring: The six single-stream runners uniformly apply postprocess_video_tensor().cpu() before MP4 write; the omnidreams runner additionally frees generation-pipeline VRAM before starting VSR.
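
The flush ordering described in the first item can be illustrated with a toy chain. This is a sketch of the general technique only: `PairUp` and `ChainSession` are hypothetical stand-ins that operate on strings, not the classes added in `flashdreams.infra.postprocess`.

```python
from typing import List, Optional


class PairUp:
    """Toy session that holds one chunk and emits chunks joined in pairs."""

    def __init__(self) -> None:
        self._held: Optional[str] = None

    def process(self, chunk: str) -> List[str]:
        if self._held is None:
            self._held = chunk
            return []
        out = [self._held + chunk]
        self._held = None
        return out

    def flush(self) -> List[str]:
        out = [] if self._held is None else [self._held]
        self._held = None
        return out


class ChainSession:
    """Hypothetical chain that runs sessions in order."""

    def __init__(self, sessions: List[PairUp]) -> None:
        self.sessions = list(sessions)

    def process(self, chunk: str) -> List[str]:
        chunks = [chunk]
        for session in self.sessions:
            produced: List[str] = []
            for c in chunks:
                produced.extend(session.process(c))
            chunks = produced
        return chunks

    def flush(self) -> List[str]:
        carried: List[str] = []
        for session in self.sessions:
            # Output flushed by upstream sessions is processed by this
            # session first, and only then is this session flushed itself.
            passed: List[str] = []
            for c in carried:
                passed.extend(session.process(c))
            carried = passed + session.flush()
        return carried
```

If the chain flushed every session independently instead, the chunk still buffered in the first session would bypass the second session entirely.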

Confidence Score: 5/5

The change is safe to merge; the only open items are efficiency and defensive-coding observations that do not affect correctness.

The core logic — chain execution, layout/value-range conversions, chunk coalescing, and the try/finally fix for finalize — is correct and well-tested by the new CPU test suite. The del self.pipeline concern in the omnidreams runner is not reachable from _num_views() in practice: all _num_views() calls complete before _rollout_and_save enters its post-generation cleanup branch. No new correctness-level defects were found.
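
The finalize guarantee mentioned above is the standard `try`/`finally` pattern. The following sketch assumes a pipeline with `generate(ar_idx, cache, clip)` and `finalize(ar_idx, cache)` methods, as named in the sequence diagram; `FakePipeline` and `run_chunk` are hypothetical and exist only to make the example runnable.

```python
from typing import Dict, List, Optional


class FakePipeline:
    """Hypothetical stand-in that records finalize calls and can fail on demand."""

    def __init__(self, fail_on: Optional[int] = None) -> None:
        self.fail_on = fail_on
        self.finalized: List[int] = []

    def generate(self, ar_idx: int, cache: Dict, clip: List[str]) -> List[str]:
        if ar_idx == self.fail_on:
            raise RuntimeError("generate failed")
        return [f"up({frame})" for frame in clip]

    def finalize(self, ar_idx: int, cache: Dict) -> None:
        self.finalized.append(ar_idx)


def run_chunk(pipeline: FakePipeline, ar_idx: int, cache: Dict, clip: List[str]) -> List[str]:
    try:
        return pipeline.generate(ar_idx, cache, clip)
    finally:
        # Runs whether generate() returned or raised, so per-step state
        # is always released before the exception propagates.
        pipeline.finalize(ar_idx, cache)
```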

integrations/flashvsr/flashvsr/postprocess.py deserves a second look for the mutable-singleton preset objects and the eager model-load behaviour when tail_policy=drop produces no output.
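
One common way to avoid the mutable-singleton hazard is to freeze the preset dataclass and derive customized copies with `dataclasses.replace`. The sketch below is not the PR's code: `PresetConfig`, its fields, and `customized` are hypothetical names chosen to echo the CLI options.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class PresetConfig:
    """Hypothetical preset; frozen so shared instances cannot be mutated."""

    mode: str
    scale: int = 2
    chunk_size: int = 16


# Module-level preset shared by every caller in the process.
SPARSE_2_0 = PresetConfig(mode="flashvsr-v1.1-sparse-2.0")


def customized(preset: PresetConfig, **overrides: object) -> PresetConfig:
    """Return a modified copy and leave the shared preset untouched."""
    return replace(preset, **overrides)


if __name__ == "__main__":
    local = customized(SPARSE_2_0, scale=4)
    print(local.scale, SPARSE_2_0.scale)
    try:
        SPARSE_2_0.scale = 4  # type: ignore[misc]
    except FrozenInstanceError:
        print("shared preset is read-only")
```

Returning a copy from the entry-point resolver (for example via `copy.deepcopy`) would be an alternative when the config class has to stay mutable.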

Important Files Changed

Filename	Overview
flashdreams/flashdreams/infra/postprocess/base.py	New file: defines VideoSpec, VideoChunk, VideoPostProcessorConfig/Session/Processor ABCs, VideoPostprocessChainConfig/Session, and layout/value-range tensor utilities. Logic is sound; chain flush correctly threads flushed output through downstream sessions.
flashdreams/flashdreams/infra/postprocess/__init__.py	New file: re-exports all public symbols from base.py; __all__ is complete and consistent with imports.
integrations/flashvsr/flashvsr/postprocess.py	New FlashVSR postprocessor session with chunk coalescing, try/finally for finalize, and tail handling. Minor concerns: mutable module-level preset singletons, double call to _chunk_modes() in the error path, and eager model load when tail_policy=drop will discard all frames.
flashdreams/flashdreams/infra/runner.py	Adds VideoPostprocessChainConfig to RunnerConfig and a postprocess_video_tensor() helper on Runner; changes are minimal and correct.
flashdreams/flashdreams/plugins/registry.py	Adds discover_postprocess_presets() and resolve_postprocess_preset() using the same defensive entry-point pattern as discover_runners(). resolve_postprocess_preset() re-scans all entry points on every call (P2 efficiency concern).
integrations/omnidreams/omnidreams/runner.py	Adds postprocessing path with VRAM reclamation (del self.pipeline + empty_cache) before VSR; splits canvas preparation into _prepare_canvas_for_write and _postprocess_generated_views. video is already on CPU (chunks collected via .cpu()), so device consistency is maintained.
integrations/causal_forcing/causal_forcing/runner.py	Inserts postprocess_video_tensor().cpu() before MP4 write; generation pipeline is not freed before VSR (previously flagged concern).
integrations/wan21/wan21/runner.py	Replaces bare generated.cpu() with postprocess_video_tensor().cpu(); straightforward and correct.
integrations/flashvsr/tests/test_postprocess.py	New CPU tests with a fake FlashVSR pipeline covering chunk coalescing, tail drop, generate-raises/finalize-still-called, and multi-view rejection. Coverage is thorough for the new session logic.
integrations/flashvsr/pyproject.toml	Adds three flashdreams.postprocess_presets entry points mapping preset slugs to module-level config constants; correctly mirrors the flashdreams.runner_configs pattern.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Runner
    participant postprocess_video_tensor
    participant VideoPostprocessChainConfig
    participant VideoPostprocessChainSession
    participant FlashVSRPostProcessorSession
    participant FlashVSRPipeline

    Runner->>postprocess_video_tensor: tensor, layout, value_range, fps
    postprocess_video_tensor->>VideoPostprocessChainConfig: setup(VideoSpec)
    VideoPostprocessChainConfig->>VideoPostprocessChainConfig: resolved_processors() → discover_postprocess_presets()
    VideoPostprocessChainConfig->>FlashVSRPostProcessorSession: start(spec)
    VideoPostprocessChainConfig-->>postprocess_video_tensor: VideoPostprocessChainSession

    postprocess_video_tensor->>VideoPostprocessChainSession: process(VideoChunk)
    VideoPostprocessChainSession->>FlashVSRPostProcessorSession: process(chunk)
    FlashVSRPostProcessorSession->>FlashVSRPostProcessorSession: _ensure_pipeline() [lazy build]
    FlashVSRPostProcessorSession->>FlashVSRPostProcessorSession: _append_to_buffer()
    FlashVSRPostProcessorSession->>FlashVSRPostProcessorSession: _drain_ready_chunks()
    opt "buffer >= next_target_size"
        FlashVSRPostProcessorSession->>FlashVSRPipeline: generate(ar_idx, cache, clip)
        FlashVSRPipeline-->>FlashVSRPostProcessorSession: upscaled clip
        FlashVSRPostProcessorSession->>FlashVSRPipeline: finalize(ar_idx, cache) [always, via finally]
    end
    FlashVSRPostProcessorSession-->>VideoPostprocessChainSession: list[VideoChunk]
    VideoPostprocessChainSession-->>postprocess_video_tensor: list[VideoChunk]

    postprocess_video_tensor->>VideoPostprocessChainSession: flush()
    opt "tail_policy == replicate_pad"
        FlashVSRPostProcessorSession->>FlashVSRPostProcessorSession: pad tail, run, trim
        FlashVSRPostProcessorSession->>FlashVSRPipeline: generate + finalize
    end
    VideoPostprocessChainSession-->>postprocess_video_tensor: list[VideoChunk]
    postprocess_video_tensor->>postprocess_video_tensor: concatenate_video_chunks(layout, value_range)
    postprocess_video_tensor-->>Runner: Tensor (same layout, upscaled)

Reviews (6): Last reviewed commit: "Remove unnecessary test"

gtong-nv · 2026-06-23T20:01:07Z

/ok to test 91f0c21

gtong-nv · 2026-06-27T00:30:11Z

/ok to test 89aab9c

Add VSR video postprocessing pipeline

d91a390

Signed-off-by: Gangzheng Tong <gtong@nvidia.com>

gtong-nv force-pushed the dev/gtong/vsr branch from 437a8d9 to d91a390 on June 23, 2026 16:35

Add OmniDreams postprocessing output path

2f4db46

Signed-off-by: Gangzheng Tong <gtong@nvidia.com>

gtong-nv marked this pull request as ready for review June 23, 2026 18:49

greptile-apps Bot reviewed Jun 23, 2026

Comment thread integrations/omnidreams/omnidreams/runner.py Outdated

Update integrations/omnidreams/omnidreams/runner.py

91f0c21

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Gangzheng Tong <tonggangzheng@gmail.com>

Fix OmniDreams postprocess pipeline cleanup type

1879cc1

Signed-off-by: Gangzheng Tong <gtong@nvidia.com>

wilsonCernWq reviewed Jun 25, 2026

Merge branch 'main' into dev/gtong/vsr

2ebd133

greptile-apps Bot reviewed Jun 26, 2026

Comment thread integrations/flashvsr/flashvsr/postprocess.py Outdated

gtong-nv added 2 commits June 26, 2026 16:05

refactor(postprocess): register presets via entry points

e757172

Replace the bespoke CLI postprocess mode selector with VideoPostprocessChainConfig.preset discovery, lazy FlashVSR pipeline building, finalize-on-failure handling, and runner-aligned defaults.

Remove unnecessary test

89aab9c

gtong-nv wants to merge 7 commits into main from dev/gtong/vsr

2 participants
