
feat(realesrgan): add video upsampler integration #344

Draft
gtong-nv wants to merge 6 commits into main from dev/gtong/realesrgan



@gtong-nv

Collaborator

Add Real-ESRGAN Upsampler Integration

Summary

Adds a new flashdreams-realesrgan workspace integration for Real-ESRGAN image and video upsampling. The integration provides a reusable Python API, an OpenCV-based realesrgan-upsample CLI, a FlashDreams video postprocessor config, focused CPU tests, and a checkpoint-backed GPU smoke script.

This also adds a generic flashdreams.infra.postprocess video chunk/postprocessor interface so Real-ESRGAN can plug into the same postprocessing shape as the FlashVSR work.
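The PR text does not show the actual `flashdreams.infra.postprocess` types, so the following is only a minimal sketch of what a chunk/postprocessor shape could look like. Every name here (`VideoChunk`, `VideoPostProcessor`, `NearestNeighbor2x`, `run_pipeline`) is hypothetical, and frames are modelled as nested lists of integers rather than tensors so the example runs without dependencies.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol, Sequence

# Hypothetical stand-in for a frame; the real interface presumably uses tensors.
Frame = List[List[int]]


@dataclass
class VideoChunk:
    """A run of consecutive frames plus its position in the full video."""
    frames: Sequence[Frame]
    start_index: int


class VideoPostProcessor(Protocol):
    """Anything that maps one chunk to one chunk can be plugged in."""
    def process(self, chunk: VideoChunk) -> VideoChunk: ...


class NearestNeighbor2x:
    """Toy frame-local postprocessor: each frame is upscaled independently."""

    def process(self, chunk: VideoChunk) -> VideoChunk:
        out: List[Frame] = []
        for frame in chunk.frames:
            rows: Frame = []
            for row in frame:
                wide = [px for px in row for _ in range(2)]  # double the width
                rows.append(wide)
                rows.append(list(wide))  # double the height
            out.append(rows)
        return VideoChunk(frames=out, start_index=chunk.start_index)


def run_pipeline(chunks: Iterable[VideoChunk], post: VideoPostProcessor) -> List[VideoChunk]:
    """Apply a postprocessor chunk by chunk, preserving order."""
    return [post.process(chunk) for chunk in chunks]
```

Because a frame-local upsampler never looks across frames, it fits this shape with any chunk size, whereas a temporal model would also need to constrain how chunks are formed.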

Changes

  • Add integrations/realesrgan with RRDBNet/SRVGG architecture definitions compatible with public Real-ESRGAN checkpoints.
  • Add RealESRGANUpsampler for frame-local RGB tensor upsampling, OpenCV BGR/BGRA/gray image handling, optional tiling, fp16 CUDA inference, and optional torch.compile.
  • Add realesrgan-upsample CLI for image/video files with --compile, --compile-mode, and steady-state FPS profiling via --profile-warmup-frames.
  • Add RealESRGANPostProcessorConfig for FlashDreams RGB video chunks.
  • Optimize output conversion by emitting contiguous OpenCV-ready BGR/BGRA/gray arrays directly from tensors.
  • Add integrations/realesrgan/scripts/gpu_smoke.py for real checkpoint/CUDA validation.
  • Add third-party notices for Real-ESRGAN, BasicSR, and the optional OpenCV dependency.
  • Register the new integration in workspace lock/type-check paths.
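The PR does not show how the optional tiling in the list above is implemented. As a rough illustration, one common approach splits the image into fixed-size core tiles, reads each tile with some padding for context, and writes back only the upscaled core region. The helper names (`iter_tiles`, `output_box`) and the `tile`/`pad` parameters are assumptions made for this sketch, not the integration's API.

```python
from typing import Iterator, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def iter_tiles(height: int, width: int, tile: int, pad: int) -> Iterator[Tuple[Box, Box]]:
    """Yield (padded_input_box, core_box) pairs that cover a height x width image.

    The padded box is what would be fed to the model; the core box is the
    region whose upscaled pixels are kept, so padded borders are discarded.
    """
    if tile <= 0 or pad < 0:
        raise ValueError("tile must be positive and pad must be non-negative")
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            bottom = min(top + tile, height)
            right = min(left + tile, width)
            padded = (
                max(top - pad, 0),
                max(left - pad, 0),
                min(bottom + pad, height),
                min(right + pad, width),
            )
            yield padded, (top, left, bottom, right)


def output_box(core: Box, scale: int) -> Box:
    """Map a core box in input coordinates to output coordinates."""
    top, left, bottom, right = core
    return (top * scale, left * scale, bottom * scale, right * scale)


tiles = list(iter_tiles(height=5, width=7, tile=4, pad=1))
print(len(tiles))                      # 4
print(tiles[0])                        # ((0, 0, 5, 5), (0, 0, 4, 4))
print(output_box(tiles[-1][1], 2))     # (8, 8, 10, 14)
```

The core boxes partition the image exactly, so peak memory is bounded by the tile size while every output pixel is written once.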

Notes

  • The source port intentionally avoids copying the internal RealESRGAN source, which carries a proprietary header, verbatim. The implementation is Apache-2.0 and targets compatibility with the public Real-ESRGAN/BasicSR architectures.
  • Real-ESRGAN is frame-local and does not need temporal chunks or future frames. The video CLI reads, upscales, profiles, and writes frames sequentially.
  • The new standalone CLI preserves the full 2x output dimensions. By contrast, FlashVSR may crop output dimensions to 128-multiples and may drop trailing partial chunks in its standalone runner.
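To make the last note concrete, here is a small arithmetic sketch of the two behaviours. It assumes cropping means rounding each dimension down to the nearest multiple of 128 and that a trailing partial chunk is simply discarded; the function names and the chunk size of 8 are hypothetical and are not taken from either runner.

```python
from typing import Tuple


def full_scale_dims(width: int, height: int, scale: int = 2) -> Tuple[int, int]:
    """Output size when the full upscaled frame is kept."""
    return width * scale, height * scale


def cropped_to_multiple(width: int, height: int, multiple: int = 128) -> Tuple[int, int]:
    """Output size when each dimension is rounded down to a multiple (assumed rule)."""
    return (width // multiple) * multiple, (height // multiple) * multiple


def complete_chunk_frames(num_frames: int, chunk_size: int) -> int:
    """Frames that survive if a trailing partial chunk is dropped (assumed rule)."""
    return (num_frames // chunk_size) * chunk_size


w, h = full_scale_dims(960, 540)
print((w, h))                          # (1920, 1080)
print(cropped_to_multiple(w, h))       # (1920, 1024)
print(complete_chunk_frames(100, 8))   # 96
```

Under these assumptions a 960x540 input comes out as 1920x1080 from the frame-local path, while a 128-multiple crop would lose 56 rows and a chunked runner would lose the last 4 of 100 frames.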

gtong-nv and others added 3 commits June 22, 2026 23:51
Signed-off-by: Gangzheng Tong <gtong@nvidia.com>
Signed-off-by: Gangzheng Tong <gtong@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Gangzheng Tong <tonggangzheng@gmail.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 23, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

gtong-nv added 3 commits June 23, 2026 20:16
Signed-off-by: Gangzheng Tong <gtong@nvidia.com>
Signed-off-by: Gangzheng Tong <gtong@nvidia.com>
Signed-off-by: Gangzheng Tong <gtong@nvidia.com>
@gtong-nv gtong-nv force-pushed the dev/gtong/realesrgan branch from d832ae7 to b794ddb Compare June 24, 2026 20:40

1 participant