Official implementation of RiGS: Rigid-aware 4D Gaussian Splatting from a Single Monocular Video.
RiGS reconstructs dynamic scenes from a monocular video by combining static, dynamic, transient, and rigid-aware 4D Gaussian components. The codebase includes preprocessing utilities for video/frame conversion and TAPIR/TAP-Net tracks, training and evaluation entry points, rendering utilities, and an interactive Gaussian viewer.
Clone the repository with its ViPE submodule:
git clone https://github.com/ladvu/RiGS.git --recursive
cd RiGS
If the repository was cloned without --recursive, initialize the submodule manually:
git submodule update --init --recursive
Create and activate a conda environment, then install the Python dependencies:
conda create -n rigs python=3.10 -y
conda activate rigs
pip install -r requirements.txt
Install our modified version of ViPE, which is used for monocular preprocessing. We have integrated dynamic mask prediction into the ViPE system.
pip install -e dependencies/vipe --no-build-isolation
Download the TAPIR/BootsTAPIR checkpoint expected by scripts/infer_tapnet.py:
cd scripts/checkpoints
bash download_ckpts.sh
cd ../..
The default requirements target CUDA 12.x. Make sure your PyTorch, CUDA toolkit, and GPU driver versions are compatible with the pinned packages in requirements.txt.
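A quick way to see which CUDA version the installed PyTorch was built for is to inspect `torch.version.cuda` (which is `None` on CPU-only builds) together with `torch.cuda.is_available()`. The helper `cuda_major_matches` below is a hypothetical convenience, not part of the repository; it only compares the major version against 12 and does not check driver or toolkit compatibility.

```python
"""Sanity-check that the installed PyTorch build targets CUDA 12.x (a sketch)."""
from typing import Optional


def cuda_major_matches(cuda_version: Optional[str], required_major: int = 12) -> bool:
    """Return True if a version string such as "12.1" has the required major version.

    `cuda_version` is expected to look like `torch.version.cuda`, which is None
    for CPU-only PyTorch builds.
    """
    if not cuda_version:
        return False
    major = cuda_version.split(".")[0]
    return major.isdigit() and int(major) == required_major


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; run pip install -r requirements.txt first.")
    else:
        print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
        print("CUDA 12.x build:", cuda_major_matches(torch.version.cuda))
        print("GPU visible:", torch.cuda.is_available())
```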
RiGS expects each scene under a data root such as data/custom:
data/custom/
  videos/<scene>.mp4
  images/<scene>/00000.png
  depth/<scene>.zip
  flow/<scene>.zip
  flow_consistency/<scene>.zip
  intrinsics/<scene>.npz
  pose/<scene>.npz
  static_mask/<scene>.zip
  tapnet/<scene>/00000_00000.npy
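Before training, it can help to confirm that a scene has every entry from the layout above. The sketch below is a hypothetical helper (`missing_scene_inputs` is not part of the repository) that checks the per-scene archives and requires images/<scene>/ and tapnet/<scene>/ to contain at least one .png and .npy file respectively. It assumes that every listed entry is needed; videos/ is skipped because images can be generated from it.

```python
"""Check that a scene has the preprocessed inputs listed above (a sketch)."""
from pathlib import Path
from typing import Dict, List

# Per-scene files from the layout above; "{s}" stands for the scene name.
EXPECTED_FILES = [
    "depth/{s}.zip",
    "flow/{s}.zip",
    "flow_consistency/{s}.zip",
    "intrinsics/{s}.npz",
    "pose/{s}.npz",
    "static_mask/{s}.zip",
]
# Per-scene directories and the file pattern each should contain.
EXPECTED_DIRS: Dict[str, str] = {"images/{s}": "*.png", "tapnet/{s}": "*.npy"}


def missing_scene_inputs(data_root: str, scene: str) -> List[str]:
    """Return the relative paths that are missing for `scene` under `data_root`."""
    root = Path(data_root)
    missing = [
        p.format(s=scene)
        for p in EXPECTED_FILES
        if not (root / p.format(s=scene)).is_file()
    ]
    for pattern, file_glob in EXPECTED_DIRS.items():
        rel = pattern.format(s=scene)
        directory = root / rel
        # A directory counts as present only if it holds at least one matching file.
        if not directory.is_dir() or not any(directory.glob(file_glob)):
            missing.append(rel)
    return missing
```

For example, `missing_scene_inputs("data/custom", "dog-example")` returns an empty list when all inputs are in place.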
For custom monocular videos, place videos in data/custom/videos. The helper below converts between videos and image sequences when one representation is missing:
python scripts/video_image_pair.py --data_root data/custom --fps 12
The script scripts/custom/prepare_videos_custom.sh shows the intended preprocessing pipeline:
- Convert videos/images with scripts/video_image_pair.py.
- Run ViPE to generate depth, camera poses, intrinsics, optical flow, flow consistency, and static masks.
- Run TAPIR/BootsTAPIR to generate 2D tracks under tapnet/<scene>/.
Before using that script, edit data_root, output_dir, and the check_input, run_vipe, and run_tapnet toggles at the top of the file.
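The --fps 12 flag in the conversion command indicates that frames are extracted at a fixed target rate. How scripts/video_image_pair.py actually selects frames is not documented here; the sketch below assumes a simple rule (for each output timestamp, keep the latest source frame at or before it) and names outputs after the zero-padded images/<scene>/00000.png pattern. `resample_frames` is a hypothetical name.

```python
"""Pick source frames when resampling a video to a target frame rate (a sketch)."""
import math
from typing import List, Tuple


def resample_frames(num_frames: int, src_fps: float, dst_fps: float) -> List[Tuple[int, str]]:
    """Return (source_frame_index, output_filename) pairs for the target rate.

    Assumes the k-th output frame is the last source frame at or before time
    k / dst_fps; source frame i is taken to be at time i / src_fps.
    """
    if src_fps <= 0 or dst_fps <= 0:
        raise ValueError("frame rates must be positive")
    duration = num_frames / src_fps
    # Small epsilon guards against float error such as 23.999999 instead of 24.
    num_out = math.floor(duration * dst_fps + 1e-9)
    pairs = []
    for k in range(num_out):
        src_idx = min(int(k * src_fps / dst_fps + 1e-9), num_frames - 1)
        pairs.append((src_idx, f"{k:05d}.png"))
    return pairs
```

With this rule, a 2-second clip recorded at 30 fps (60 frames) yields 24 output frames at 12 fps, taken from source frames 0, 2, 5, 7, and so on.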
For the NVIDIA Dynamic Scenes dataset, refer to the dataset page: https://gorokee.github.io/jsyoon/dynamic_synth/
For a standalone dynamic mask extraction script, refer to the EasyMoSeg repository: https://github.com/ladvu/EasyMoSeg.git
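After preprocessing, it can help to check what each archive actually contains before training. The sketch below uses only standard `numpy.load` and `zipfile` calls. The array names inside intrinsics/<scene>.npz and pose/<scene>.npz and the member names inside the .zip files are not documented here, so the helpers only list them and do not assume any keys. `describe_npz` and `list_zip` are hypothetical helper names.

```python
"""List what a preprocessed scene archive contains (a sketch)."""
import zipfile
from typing import Dict, List, Tuple

import numpy as np


def describe_npz(path: str) -> Dict[str, Tuple[int, ...]]:
    """Map each array name stored in an .npz file to its shape."""
    with np.load(path) as data:
        return {name: data[name].shape for name in data.files}


def list_zip(path: str, limit: int = 5) -> List[str]:
    """Return the first `limit` member names of a .zip archive, in sorted order."""
    with zipfile.ZipFile(path) as zf:
        return sorted(zf.namelist())[:limit]
```

For example, `describe_npz("data/custom/pose/dog-example.npz")` prints the stored array names and shapes, and `list_zip("data/custom/depth/dog-example.zip")` shows how per-frame entries are named.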
The main entry point is src/main.py. Configuration is defined in src/config/config.py and exposed as command-line flags through Tyro.
Train on the default example scene:
cd src
python main.py \
--data_dir ../data/custom \
--val_dir ../data/custom \
--data_name dog-example \
--exp_name dog-example \
--result_dir ../results \
--cache_dir ../cache \
--save_steps 15000 30000
Or run the custom training script after editing its scene paths:
bash scripts/custom/reconstruct_custom.sh
Training writes outputs to:
results/<exp_name>/
  ckpts/     # model checkpoints
  renders/   # rendered videos/images
  stats/     # train/eval JSON metrics
  plys/      # initial point clouds
  tb/        # TensorBoard logs
  visuals/   # preprocessing visualizations
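To train several scenes in a row without editing a shell script, the training command from the example can also be generated in Python. The sketch below is hypothetical (not part of the repository): it reuses only the flags from the dog-example command, sets `--exp_name` to the scene name as that example does, and assumes it is run from src/ with each scene laid out under the data root as described earlier.

```python
"""Build the documented training command for several scenes (a sketch)."""
import subprocess
from typing import List, Sequence


def train_command(scene: str, data_dir: str = "../data/custom",
                  result_dir: str = "../results", cache_dir: str = "../cache",
                  save_steps: Sequence[int] = (15000, 30000)) -> List[str]:
    """Return the argv for `python main.py`, using only flags shown in this README."""
    return [
        "python", "main.py",
        "--data_dir", data_dir,
        "--val_dir", data_dir,
        "--data_name", scene,
        "--exp_name", scene,  # outputs land in results/<scene>/
        "--result_dir", result_dir,
        "--cache_dir", cache_dir,
        "--save_steps", *[str(s) for s in save_steps],
    ]


def train_all(scenes: Sequence[str], dry_run: bool = True) -> None:
    """Print (dry run) or run one training job per scene, sequentially."""
    for scene in scenes:
        cmd = train_command(scene)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop on the first failing scene
```

Calling `train_all(["dog-example"])` only prints the command; pass `dry_run=False` to launch the jobs.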
Evaluate or render from a checkpoint:
cd src
# Evaluation
python main.py --ckpt /path/to/ckpt.pt --eval_step 30000
# Render videos
python main.py --ckpt /path/to/ckpt.pt --render_video
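To fill in `--ckpt` automatically, one option is to pick the checkpoint with the highest step under results/<exp_name>/ckpts/. The sketch below assumes checkpoints are named `ckpt_<step>.pt`, a pattern inferred from the viewer example (`ckpt_29999.pt`) rather than documented. `latest_checkpoint` is a hypothetical helper. It compares step numbers numerically, so `ckpt_29999.pt` is preferred over `ckpt_9999.pt` even though a plain string sort would rank it lower.

```python
"""Find the newest checkpoint under results/<exp_name>/ckpts/ (a sketch)."""
import re
from pathlib import Path
from typing import Optional

# Assumed naming scheme, e.g. ckpt_29999.pt.
CKPT_PATTERN = re.compile(r"ckpt_(\d+)\.pt")


def latest_checkpoint(ckpt_dir: str) -> Optional[Path]:
    """Return the checkpoint with the highest step number, or None if none exist."""
    best_step, best_path = -1, None
    for path in Path(ckpt_dir).glob("ckpt_*.pt"):
        match = CKPT_PATTERN.fullmatch(path.name)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path
```

The returned path can then be passed as `--ckpt` to the evaluation, rendering, or viewer commands.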
Launch the interactive viewer:
cd src
python run_viewer.py \
--ckpt ../results/dog-example/ckpts/ckpt_29999.pt \
--port 8080 \
--output_dir ../viewer_outputs
Then open the viewer URL printed in the terminal.
Useful flags include:
- --data_factor: image downsample factor.
- --max_steps: total training steps.
- --steps_scaler: scales major schedule values together.
- --batch_size: training batch size.
- --num_fg: number of foreground/dynamic Gaussians.
- --num_motion_bases: number of motion bases.
- --pose_opt: enable camera pose optimization during training.
- --test_time_pose_opt: enable test-time pose alignment for evaluation/rendering.
- --render_trajectories: render trajectories such as arc, lemniscate, spiral, wander, and fixed.
- --render_types: render components such as fused, static, dynamic, transient, and rigid.
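Only a one-line description of --steps_scaler is given here. As an illustration of what "scales major schedule values together" usually means, the sketch below multiplies several step-based settings by a single factor, so a shorter or longer run keeps the same relative save and evaluation points. The `Schedule` dataclass, its fields, and its defaults are hypothetical and are not taken from src/config/config.py; which values the real flag scales is an assumption.

```python
"""Illustrate a single factor that scales step-based settings (a sketch)."""
from dataclasses import dataclass, field, replace
from typing import List


@dataclass
class Schedule:
    # Hypothetical subset of step-based settings; the real Config may differ.
    max_steps: int = 30000
    save_steps: List[int] = field(default_factory=lambda: [15000, 30000])
    eval_steps: List[int] = field(default_factory=lambda: [30000])


def scale_schedule(sched: Schedule, steps_scaler: float) -> Schedule:
    """Return a copy with every step count multiplied by `steps_scaler` (truncated to int)."""
    if steps_scaler <= 0:
        raise ValueError("steps_scaler must be positive")
    return replace(
        sched,
        max_steps=int(sched.max_steps * steps_scaler),
        save_steps=[int(s * steps_scaler) for s in sched.save_steps],
        eval_steps=[int(s * steps_scaler) for s in sched.eval_steps],
    )
```

With a factor of 0.5, this hypothetical schedule becomes 15000 total steps with saves at 7500 and 15000, and the original `Schedule` instance is left unchanged.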
If you find this work useful, please cite:
@misc{wu2026rigsrigidaware4dgaussian,
title={RiGS: Rigid-aware 4D Gaussian Splatting from a Single Monocular Video},
author={Chenyu Wu and Wanhua Li and Zhu-Tian Chen and Hanspeter Pfister},
year={2026},
eprint={2605.23672},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2605.23672},
}