This repository contains the official implementation of Anchoring and Rescaling Attention for Semantically Coherent Inbetweening, a training-free approach for text-conditioned generative inbetweening. Given the first frame, the last frame, and a text prompt, our method generates intermediate frames with improved semantic fidelity, frame consistency, and pace stability, without additional model training. We also introduce TGI-Bench, a benchmark for evaluating text-conditioned generative inbetweening across diverse sequence lengths and motion scenarios.
The TGI-Bench dataset is available on Hugging Face.
We recommend using a conda environment.
Python 3.10 or higher is required.
conda create -n tgi python=3.10
conda activate tgi
pip install -r requirements.txt
Once this is done, the environment setup is complete.
To run inference with the default settings:
python inference.py
You can customize inference with additional arguments:
python inference.py \
--prompt "A freight train moves forward through heavy falling snow." \
--img_first example/first.jpg \
--img_last example/last.jpg \
--seed 0 \
--num_frames 81 \
--w_edge 8 \
--s_edge 1.06 \
--s_mid 0.94 \
--beta_end 0.7 \
--beta_mid 0.3

--prompt: text prompt
--img_first: path to the first frame
--img_last: path to the last frame
--seed: random seed
--num_frames: number of frames (25, 33, 65, 81)
--w_edge: width of the fast region near both ends
--s_edge: scaling parameter near keyframes
--s_mid: scaling parameter for middle frames
--beta_end: endpoint weighting parameter
--beta_mid: middle-region weighting parameter
Any argument that is not specified falls back to its default example value.
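
To run the same keyframe pair at several sequence lengths, the command can be assembled programmatically. The following is a minimal sketch, not part of this repository: `build_inference_command` is a hypothetical helper, and it assumes that the four frame counts listed above are the only ones `inference.py` accepts and that the flags are spelled exactly as documented here.

```python
import shlex
import sys

# Frame counts listed in the argument description (assumed to be the only valid ones).
SUPPORTED_NUM_FRAMES = (25, 33, 65, 81)
# Optional tuning flags documented in this README.
TUNING_FLAGS = ("w_edge", "s_edge", "s_mid", "beta_end", "beta_mid")


def build_inference_command(prompt, img_first, img_last, num_frames=81, seed=0, **tuning):
    """Return the argv list for one inference.py run (hypothetical helper)."""
    if num_frames not in SUPPORTED_NUM_FRAMES:
        raise ValueError(f"num_frames must be one of {SUPPORTED_NUM_FRAMES}, got {num_frames}")
    unknown = set(tuning) - set(TUNING_FLAGS)
    if unknown:
        raise ValueError(f"unknown tuning flags: {sorted(unknown)}")
    cmd = [
        sys.executable, "inference.py",
        "--prompt", prompt,
        "--img_first", str(img_first),
        "--img_last", str(img_last),
        "--seed", str(seed),
        "--num_frames", str(num_frames),
    ]
    # Tuning flags are added only when given, so the script's defaults apply otherwise.
    for name, value in tuning.items():
        cmd += [f"--{name}", str(value)]
    return cmd


if __name__ == "__main__":
    for n in SUPPORTED_NUM_FRAMES:
        cmd = build_inference_command(
            "A freight train moves forward through heavy falling snow.",
            "example/first.jpg", "example/last.jpg",
            num_frames=n, s_edge=1.06, s_mid=0.94,
        )
        # Print the command; pass cmd to subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps the prompt intact as one argument even when it contains spaces or quotes.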
