Hi, authors of FlashVSR. Thanks for contributing such excellent work to the community.
However, I still run into problems when using it on some extremely bad cases, so I would like to fine-tune it.
I am therefore trying to reproduce the training code, but I ran into some problems while reproducing stage 2. I hope you can share some details about it:
- Is the LR_proj module converted into a causal one only for inference, or for both inference and training?
- Is generate_draft_block_mask in wan_video_dit.py used only for inference, given that it can be derivable? Did you use a different mask strategy during training?