LightRFT: Light, Efficient, Omni-modal & Reward-model Driven Reinforcement Fine-Tuning Framework
Updated Apr 29, 2026 - Python
Codebase of GRPO: Implementations and Resources of GRPO and Its Variants
Open Ended Medical Reinforcement Learning
Process-supervised RL for a multi-step reasoning agent — DAPO + a learned Process Reward Model (PRM) training a Qwen3-8B Planner. A modern, from-scratch rebuild of the AgentFlow paper (ICLR 2026).
Unofficial PyTorch reproduction for DAPO: An Open-Source LLM Reinforcement Learning System at Scale.
Enable manifold-aware reinforcement learning for video generation with GRPO to improve exploration and training efficiency