dapo

Process-supervised RL for a multi-step reasoning agent — DAPO + a learned Process Reward Model (PRM) training a Qwen3-8B Planner. A modern, from-scratch rebuild of the AgentFlow paper (ICLR 2026).

machine-learning reinforcement-learning ai-agents trl llm rlhf reward-model qlora ollama qwen llm-fine-tuning unsloth agentic-ai process-reward-model grpo dapo

Updated Jun 2, 2026
Python

VocabVictor / verl-plus

Adds Ascend adaptation for verl, plus some small improvements.

ppo dpo grpo dapo

Updated Nov 29, 2025
Python

StaryMoon / DAPO-Unofficial

Unofficial PyTorch reproduction for DAPO: An Open-Source LLM Reinforcement Learning System at Scale.

reinforcement-learning pytorch reproduction policy-optimization llm-reasoning dapo unofficial-implementation

Updated Jun 10, 2026
Python

mapi-developer / dapo

Simple, zero-dependency tabular data manipulation and analysis for Python.

python data dapo

Updated Dec 1, 2025
Python

Palaeoclimatologygurnard179 / SAGE-GRPO

Enable manifold-aware reinforcement learning for video generation with GRPO to improve exploration and training efficiency.

Updated Jun 10, 2026

Here are 11 public repositories matching this topic...

opendilab / LightRFT

WangJingyao07 / Awesome-GRPO

saikiranrallabandi / inframind

mbzuai-oryx / MediX-R1

AchoWu / GCPO

teilomillet / materl

awesome-pro / agentflow-pro

VocabVictor / verl-plus

StaryMoon / DAPO-Unofficial

mapi-developer / dapo

Palaeoclimatologygurnard179 / SAGE-GRPO
