Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…

Python 8,182 707 Updated Jun 19, 2025

BytedTsinghua-SIA / DAPO

An Open-source RL System from ByteDance Seed and Tsinghua AIR

Python 1,357 57 Updated May 11, 2025

RUCAIBox / Slow_Thinking_with_LLMs

A series of technical reports on Slow Thinking with LLMs

Python 699 39 Updated Jun 9, 2025

eddycmu / demystify-long-cot

Python 297 18 Updated May 31, 2025

bruno686 / Awesome-RL-based-LLM-Reasoning

Awesome RL-based LLM Reasoning

523 27 Updated May 4, 2025

All-Hands-AI / OpenHands

🙌 OpenHands: Code Less, Make More

Python 58,658 6,754 Updated Jun 19, 2025

huggingface / Math-Verify

Python 777 34 Updated Apr 28, 2025

simplescaling / s1

s1: Simple test-time scaling

Python 6,453 749 Updated May 19, 2025

agentica-project / rllm

Democratizing Reinforcement Learning for LLMs

Jupyter Notebook 3,384 311 Updated May 13, 2025

deepseek-ai / open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,826 278 Updated May 15, 2025

Open-Reasoner-Zero / Open-Reasoner-Zero

Official Repo for Open-Reasoner-Zero

Python 1,969 104 Updated Jun 2, 2025

zhaochenyang20 / Awesome-ML-SYS-Tutorial

My learning notes/codes for ML SYS.

Python 2,520 160 Updated Jun 18, 2025

AccumulateMore / CV

✔ (Completed) The most comprehensive deep learning notes [Tudui PyTorch] [Li Mu's Dive into Deep Learning] [Andrew Ng's Deep Learning]

Jupyter Notebook 11,021 1,330 Updated Jun 17, 2025

wangshusen / DRL

Deep Reinforcement Learning

3,965 633 Updated Dec 10, 2022

waterwaterrr

Stars

WindyLab / LLM-RL-Papers

inclusionAI / AReaL

RyanLiu112 / Awesome-Process-Reward-Models

WooooDyy / AgentGym

Hannibal046 / Awesome-LLM

mlabonne / llm-datasets

OpenManus / OpenManus-RL

safety-research / circuit-tracer

NJU-RL / GLIDER

hkust-nlp / Laser

Zeyi-Lin / HivisionIDPhotos

malody2014 / llm_benchmark

KCORES / kcores-llm-arena

ByteDance-Seed / VeOmni

ByteDance-Seed / ByteCheckpoint

Eclipsess / Awesome-Efficient-Reasoning-LLMs

modelscope / ms-swift