Stars
Efficient Triton Kernels for LLM Training
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
Helpful tools and examples for working with flex-attention
NUS CS5242 Neural Networks and Deep Learning, Xavier Bresson, 2025
Understanding R1-Zero-Like Training: A Critical Perspective
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
Official Repo for Open-Reasoner-Zero
A very simple GRPO implementation for reproducing R1-like LLM thinking.
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
Minimal reproduction of DeepSeek R1-Zero
Democratizing Reinforcement Learning for LLMs
verl: Volcano Engine Reinforcement Learning for LLMs
Synthetic data curation for post-training and structured data extraction
800,000 step-level correctness labels on LLM solutions to MATH problems
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Flax is a neural network library for JAX that is designed for flexibility.
Really Fast End-to-End Jax RL Implementations
A bibliography and survey of the papers surrounding o1
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
This repository contains levels for boxoban, a box-pushing puzzle game inspired by Sokoban.
Make your Claude able to think