Stars
Monitoring recent cross-research on LLM & RL on arXiv for control. If there are good papers, PRs are welcome.
A comprehensive collection of process reward models.
Code and implementations for the paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhiheng Xi et al.
Awesome-LLM: a curated list of Large Language Models
Curated list of datasets and tools for post-training.
A live-streamed development of RL tuning for LLM agents
Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
⚡️HivisionIDPhotos: a lightweight and efficient AI tool and algorithm for creating ID photos.
VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework
ByteCheckpoint: A Unified Checkpointing Library for LFMs
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…
An Open-source RL System from ByteDance Seed and Tsinghua AIR
A series of technical reports on Slow Thinking with LLMs
Democratizing Reinforcement Learning for LLMs
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Official Repo for Open-Reasoner-Zero
My learning notes and code for ML SYS.
✔ (Completed) The most comprehensive deep learning notes [Tudui: PyTorch] [Mu Li: Dive into Deep Learning] [Andrew Ng: Deep Learning]