- Tsinghua University
Stars
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
Online RL with Simple Reward Enables Training VLA Models with Only One Trajectory
A Framework for LLM-based Multi-Agent Reinforced Training and Inference
The official code implementation for the paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing"
🚀🚀 Train a 26M-parameter GPT completely from scratch in just 2 hours!
An elegant PyTorch deep reinforcement learning library.
Official Repository of "Learning to Reason under Off-Policy Guidance"
Democratizing Reinforcement Learning for LLMs
Understanding R1-Zero-Like Training: A Critical Perspective
Minimal reproduction of DeepSeek R1-Zero
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agent RL)
😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Awesome RL Reasoning Recipes ("Triple R")
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang is a fast serving framework for large language models and vision language models.
My learning notes and code for ML systems.
Code for the paper "UltraIF: Advancing Instruction Following from the Wild".
Build a MiniLLM from scratch (pretrain + SFT + DPO, work in progress)
Scalable RL solution for advanced reasoning of language models
verl: Volcano Engine Reinforcement Learning for LLMs
Discrete-time Signal Processing 3rd edition (Oppenheim)
A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.