BambiSheng (li sheng) / Starred · GitHub

li sheng (BambiSheng)

😴 Sleeping

6 followers · 11 following

Tsinghua University
11:44 (UTC +08:00)

Highlights

Pro

Lists (4)

Awesome Tool

Awesome Tutorial

Infracture

Scalable RL for Language Model

11 repositories

Stars

inclusionAI / AReaL

Distributed RL System for LLM Reasoning

Python · 1,743 stars · 85 forks · Updated Jun 16, 2025

PRIME-RL / Entropy-Mechanism-of-RL

The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.

Python · 177 stars · 6 forks · Updated Jun 12, 2025

PRIME-RL / SimpleVLA-RL

Online RL with Simple Reward Enables Training VLA Models with Only One Trajectory

Python · 212 stars · 5 forks · Updated May 30, 2025

TsinghuaC3I / MARTI

A Framework for LLM-based Multi-Agent Reinforced Training and Inference

Python · 121 stars · 3 forks · Updated Jun 13, 2025

thu-nics / R2R

The official code implementation for paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing"

Python · 34 stars · 1 fork · Updated Jun 5, 2025

jingyaogong / minimind

🚀🚀 Train a small 26M-parameter GPT ("large model") completely from scratch in just 2 hours! 🌏

Python · 22,086 stars · 2,601 forks · Updated Apr 30, 2025

thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.

Python · 8,577 stars · 1,159 forks · Updated Jun 5, 2025

simplescaling / s1

s1: Simple test-time scaling

Python · 6,442 stars · 750 forks · Updated May 19, 2025

PRIME-RL / TTRL

TTRL: Test-Time Reinforcement Learning

Python · 628 stars · 45 forks · Updated Jun 6, 2025

ElliottYan / LUFFY

Official Repository of "Learning to Reason under Off-Policy Guidance"

Python · 229 stars · 23 forks · Updated Jun 3, 2025

imagination-research / aimo2

AIMO2 2nd place solution

Python · 58 stars · 9 forks · Updated May 28, 2025

agentica-project / rllm

Democratizing Reinforcement Learning for LLMs

Jupyter Notebook · 3,367 stars · 308 forks · Updated May 13, 2025

sail-sg / understand-r1-zero

Understanding R1-Zero-Like Training: A Critical Perspective

Python · 985 stars · 46 forks · Updated May 24, 2025

Jiayi-Pan / TinyZero

Minimal reproduction of DeepSeek R1-Zero

Python · 11,899 stars · 1,491 forks · Updated Apr 24, 2025

Z841973620 / llama.cpp-tegra

llama.cpp for JetPack 4.6

C++ · 3 stars · Updated Jun 14, 2025

OpenRLHF / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agent RL)

Python · 7,080 stars · 686 forks · Updated Jun 16, 2025

XiaoYee / Awesome_Efficient_LRM_Reasoning

😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

246 stars · 7 forks · Updated Jun 10, 2025

TsinghuaC3I / Awesome-RL-Reasoning-Recipes

Awesome RL Reasoning Recipes ("Triple R")

677 stars · 39 forks · Updated Jun 15, 2025

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 49,709 stars · 8,005 forks · Updated Jun 16, 2025

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python · 15,156 stars · 2,024 forks · Updated Jun 16, 2025

zhaochenyang20 / Awesome-ML-SYS-Tutorial

My learning notes and code for ML systems (ML SYS).

Python · 2,464 stars · 156 forks · Updated Jun 14, 2025

A-suozhang / cuda_learning

Cuda · 4 stars · Updated Jul 29, 2024

kkk-an / UltraIF

Code for the paper 'UltraIF: Advancing Instruction Following from the Wild'.

Python · 17 stars · 2 forks · Updated Apr 3, 2025

Tongjilibo / build_MiniLLM_from_scratch

Building a MiniLLM from 0 to 1 (pretrain + SFT + DPO, work in progress)

Python · 441 stars · 55 forks · Updated Mar 23, 2025

PRIME-RL / PRIME

Scalable RL solution for advanced reasoning of language models

Python · 1,608 stars · 94 forks · Updated Mar 18, 2025

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 9,486 stars · 1,317 forks · Updated Jun 16, 2025

berkeleydeeprlcourse / homework_fall2023

Jupyter Notebook · 219 stars · 291 forks · Updated Dec 6, 2024

haozheji / Discrete-time-Signal-Processing-Solution

Discrete-time Signal Processing 3rd edition (Oppenheim)

246 stars · 82 forks · Updated May 16, 2019

opendatalab / MinerU

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDFs into Markdown and JSON formats.

Python · 35,099 stars · 2,849 forks · Updated Jun 15, 2025

open-compass / opencompass

OpenCompass is an LLM evaluation platform that supports a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.

Python · 5,506 stars · 597 forks · Updated Jun 13, 2025
