BambiSheng (li sheng) / Starred · GitHub
😴
Sleeping
  • Tsinghua University
  • 11:44 (UTC +08:00)

Highlights

  • Pro


Distributed RL System for LLM Reasoning

Python 1,743 85 Updated Jun 16, 2025

The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.

Python 177 6 Updated Jun 12, 2025

Online RL with Simple Reward Enables Training VLA Models with Only One Trajectory

Python 212 5 Updated May 30, 2025

A Framework for LLM-based Multi-Agent Reinforced Training and Inference

Python 121 3 Updated Jun 13, 2025

The official code implementation for paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing"

Python 34 1 Updated Jun 5, 2025

🚀🚀 "Large model": train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏

Python 22,086 2,601 Updated Apr 30, 2025

An elegant PyTorch deep reinforcement learning library.

Python 8,577 1,159 Updated Jun 5, 2025

s1: Simple test-time scaling

Python 6,442 750 Updated May 19, 2025

TTRL: Test-Time Reinforcement Learning

Python 628 45 Updated Jun 6, 2025

Official Repository of "Learning to Reason under Off-Policy Guidance"

Python 229 23 Updated Jun 3, 2025

AIMO2 2nd place solution

Python 58 9 Updated May 28, 2025

Democratizing Reinforcement Learning for LLMs

Jupyter Notebook 3,367 308 Updated May 13, 2025

Understanding R1-Zero-Like Training: A Critical Perspective

Python 985 46 Updated May 24, 2025

Minimal reproduction of DeepSeek R1-Zero

Python 11,899 1,491 Updated Apr 24, 2025

llama.cpp for jetpack4.6

C++ 3 Updated Jun 14, 2025

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agent RL)

Python 7,080 686 Updated Jun 16, 2025

😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

246 7 Updated Jun 10, 2025

Awesome RL Reasoning Recipes ("Triple R")

677 39 Updated Jun 15, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 49,709 8,005 Updated Jun 16, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 15,156 2,024 Updated Jun 16, 2025

My learning notes/codes for ML SYS.

Python 2,464 156 Updated Jun 14, 2025

Cuda 4 Updated Jul 29, 2024

Code of paper 'UltraIF: Advancing Instruction Following from the Wild'.

Python 17 2 Updated Apr 3, 2025

Build a MiniLLM from 0 to 1 (pretrain + SFT + DPO, work in progress)

Python 441 55 Updated Mar 23, 2025

Scalable RL solution for advanced reasoning of language models

Python 1,608 94 Updated Mar 18, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 9,486 1,317 Updated Jun 16, 2025

Jupyter Notebook 219 291 Updated Dec 6, 2024

Discrete-time Signal Processing 3rd edition (Oppenheim)

246 82 Updated May 16, 2019

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool.

Python 35,099 2,849 Updated Jun 15, 2025

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 5,506 597 Updated Jun 13, 2025