QPHutu

Qi Penghui QPHutu

10 followers · 0 following

Stars

linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training

Python · 5,136 stars · 343 forks · Updated Jun 2, 2025

project-numina / aimo-progress-prize

Jupyter Notebook · 439 stars · 32 forks · Updated Jul 22, 2024

sail-sg / AnytimeReasoner

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Python · 36 stars · 2 forks · Updated May 27, 2025

pytorch-labs / attention-gym

Helpful tools and examples for working with flex-attention

Python · 810 stars · 48 forks · Updated May 30, 2025

xbresson / CS5242_2025

NUS CS5242 Neural Networks and Deep Learning, Xavier Bresson, 2025

Jupyter Notebook · 386 stars · 98 forks · Updated Apr 19, 2025

sail-sg / understand-r1-zero

Understanding R1-Zero-Like Training: A Critical Perspective

Python · 963 stars · 44 forks · Updated May 24, 2025

deepseek-ai / DualPipe

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python · 2,794 stars · 296 forks · Updated Mar 10, 2025

Open-Reasoner-Zero / Open-Reasoner-Zero

Official Repo for Open-Reasoner-Zero

Python · 1,944 stars · 101 forks · Updated Jun 2, 2025

lsdefine / simple_GRPO

A very simple GRPO implementation for reproducing R1-like LLM thinking.

Python · 1,082 stars · 86 forks · Updated Apr 3, 2025

bytedance / flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ · 958 stars · 61 forks · Updated May 28, 2025

Jiayi-Pan / TinyZero

Minimal reproduction of DeepSeek R1-Zero

Python · 11,847 stars · 1,490 forks · Updated Apr 24, 2025

agentica-project / rllm

Democratizing Reinforcement Learning for LLMs

Jupyter Notebook · 3,309 stars · 306 forks · Updated May 13, 2025

eddycmu / demystify-long-cot

Python · 293 stars · 18 forks · Updated May 31, 2025

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 8,884 stars · 1,113 forks · Updated Jun 3, 2025

hkust-nlp / simpleRL-reason

Simple RL training for reasoning

Python · 3,604 stars · 270 forks · Updated Apr 10, 2025

simplescaling / s1

s1: Simple test-time scaling

Python · 6,419 stars · 749 forks · Updated May 19, 2025

bespokelabsai / curator

Synthetic data curation for post-training and structured data extraction

Python · 1,366 stars · 106 forks · Updated Jun 2, 2025

hendrycks / math

The MATH Dataset (NeurIPS 2021)

Python · 1,123 stars · 99 forks · Updated Aug 5, 2024

openai / prm800k

800,000 step-level correctness labels on LLM solutions to MATH problems

Python · 2,001 stars · 118 forks · Updated Jun 1, 2023

huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Python · 18,631 stars · 1,890 forks · Updated Jun 2, 2025

google / flax

Flax is a neural network library for JAX that is designed for flexibility.

Jupyter Notebook · 6,590 stars · 704 forks · Updated Jun 2, 2025

luchris429 / purejaxrl

Really Fast End-to-End Jax RL Implementations

Python · 880 stars · 69 forks · Updated Sep 9, 2024

google-deepmind / mctx

Monte Carlo tree search in JAX

Python · 2,492 stars · 204 forks · Updated Apr 10, 2025

Hwhitetooth / jax_muzero

An implementation of MuZero in JAX.

Python · 56 stars · 8 forks · Updated Nov 8, 2022

sotetsuk / pgx

♟️ Vectorized RL game environments in JAX

Python · 480 stars · 34 forks · Updated Mar 6, 2025

srush / awesome-o1

A bibliography and survey of the papers surrounding o1

TeX · 1,193 stars · 50 forks · Updated Nov 16, 2024

sail-sg / oat

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

Python · 370 stars · 27 forks · Updated Jun 3, 2025

google-deepmind / boxoban-levels

This repository contains levels for boxoban, a box-pushing puzzle game inspired by Sokoban.

73 stars · 21 forks · Updated Dec 28, 2022

mpSchrader / gym-sokoban

Sokoban environment for OpenAI Gym

Python · 369 stars · 86 forks · Updated Nov 8, 2023

richards199999 / Thinking-Claude

Let your Claude able to think

TypeScript · 15,203 stars · 1,766 forks · Updated Mar 10, 2025
