QPHutu (Qi Penghui) / Starred · GitHub

Efficient Triton Kernels for LLM Training

Python 5,136 343 Updated Jun 2, 2025
Jupyter Notebook 439 32 Updated Jul 22, 2024

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Python 36 2 Updated May 27, 2025

Helpful tools and examples for working with flex-attention

Python 810 48 Updated May 30, 2025

NUS CS5242 Neural Networks and Deep Learning, Xavier Bresson, 2025

Jupyter Notebook 386 98 Updated Apr 19, 2025

Understanding R1-Zero-Like Training: A Critical Perspective

Python 963 44 Updated May 24, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,794 296 Updated Mar 10, 2025

Official Repo for Open-Reasoner-Zero

Python 1,944 101 Updated Jun 2, 2025

A very simple GRPO implementation for reproducing R1-like LLM thinking.

Python 1,082 86 Updated Apr 3, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 958 61 Updated May 28, 2025

Minimal reproduction of DeepSeek R1-Zero

Python 11,847 1,490 Updated Apr 24, 2025

Democratizing Reinforcement Learning for LLMs

Jupyter Notebook 3,309 306 Updated May 13, 2025
Python 293 18 Updated May 31, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 8,884 1,113 Updated Jun 3, 2025

Simple RL training for reasoning

Python 3,604 270 Updated Apr 10, 2025

s1: Simple test-time scaling

Python 6,419 749 Updated May 19, 2025

Synthetic data curation for post-training and structured data extraction

Python 1,366 106 Updated Jun 2, 2025

The MATH Dataset (NeurIPS 2021)

Python 1,123 99 Updated Aug 5, 2024

800,000 step-level correctness labels on LLM solutions to MATH problems

Python 2,001 118 Updated Jun 1, 2023

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Python 18,631 1,890 Updated Jun 2, 2025

Flax is a neural network library for JAX that is designed for flexibility.

Jupyter Notebook 6,590 704 Updated Jun 2, 2025

Really Fast End-to-End Jax RL Implementations

Python 880 69 Updated Sep 9, 2024

Monte Carlo tree search in JAX

Python 2,492 204 Updated Apr 10, 2025

An implementation of MuZero in JAX.

Python 56 8 Updated Nov 8, 2022

♟️ Vectorized RL game environments in JAX

Python 480 34 Updated Mar 6, 2025

A bibliography and survey of the papers surrounding o1

TeX 1,193 50 Updated Nov 16, 2024

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

Python 370 27 Updated Jun 3, 2025

This repository contains levels for boxoban, a box-pushing puzzle game inspired by Sokoban.

73 21 Updated Dec 28, 2022

Sokoban environment for OpenAI Gym

Python 369 86 Updated Nov 8, 2023

Let your Claude be able to think

TypeScript 15,203 1,766 Updated Mar 10, 2025