Stars
Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay
✨✨R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
verl: Volcano Engine Reinforcement Learning for LLMs
Fully open reproduction of DeepSeek-R1
Understanding R1-Zero-Like Training: A Critical Perspective
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Official implementation of paper: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
The simplest, fastest repository for training/finetuning medium-sized GPTs.
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
sangmichaelxie / doremi
PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Hung-yi Lee Deep Learning Tutorial (recommended by Prof. Hung-yi Lee 👍, the "Apple Book" 🍎). PDF download: https://github.com/datawhalechina/leedl-tutorial/releases
Efficient Triton Kernels for LLM Training
[AAAI 2024] Code for CTX-vec2wav in UniCATS
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
Foundational Models for State-of-the-Art Speech and Text Translation