zheng-z18

2 followers · 0 following

Stars

xufangzhi / phi-Decoding

[ACL 2025] An inference-time decoding strategy with adaptive foresight sampling

Python 92 7 Updated May 18, 2025

WindyLee0822 / Process_Q_Model

Official implementation of the paper "Process Reward Model with Q-value Rankings"

Python 58 6 Updated Feb 5, 2025

chenwxOggai / BiRM

Code & Dataset for Paper: "Better Process Supervision with Bi-directional Rewarding Signals"

Python 6 Updated Mar 9, 2025

CJReinforce / PURE

Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"

Python 118 5 Updated May 6, 2025

NJUNLP / R-PRM

Python 24 Updated Apr 1, 2025

THUDM / ReST-MCTS

ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)

Python 629 48 Updated Jan 20, 2025

NineAbyss / S2R

This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"

Python 64 2 Updated Apr 22, 2025

CMU-AIRe / MRT

Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".

94 3 Updated Mar 12, 2025

hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Python 26,520 2,572 Updated Apr 30, 2025

SimonAytes / SoT

Official code repository for Sketch-of-Thought (SoT)

Python 115 23 Updated May 8, 2025

ganler / code-r1

Reproducing R1 for Code with Reliable Rewards

Python 198 13 Updated May 5, 2025

agentica-project / rllm

Democratizing Reinforcement Learning for LLMs

Jupyter Notebook 3,289 304 Updated May 13, 2025

xhluca / bm25s

Fast lexical search implementing BM25 in Python using NumPy, Numba, and SciPy

Python 1,164 67 Updated May 20, 2025
