zheng-z18 / Starred · GitHub

[ACL 2025] An inference-time decoding strategy with adaptive foresight sampling

Python 92 7 Updated May 18, 2025

Official implementation of the paper "Process Reward Model with Q-value Rankings"

Python 58 6 Updated Feb 5, 2025

Code & Dataset for Paper: "Better Process Supervision with Bi-directional Rewarding Signals"

Python 6 Updated Mar 9, 2025

Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"

Python 118 5 Updated May 6, 2025
Python 24 Updated Apr 1, 2025

ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)

Python 629 48 Updated Jan 20, 2025

This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"

Python 64 2 Updated Apr 22, 2025

Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".

94 3 Updated Mar 12, 2025

Open-Sora: Democratizing Efficient Video Production for All

Python 26,520 2,572 Updated Apr 30, 2025

Official code repository for Sketch-of-Thought (SoT)

Python 115 23 Updated May 8, 2025

Reproducing R1 for Code with Reliable Rewards

Python 198 13 Updated May 5, 2025

Democratizing Reinforcement Learning for LLMs

Jupyter Notebook 3,289 304 Updated May 13, 2025

Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy

Python 1,164 67 Updated May 20, 2025