Stars
[ACL 2025] An inference-time decoding strategy with adaptive foresight sampling
Official implementation of the paper "Process Reward Model with Q-value Rankings"
Code and dataset for the paper "Better Process Supervision with Bi-directional Rewarding Signals"
Official code for the paper "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)
This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"
Research code for the preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning"
Open-Sora: Democratizing Efficient Video Production for All
Official code repository for Sketch-of-Thought (SoT)
Democratizing Reinforcement Learning for LLMs
Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy