CS PhD at UIUC,
Former B.Eng. at IIIS, Tsinghua University
- Tsinghua University
Pinned
- Online-DPO-R1 (public, forked from RLHFlow/Online-DPO-R1)
  Codebase for Iterative DPO Using Rule-based Rewards
  Python
- RLHFlow/RLHF-Reward-Modeling (public)
  Recipes to train reward model for RLHF.
- shizhediao/Post-Training-Data-Flywheel (public)
  We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs.