MaxwellJryao (Jiarui Yao) · GitHub

Jiarui Yao MaxwellJryao

CS PhD at UIUC, Former B.Eng. at IIIS, Tsinghua University

16 followers · 28 following

Tsinghua University

Highlights

Pro

Pinned

Online-DPO-R1 Public

Forked from RLHFlow/Online-DPO-R1

Codebase for Iterative DPO Using Rule-based Rewards

Python

RLHFlow/RLHF-Reward-Modeling Public

Recipes to train reward model for RLHF.

Python · 1.4k stars · 99 forks

shizhediao/Post-Training-Data-Flywheel Public

We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs.

Python · 56 stars · 5 forks