MaxwellJryao (Jiarui Yao) · GitHub
  • Tsinghua University

Highlights

  • Pro


Pinned

  1. Online-DPO-R1 (Public)

    Forked from RLHFlow/Online-DPO-R1

    Codebase for Iterative DPO Using Rule-based Rewards

    Python

  2. RLHFlow/RLHF-Reward-Modeling (Public)

    Recipes to train reward model for RLHF.

    Python · 1.4k stars · 99 forks

  3. shizhediao/Post-Training-Data-Flywheel (Public)

    We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs.

    Python · 56 stars · 5 forks
