huoliangyu (Xuntian) / Starred · GitHub

Copy large files to multiple machines

C++ 4 Updated Dec 1, 2024

Distributed RL System for LLM Reasoning

Python 1,273 58 Updated May 21, 2025

Making large AI models cheaper, faster and more accessible

Python 40,897 4,512 Updated May 23, 2025

Democratizing Reinforcement Learning for LLMs

Jupyter Notebook 3,286 305 Updated May 13, 2025

ReasonFlux: The Open-Source Strong Reasoning Model Series

Python 390 30 Updated May 9, 2025

Simple RL training for reasoning

Python 3,583 266 Updated Apr 10, 2025

Fully open reproduction of DeepSeek-R1

Python 24,523 2,259 Updated May 23, 2025

DeepSeek R1 distilled into smaller OSS models

Python 12 4 Updated Jan 21, 2025

OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning

Python 140 3 Updated Dec 24, 2024
Python 47 3 Updated Dec 17, 2024

OpenRLHF with the SimPO method added (a personal modification). Original repository: https://github.com/OpenLLMAI/OpenRLHF

Python 8 Updated Jun 19, 2024

This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.

Jupyter Notebook 312 30 Updated Aug 6, 2024

AN O1 REPLICATION FOR CODING

Python 335 22 Updated Dec 11, 2024

Awesome Reinforcement Fine Tuning

3 Updated Dec 8, 2024

Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs).

Python 233 21 Updated Nov 5, 2024

A framework for few-shot evaluation of autoregressive language models.

Python 151 47 Updated Sep 13, 2024

Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"

Python 156 15 Updated Nov 11, 2024

A library for advanced large language model reasoning

Python 2,130 188 Updated Apr 9, 2025

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.

6,732 375 Updated May 13, 2025

Implement reinforcement learning (RL) and chain of thought (CoT) like o1.

Python 1 Updated Oct 6, 2024

Large Reasoning Models

Python 804 45 Updated Dec 3, 2024

[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$

Python 45 2 Updated Oct 23, 2024

O1 Replication Journey

1,992 65 Updated Jan 14, 2025

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Python 1,772 134 Updated Jan 17, 2025

SFT/Reward Model/DPO/SPO

Python 1 1 Updated May 30, 2024
Python 9 Updated Jan 4, 2024

The Open Assistant API is a ready-to-use, open-source, self-hosted agent/gpts orchestration creation framework, supporting customized extensions for LLM, RAG, function call, and tools capabilities.…

Python 338 84 Updated Mar 21, 2025

Peking University postdoctoral research report LaTeX template

TeX 15 Updated Mar 13, 2023

Dalian Maritime University postdoctoral research report template, adapted from the USTC thesis LaTeX template

TeX 1 Updated Nov 14, 2023

RLHF implementation details of OAI's 2019 codebase

Python 187 9 Updated Jan 14, 2024