huoliangyu (Xuntian) / Starred · GitHub

Xuntian huoliangyu

4 followers · 5 following

Lists (15)

clean code

dataset

evaluation LLM

latex2word

Tools to convert *.tex to MS Word *.doc

LLM agent

NLP

other tools

student resources with edu email

11 repositories

RL codebase

RLHF

41 repositories

rllib related

traffic4cast

transformer

alphastar transformer

Tutorials

StarCraft

NetEase competition

Stars

kedixa / fcopy

Copy large files to multiple machines

C++ 4 Updated Dec 1, 2024

inclusionAI / AReaL

Distributed RL System for LLM Reasoning

Python 1,273 58 Updated May 21, 2025

hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible

Python 40,897 4,512 Updated May 23, 2025

agentica-project / rllm

Democratizing Reinforcement Learning for LLMs

Jupyter Notebook 3,286 305 Updated May 13, 2025

Gen-Verse / ReasonFlux

ReasonFlux: The Open-Source Strong Reasoning Model Series

Python 390 30 Updated May 9, 2025

hkust-nlp / simpleRL-reason

Simple RL training for reasoning

Python 3,583 266 Updated Apr 10, 2025

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python 24,523 2,259 Updated May 23, 2025

Emericen / deepseek-r1-distilled

DeepSeek R1 distilled into smaller OSS models

Python 12 4 Updated Jan 21, 2025

ADaM-BJTU / OpenRFT

OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning

Python 140 3 Updated Dec 24, 2024

thu-coai / SPaR

Python 47 3 Updated Dec 17, 2024

victorShawFan / OpenRLHF_add_simpo

OpenRLHF with the SimPO method added; a personal modification. Original repository: https://github.com/OpenLLMAI/OpenRLHF

Python 8 Updated Jun 19, 2024

YuxiXie / MCTS-DPO

This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.

Jupyter Notebook 312 30 Updated Aug 6, 2024

ADaM-BJTU / O1-CODER

AN O1 REPLICATION FOR CODING

Python 335 22 Updated Dec 11, 2024

XxFChen / awesome-reinforcement-fine-tuning

Awesome Reinforcement Fine Tuning

3 Updated Dec 8, 2024

flowersteam / lamorel

Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs).

Python 233 21 Updated Nov 5, 2024

Stability-AI / lm-evaluation-harness

Forked from EleutherAI/lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.

Python 151 47 Updated Sep 13, 2024

McGill-NLP / VinePPO

Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"

Python 156 15 Updated Nov 11, 2024

maitrix-org / llm-reasoners

A library for advanced large language model reasoning

Python 2,130 188 Updated Apr 9, 2025

hijkzzz / Awesome-LLM-Strawberry

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.

6,732 375 Updated May 13, 2025

sfdeggb / RL_Like_o1

Implement reinforcement learning (RL) and chain of thought (CoT) like o1.

Python 1 Updated Oct 6, 2024

SimpleBerry / LLaMA-O1

Large Reasoning Models

Python 804 45 Updated Dec 3, 2024

junkangwu / beta-DPO

[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$

Python 45 2 Updated Oct 23, 2024

GAIR-NLP / O1-Journey

O1 Replication Journey

1,992 65 Updated Jan 14, 2025

openreasoner / openr

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Python 1,772 134 Updated Jan 17, 2025

jessicazhu123 / Deepspeed_LLM

SFT/Reward Model/DPO/SPO

Python 1 1 Updated May 30, 2024

gouqi666 / DPO-deepspeed

Python 9 Updated Jan 4, 2024

MLT-OSS / open-assistant-api

The Open Assistant API is a ready-to-use, open-source, self-hosted agent/gpts orchestration creation framework, supporting customized extensions for LLM, RAG, function call, and tools capabilities.…

Python 338 84 Updated Mar 21, 2025

Jiayin-Gu / PKUreport

LaTeX template for Peking University postdoctoral research reports

TeX 15 Updated Mar 13, 2023

4tarXu / dlmu_postdoctor_latex

LaTeX template for Dalian Maritime University postdoctoral research reports, adapted from the USTC thesis LaTeX template

TeX 1 Updated Nov 14, 2023

vwxyzjn / lm-human-preference-details

RLHF implementation details of OAI's 2019 codebase

Python 187 9 Updated Jan 14, 2024
