waterwaterrr / Starred · GitHub

Monitoring recent cross-research on LLM & RL on arXiv for control. If there are good papers, PRs are welcome.

428 21 Updated Sep 12, 2024

Distributed RL System for LLM Reasoning

Python 1,813 90 Updated Jun 19, 2025

A comprehensive collection of process reward models.

92 1 Updated Jun 9, 2025

Code and implementations for the paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhiheng Xi et al.

Python 490 62 Updated Mar 11, 2025

Awesome-LLM: a curated list of Large Language Models

23,873 2,006 Updated May 9, 2025

Curated list of datasets and tools for post-training.

3,171 268 Updated Jan 29, 2025

Live-streamed development of RL tuning for LLM agents

Python 3,028 421 Updated Jun 19, 2025

[ICML 2025] Official Implementation of GLIDER

Python 45 1 Updated May 27, 2025

Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping

Python 47 3 Updated May 22, 2025

⚡️HivisionIDPhotos: a lightweight and efficient AI ID photo tool.

Python 17,741 1,922 Updated Apr 4, 2025

LLM Arena by KCORES team

HTML 843 38 Updated Apr 29, 2025

VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework

Python 353 19 Updated May 12, 2025

ByteCheckpoint: A Unified Checkpointing Library for LFMs

Python 219 9 Updated Apr 2, 2025

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

453 13 Updated Jun 16, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…

Python 8,182 707 Updated Jun 19, 2025

An Open-source RL System from ByteDance Seed and Tsinghua AIR

Python 1,357 57 Updated May 11, 2025

A series of technical reports on Slow Thinking with LLMs

Python 699 39 Updated Jun 9, 2025
Python 297 18 Updated May 31, 2025

Awesome RL-based LLM Reasoning

523 27 Updated May 4, 2025

🙌 OpenHands: Code Less, Make More

Python 58,658 6,754 Updated Jun 19, 2025
Python 777 34 Updated Apr 28, 2025

s1: Simple test-time scaling

Python 6,453 749 Updated May 19, 2025

Democratizing Reinforcement Learning for LLMs

Jupyter Notebook 3,384 311 Updated May 13, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,826 278 Updated May 15, 2025

Official Repo for Open-Reasoner-Zero

Python 1,969 104 Updated Jun 2, 2025

My learning notes and code for ML systems.

Python 2,520 160 Updated Jun 18, 2025

✔ (Completed) The most comprehensive deep learning notes [Tudui PyTorch] [Li Mu, Dive into Deep Learning] [Andrew Ng, Deep Learning]

Jupyter Notebook 11,021 1,330 Updated Jun 17, 2025

Deep Reinforcement Learning

3,965 633 Updated Dec 10, 2022