Stars
Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay
✨✨R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
verl: Volcano Engine Reinforcement Learning for LLMs
Fully open reproduction of DeepSeek-R1
Understanding R1-Zero-Like Training: A Critical Perspective
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Official implementation of paper: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
The simplest, fastest repository for training/finetuning medium-sized GPTs.
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
sangmichaelxie / doremi
PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Hung-yi Lee Deep Learning Tutorial (recommended by Prof. Hung-yi Lee 👍, the "Apple Book" 🍎). PDF download: https://github.com/datawhalechina/leedl-tutorial/releases
Efficient Triton Kernels for LLM Training
[AAAI 2024] Code for CTX-vec2wav in UniCATS
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
Foundational Models for State-of-the-Art Speech and Text Translation