Stars
Any-Order GPT as Masked Diffusion Model: Decoupling Formulation and Architecture. Train an MDM using GPT with this repo!
MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
Open-source Multi-agent Poster Generation from Papers
Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model
Scaling Computer-Use Grounding via UI Decomposition and Synthesis
Painless Evaluation of Flash Linear Attention models on Synthetic Tasks
The official implementation for Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
GitHub repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)
SkyRL: A Modular Full-stack RL Library for LLMs
📖 A repository organizing papers, code, and other resources related to unified multimodal models.
Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
FlashInfer: Kernel Library for LLM Serving
Computer Agent Arena: Test and compare AI agents in real desktop apps and web environments. Code and data coming soon!
Scaling Deep Research via Reinforcement Learning in Real-world Environments.
Distributed Compiler based on Triton for Parallel Systems
[ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"
Stick-breaking attention
The Open All-in-One Multimodal AI Agent Stack connecting Cutting-edge AI Models and Agent Infra.
[ACL 2025] An inference-time decoding strategy with adaptive foresight sampling
Pretraining infrastructure for multi-hybrid AI model architectures