Stars
An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
Scalable toolkit for efficient model alignment
Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation"
Karras et al. (2022) diffusion models for PyTorch
Official code for "DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps" (NeurIPS 2022 Oral)
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
Implementing DeepSeek R1's GRPO algorithm from scratch
The simplest, fastest repository for training/finetuning medium-sized GPTs.
DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
PyTorch implementation for Score-Based Generative Modeling through Stochastic Differential Equations (ICLR 2021, Oral)
MMseqs2: ultra fast and sensitive search and clustering suite
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
[ICLR 2025] Official Implementation of IgGM: A Generative Model for Functional Antibody and Nanobody Design
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Efficient Triton implementation of Native Sparse Attention.
SpargeAttention: A training-free sparse attention that can accelerate any model inference.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient MLA decoding kernels
Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks"; contains the code for the experiments in the paper.
Joint embedding of protein sequence and structure with discrete and continuous compressions of protein folding model latent spaces. http://bit.ly/cheap-proteins
📚 FFPA (Split-D): extends FlashAttention with Split-D for large headdim, O(1) GPU SRAM complexity, 1.8x~3x↑🎉 faster than SDPA EA.