awgu

😴

Andrew Gu awgu

😴

155 followers · 9 following

New York, NY

Achievements

x3 x3

Achievements

x3 x3

Stars

fla-org / flash-linear-attention

🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton

Python 2,791 198 Updated Jun 24, 2025

pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,384 606 Updated Jun 24, 2025

ByteDance-Seed / Triton-distributed

Distributed Compiler Based on Triton for Parallel Systems

Python 842 67 Updated Jun 18, 2025

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python 17,990 1,766 Updated Jun 21, 2025

NVIDIA / Megatron-LM

Ongoing research training transformer models at scale

Python 12,642 2,863 Updated Jun 19, 2025

NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…

Python 2,507 435 Updated Jun 19, 2025

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 15,421 2,171 Updated Jun 24, 2025

deepseek-ai / open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,835 279 Updated May 15, 2025

deepseek-ai / profile-data

Analyze computation-communication overlap in V3/R1.

1,065 144 Updated Mar 21, 2025

deepseek-ai / DualPipe

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,808 297 Updated Mar 10, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python 5,471 626 Updated Jun 23, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 8,200 819 Updated Jun 24, 2025

yifuwang / symm-mem-recipes

Python 90 7 Updated Dec 27, 2024

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda 2,474 157 Updated Jun 22, 2025

higham / what-is

Important concepts in numerical linear algebra and related areas

761 63 Updated Jan 13, 2024

NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

C++ 7,738 1,284 Updated Jun 12, 2025

pytorch / torchtitan

A PyTorch native platform for training generative AI models

Python 3,953 407 Updated Jun 24, 2025

python / cpython

The Python programming language

Python 67,609 32,198 Updated Jun 24, 2025

triton-lang / triton

Development repository for the Triton language and compiler

MLIR 15,940 2,062 Updated Jun 24, 2025

pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 90,991 24,514 Updated Jun 24, 2025

jax-ml / jax

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python 32,604 3,067 Updated Jun 24, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Andrew Gu awgu

Achievements

Achievements

Block or report awgu

Stars

fla-org / flash-linear-attention

pytorch / FBGEMM

ByteDance-Seed / Triton-distributed

Dao-AILab / flash-attention

NVIDIA / Megatron-LM

NVIDIA / TransformerEngine

sgl-project / sglang

deepseek-ai / open-infra-index

deepseek-ai / profile-data

deepseek-ai / DualPipe

deepseek-ai / DeepGEMM

deepseek-ai / DeepEP

yifuwang / symm-mem-recipes

HazyResearch / ThunderKittens

higham / what-is

NVIDIA / cutlass

pytorch / torchtitan

python / cpython

triton-lang / triton

pytorch / pytorch

jax-ml / jax