awgu (Andrew Gu) / Starred · GitHub

🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton

Python · 2,791 stars · 198 forks · Updated Jun 24, 2025

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ · 1,384 stars · 606 forks · Updated Jun 24, 2025

Distributed Compiler Based on Triton for Parallel Systems

Python · 842 stars · 67 forks · Updated Jun 18, 2025

Fast and memory-efficient exact attention

Python · 17,990 stars · 1,766 forks · Updated Jun 21, 2025

Ongoing research training transformer models at scale

Python · 12,642 stars · 2,863 forks · Updated Jun 19, 2025

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…

Python · 2,507 stars · 435 forks · Updated Jun 19, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python · 15,421 stars · 2,171 forks · Updated Jun 24, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,835 stars · 279 forks · Updated May 15, 2025

Analyze computation-communication overlap in V3/R1.

1,065 stars · 144 forks · Updated Mar 21, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python · 2,808 stars · 297 forks · Updated Mar 10, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python · 5,471 stars · 626 forks · Updated Jun 23, 2025

DeepEP: an efficient expert-parallel communication library

Cuda · 8,200 stars · 819 forks · Updated Jun 24, 2025

Python · 90 stars · 7 forks · Updated Dec 27, 2024

Tile primitives for speedy kernels

Cuda · 2,474 stars · 157 forks · Updated Jun 22, 2025

Important concepts in numerical linear algebra and related areas

761 stars · 63 forks · Updated Jan 13, 2024

CUDA Templates for Linear Algebra Subroutines

C++ · 7,738 stars · 1,284 forks · Updated Jun 12, 2025

A PyTorch native platform for training generative AI models

Python · 3,953 stars · 407 forks · Updated Jun 24, 2025

The Python programming language

Python · 67,609 stars · 32,198 forks · Updated Jun 24, 2025

Development repository for the Triton language and compiler

MLIR · 15,940 stars · 2,062 forks · Updated Jun 24, 2025

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python · 90,991 stars · 24,514 forks · Updated Jun 24, 2025

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python · 32,604 stars · 3,067 forks · Updated Jun 24, 2025