wdan (Yanhong Wu) / Starred · GitHub

🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton

Python 2,478 179 Updated Jun 7, 2025

Analyze computation-communication overlap in V3/R1.

1,049 143 Updated Mar 21, 2025

Expert Parallelism Load Balancer

Python 1,206 193 Updated Mar 24, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,797 296 Updated Mar 10, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python 5,414 611 Updated May 27, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 7,746 786 Updated Jun 6, 2025

FlashMLA: Efficient MLA decoding kernels

Cuda 11,591 837 Updated Apr 29, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,805 278 Updated May 15, 2025

Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper

Python 642 35 Updated May 16, 2025

Introduction to Machine Learning Systems

TeX 1,870 219 Updated Jun 7, 2025

data sets for performance analyses of Johann Sebastian Bach’s 'Goldberg Variations' BWV 988

2 2 Updated Dec 11, 2021

Puzzles for learning Triton

Jupyter Notebook 1,682 134 Updated Nov 18, 2024

Repository for answers for exercises in Programming Massively Parallel Processors book

C++ 14 1 Updated Aug 10, 2024

Material for gpu-mode lectures

Jupyter Notebook 4,560 457 Updated Feb 9, 2025

Cataloging released Triton kernels.

231 11 Updated Jan 10, 2025

An ML Systems Onboarding list

800 29 Updated Jan 24, 2025

A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.

Python 360 22 Updated Mar 10, 2025

This repository contains the series of operating systems course labs designed by the IPADS lab at Shanghai Jiao Tong University.

C 392 109 Updated May 14, 2025

All material for CS140E, winter 2023.

C 83 36 Updated Mar 12, 2024

An annotated implementation of the Transformer paper.

Jupyter Notebook 6,276 1,350 Updated Apr 7, 2024

SGLang is a fast serving framework for large language models and vision language models.

Python 14,953 1,942 Updated Jun 8, 2025

Tile primitives for speedy kernels

Cuda 2,428 149 Updated Jun 7, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥

Cuda 4,674 497 Updated Jun 7, 2025

Efficient Triton Kernels for LLM Training

Python 5,163 346 Updated Jun 7, 2025

The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models".

366 20 Updated Mar 12, 2025

A curated list for Efficient Large Language Models

Python 1,707 135 Updated Apr 23, 2025

Bringing BERT into modernity via both architecture changes and scaling

Python 1,391 112 Updated May 16, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ 10,675 1,481 Updated Jun 7, 2025

📚A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, Parallelism, MLA, etc.

Python 4,101 283 Updated Jun 7, 2025