Stars
🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton
Analyze computation-communication overlap in V3/R1.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient MLA decoding kernels
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper.
Introduction to Machine Learning Systems
Data sets for performance analyses of Johann Sebastian Bach’s 'Goldberg Variations', BWV 988.
Repository of solutions to the exercises in the book Programming Massively Parallel Processors.
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
An annotated implementation of the Transformer paper.
SGLang is a fast serving framework for large language models and vision language models.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥
Efficient Triton Kernels for LLM Training
The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models".
A curated list for Efficient Large Language Models
Bringing BERT into modernity via both architecture changes and scaling
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs.
📚A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, Parallelism, MLA, etc.