Tianjin University, Tianjin
Stars
Efficient Triton Kernels for LLM Training
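For flavor, here is a hedged sketch of the kind of fused Triton kernel such training libraries provide: an RMSNorm over the last dimension with one program per row. It is a generic illustration of the pattern (load a row, reduce, normalize, store), not code from the repository, and it needs a CUDA GPU to run.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def rmsnorm_kernel(x_ptr, w_ptr, out_ptr, n_cols, eps, BLOCK_SIZE: tl.constexpr):
    # one program instance normalizes one row of a contiguous (n_rows, n_cols) tensor
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < n_cols
    x = tl.load(x_ptr + row * n_cols + cols, mask=mask, other=0.0)
    rms = tl.sqrt(tl.sum(x * x, axis=0) / n_cols + eps)
    w = tl.load(w_ptr + cols, mask=mask, other=0.0)
    tl.store(out_ptr + row * n_cols + cols, x / rms * w, mask=mask)

def rmsnorm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    n_rows, n_cols = x.shape
    out = torch.empty_like(x)
    BLOCK_SIZE = triton.next_power_of_2(n_cols)  # whole row fits in one block here
    rmsnorm_kernel[(n_rows,)](x, weight, out, n_cols, eps, BLOCK_SIZE=BLOCK_SIZE)
    return out

# Reference check against an eager PyTorch RMSNorm (requires a CUDA GPU).
x = torch.randn(4, 512, device="cuda")
w = torch.ones(512, device="cuda")
ref = x / torch.sqrt(x.pow(2).mean(-1, keepdim=True) + 1e-6) * w
assert torch.allclose(rmsnorm(x, w), ref, atol=1e-4)
```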
Minimalistic large language model 3D-parallelism training
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Implementation of FlashAttention in PyTorch
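As a reference point, the following is a minimal, unoptimized PyTorch sketch of the tiling and online-softmax idea behind FlashAttention; the real kernels fuse these loops on-chip, and the block size and function name here are illustrative, not taken from the repository.

```python
import torch

def tiled_attention(q, k, v, block_size=64):
    """q, k, v: (seq_len, head_dim). Computes softmax(q k^T / sqrt(d)) v one
    key/value block at a time, keeping running softmax statistics per row."""
    seq_len, head_dim = q.shape
    scale = head_dim ** -0.5
    out = torch.zeros_like(q)
    row_max = torch.full((seq_len, 1), float("-inf"))
    row_sum = torch.zeros(seq_len, 1)

    for start in range(0, seq_len, block_size):
        k_blk = k[start:start + block_size]
        v_blk = v[start:start + block_size]
        scores = (q @ k_blk.T) * scale                       # (seq_len, block)

        new_max = torch.maximum(row_max, scores.max(-1, keepdim=True).values)
        correction = torch.exp(row_max - new_max)            # rescale old statistics
        p = torch.exp(scores - new_max)

        row_sum = row_sum * correction + p.sum(-1, keepdim=True)
        out = out * correction + p @ v_blk
        row_max = new_max

    return out / row_sum                                     # normalize once at the end

# Agreement with the naive formulation.
q, k, v = (torch.randn(128, 32) for _ in range(3))
ref = torch.softmax((q @ k.T) * 32 ** -0.5, dim=-1) @ v
assert torch.allclose(tiled_attention(q, k, v), ref, atol=1e-4)
```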
Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation
A Datacenter Scale Distributed Inference Serving Framework
A minimal cache manager for PagedAttention, built on top of llama3.
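To make the idea concrete, here is a toy block-table allocator in the spirit of PagedAttention: KV-cache memory is split into fixed-size blocks and each sequence maps its tokens to physical blocks on demand. Class and method names are illustrative assumptions, not the repository's API.

```python
class BlockManager:
    """Toy PagedAttention-style allocator: the KV cache is a pool of fixed-size blocks."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables: dict[int, list[int]] = {}      # seq_id -> physical block ids

    def append_token(self, seq_id: int, seq_len: int) -> int:
        """Reserve room for one more token of `seq_id` (which currently holds
        `seq_len` tokens); allocate a fresh block only when the last one is full.
        Returns the physical block that will hold the new token."""
        table = self.block_tables.setdefault(seq_id, [])
        if seq_len % self.block_size == 0:                # last block full, or first token
            if not self.free_blocks:
                raise RuntimeError("out of KV-cache blocks: preempt or evict a sequence")
            table.append(self.free_blocks.pop())
        return table[-1]

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))

# A sequence with 17 generated tokens and block_size=16 owns exactly 2 blocks.
mgr = BlockManager(num_blocks=8, block_size=16)
for pos in range(17):
    mgr.append_token(seq_id=0, seq_len=pos)
assert len(mgr.block_tables[0]) == 2
```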
Solutions to Tensor puzzles by Sasha Rush - https://github.com/srush/Triton-Puzzles
High performance Transformer implementation in C++.
A curated list of resources dedicated to open source GitHub repositories related to ChatGPT and OpenAI API
Dynamic Memory Management for Serving LLMs without PagedAttention
How to optimize various algorithms in CUDA.
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs o…
📚 LeetCUDA: Modern CUDA learning notes with PyTorch for beginners 🐑; 200+ CUDA/Tensor Cores kernels, HGEMM, FA-2 MMA.
A collection of memory efficient attention operators implemented in the Triton language.
REST: Retrieval-Based Speculative Decoding, NAACL 2024
cadedaniel / vllm-public
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs
ymwangg / vllm
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs
This repo is used to assess NSL's scientific research assistants.
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
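The core trick is small enough to sketch: on top of the base model's last hidden state, a few extra lightweight heads each propose the token k steps ahead, so one forward pass drafts several candidates that are then verified together. The layer shapes and names below are assumptions, not the repository's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MedusaHead(nn.Module):
    """One extra decoding head: a residual MLP on the shared hidden state,
    followed by its own vocabulary projection."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return self.lm_head(hidden + F.silu(self.proj(hidden)))

class MedusaHeads(nn.Module):
    def __init__(self, hidden_size: int, vocab_size: int, num_heads: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            MedusaHead(hidden_size, vocab_size) for _ in range(num_heads)
        )

    def forward(self, hidden: torch.Tensor) -> list[torch.Tensor]:
        # head k proposes logits for position t + k + 1
        return [head(hidden) for head in self.heads]

# One last-token hidden state -> four sets of candidate logits.
heads = MedusaHeads(hidden_size=256, vocab_size=32000)
logits = heads(torch.randn(1, 256))
assert len(logits) == 4 and logits[0].shape == (1, 32000)
```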
🦖 Learn about LLMs, LLMOps, and vector DBs for free by designing, training, and deploying a real-time financial advisor LLM system ~ source code + video & reading materials
📰 Must-read papers and blogs on Speculative Decoding ⚡️
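For readers new to the area, here is a deliberately simplified greedy draft-and-verify loop showing the pattern these papers build on: a small draft model proposes a few tokens, the target model scores them in one forward pass, and the longest agreeing prefix is accepted. Production systems use rejection sampling to preserve the target distribution exactly; the model interfaces below are assumptions.

```python
import torch

@torch.no_grad()
def speculative_step(draft_model, target_model, prompt_ids, gamma=4):
    """prompt_ids: (1, t) token ids. Both models map (1, n) ids to (1, n, vocab)
    logits. Returns the prompt extended by the accepted draft tokens plus one
    token taken from the target model."""
    # 1) draft `gamma` tokens autoregressively with the cheap model
    draft_ids = prompt_ids
    for _ in range(gamma):
        logits = draft_model(draft_ids)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        draft_ids = torch.cat([draft_ids, next_id], dim=-1)

    # 2) score the whole drafted sequence with the target model in one pass
    target_logits = target_model(draft_ids)
    target_pred = target_logits[:, :-1].argmax(dim=-1)   # target's choice for each next position

    # 3) accept the longest prefix where draft and target agree
    t = prompt_ids.shape[1]
    accepted = 0
    for k in range(gamma):
        if draft_ids[0, t + k] != target_pred[0, t + k - 1]:
            break
        accepted += 1

    # 4) always append one token chosen by the target model itself
    keep = draft_ids[:, : t + accepted]
    if accepted == gamma:
        bonus = target_logits[:, -1].argmax(dim=-1, keepdim=True)
    else:
        bonus = target_pred[:, t + accepted - 1 : t + accepted]
    return torch.cat([keep, bonus], dim=-1)

# Smoke test with stand-in models that emit random logits.
vocab = 100
fake = lambda ids: torch.randn(1, ids.shape[1], vocab)
out = speculative_step(fake, fake, torch.randint(vocab, (1, 8)))
assert out.shape[1] >= 9
```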
The simplest, fastest repository for training/finetuning medium-sized GPTs.
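In the same minimalist spirit, the entire shape of such a training loop fits in a few lines: next-token cross-entropy with AdamW over shifted token ids. The tiny stand-in model and synthetic batch below are illustrative only, not the repository's GPT.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, dim, seq, batch_size = 256, 64, 32, 8
# stand-in "model": embedding straight into a vocabulary projection
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(10):
    tokens = torch.randint(vocab, (batch_size, seq + 1))   # synthetic token ids
    x, y = tokens[:, :-1], tokens[:, 1:]                   # inputs and next-token targets
    logits = model(x)                                      # (batch, seq, vocab)
    loss = F.cross_entropy(logits.reshape(-1, vocab), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```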