DefTruth (DefTruth) · GitHub

DefTruth

🎯

#pragma unroll

@xlite-dev, @vipshop, @PaddlePaddle (Prev.), Contributor @vllm-project 🛠⚙

1.7k followers · 147 following

@xlite-dev, @vipshop
Guangzhou, China

Pinned

xlite-dev/lite.ai.toolkit Public

🛠 A lite C++ AI toolkit: 100+🎉 models (Stable-Diffusion, Face-Fusion, YOLO series, Det, Seg, Matting) with MNN, ORT and TRT.

C++ · 4.1k stars · 739 forks

vllm-project/vllm Public

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 47k stars · 7.3k forks

xlite-dev/Awesome-LLM-Inference Public

📚A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc.

Python · 4k stars · 276 forks

xlite-dev/LeetCUDA Public

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥

Cuda · 4.1k stars · 439 forks

xlite-dev/ffpa-attn-mma Public

📚FFPA(Split-D): Extend FlashAttention with Split-D for large headdim, O(1) GPU SRAM complexity, 1.8x~3x↑🎉 faster than SDPA EA.

Cuda · 171 stars · 7 forks

xlite-dev/HGEMM Public

⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.

Cuda · 75 stars · 3 forks