- Guangzhou, China
- 20:06 (UTC +08:00)
Pinned
- xlite-dev/lite.ai.toolkit (Public): 🛠 A lite C++ AI toolkit: 100+ 🎉 models (Stable-Diffusion, Face-Fusion, YOLO series, Det, Seg, Matting) with MNN, ORT and TRT.
- vllm-project/vllm (Public): A high-throughput and memory-efficient inference and serving engine for LLMs
- xlite-dev/Awesome-LLM-Inference (Public): 📚 A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc.
- xlite-dev/LeetCUDA (Public): 📚 LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners 🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc. 🔥
- xlite-dev/ffpa-attn-mma (Public): 📚 FFPA(Split-D): Extend FlashAttention with Split-D for large headdim, O(1) GPU SRAM complexity, 1.8x~3x↑ 🎉 faster than SDPA EA.
- xlite-dev/HGEMM (Public): ⚡️ Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak ⚡️ Performance.