SiriusNEO (Chaofan Lin) / Starred · GitHub
🎯
Focusing

Highlights

  • Pro

Distributed Triton for Parallel Systems

Python 724 43 Updated May 12, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust 4,035 363 Updated May 18, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 933 59 Updated Apr 15, 2025

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Python 1,114 73 Updated May 15, 2025

[ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference

Jupyter Notebook 20 1 Updated Apr 16, 2025
C++ 33 6 Updated May 17, 2025

[ICML 2025] Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity

Python 251 9 Updated May 2, 2025

Kernel Tuner

Python 336 54 Updated May 16, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 8,887 883 Updated May 7, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python 5,351 597 Updated May 16, 2025

SpargeAttention: A training-free sparse attention mechanism that can accelerate inference for any model.

Cuda 551 34 Updated May 14, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 7,663 769 Updated May 12, 2025

FlashMLA: Efficient MLA decoding kernels

Cuda 11,550 834 Updated Apr 29, 2025

Muon is Scalable for LLM Training

1,047 47 Updated Mar 28, 2025

🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

Python 666 29 Updated Mar 19, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,771 276 Updated May 15, 2025

depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.

Python 670 24 Updated Apr 20, 2025

MoBA: Mixture of Block Attention for Long-Context LLMs

Python 1,774 105 Updated Apr 3, 2025

Open Overleaf/ShareLaTeX projects in VS Code, with full collaboration support.

TypeScript 975 24 Updated Apr 16, 2025

[ICLR 2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Python 1,755 94 Updated May 17, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 14,065 986 Updated May 18, 2025

Fast low-bit matmul kernels in Triton

Python 301 23 Updated May 17, 2025

A bibliography and survey of the papers surrounding o1

TeX 1,191 50 Updated Nov 16, 2024

[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training

Python 195 12 Updated Apr 22, 2025

Canvas: End-to-End Kernel Architecture Search in Neural Networks

C++ 26 4 Updated Nov 18, 2024

[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration

Python 210 20 Updated Nov 18, 2024

A framework for few-shot evaluation of language models.

Python 8,955 2,395 Updated May 17, 2025

Minimal reproduction of DeepSeek R1-Zero

Python 11,765 1,486 Updated Apr 24, 2025