SiriusNEO (Chaofan Lin) / Starred · GitHub
🎯
Focusing

Highlights

  • Pro

Distributed Triton for Parallel Systems

Python 724 43 Updated May 12, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust 4,035 363 Updated May 18, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 933 59 Updated Apr 15, 2025

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Python 1,114 73 Updated May 15, 2025

[ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference

Jupyter Notebook 20 1 Updated Apr 16, 2025
C++ 33 6 Updated May 17, 2025

[ICML 2025] Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity

Python 251 9 Updated May 2, 2025

Kernel Tuner

Python 336 54 Updated May 16, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 8,887 883 Updated May 7, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python 5,351 597 Updated May 16, 2025

SpargeAttention: A training-free sparse attention mechanism that can accelerate inference for any model.

Cuda 551 34 Updated May 14, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 7,663 769 Updated May 12, 2025

FlashMLA: Efficient MLA decoding kernels

Cuda 11,550 834 Updated Apr 29, 2025

Muon is Scalable for LLM Training

1,047 47 Updated Mar 28, 2025

🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

Python 666 29 Updated Mar 19, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,771 276 Updated May 15, 2025

depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.

Python 670 24 Updated Apr 20, 2025

MoBA: Mixture of Block Attention for Long-Context LLMs

Python 1,774 105 Updated Apr 3, 2025

Open Overleaf/ShareLaTeX projects in VS Code, with full collaboration support.

TypeScript 975 24 Updated Apr 16, 2025

[ICLR 2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Python 1,755 94 Updated May 17, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 14,065 986 Updated May 18, 2025

Fast low-bit matmul kernels in Triton

Python 301 23 Updated May 17, 2025

A bibliography and survey of the papers surrounding o1

TeX 1,191 50 Updated Nov 16, 2024

[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training

Python 195 12 Updated Apr 22, 2025

Canvas: End-to-End Kernel Architecture Search in Neural Networks

C++ 26 4 Updated Nov 18, 2024

[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration

Python 210 20 Updated Nov 18, 2024

A framework for few-shot evaluation of language models.

Python 8,955 2,395 Updated May 17, 2025

Minimal reproduction of DeepSeek R1-Zero

Python 11,765 1,486 Updated Apr 24, 2025