- ISCAS; UCAS
- Beijing, China
- UTC +08:00
- https://gipsyh.github.io/
- https://orcid.org/0009-0009-2571-8135
Starred repositories
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
A lightweight design for computation-communication overlap.
nnScaler: Compiling DNN models for Parallel Training
Optimized primitives for collective multi-GPU communication
Distributed Triton for Parallel Systems
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
ModelChecker: A bit-level model checking tool
Fast and memory-efficient exact attention
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Chat with your SQL database. Accurate Text-to-SQL Generation via LLMs using RAG.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient MLA decoding kernels
A Fast, Low-Overhead On-chip Network
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Janus-Series: Unified Multimodal Understanding and Generation Models