Zhiy-Zhang

1 follower · 16 following

Stars

infinigence / FlashOverlap

A lightweight design for computation-communication overlap.

Cuda 143 5 Updated Jun 20, 2025

kwai / Megatron-Kwai

Forked from NVIDIA/Megatron-LM

[USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Parallelism

Python 57 3 Updated Jul 31, 2024

AlibabaPAI / torchacc

PyTorch distributed training acceleration framework

Python 49 8 Updated Feb 13, 2025

fla-org / native-sparse-attention

🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

Python 703 32 Updated Mar 19, 2025

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA.

Cuda 4,891 536 Updated Jun 21, 2025

zhaochenyang20 / Awesome-ML-SYS-Tutorial

My learning notes/codes for ML SYS.

Python 2,647 166 Updated Jun 25, 2025

andrewkchan / yalm

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

C++ 383 36 Updated Jun 7, 2025

IsaacRe / vllm-kvcompress

KV cache compression for high-throughput LLM inference

Python 131 5 Updated Feb 5, 2025

hulianyuyy / iLLaVA

iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models

Python 19 2 Updated Jan 29, 2025

efeslab / Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 830 37 Updated Jun 5, 2025

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 15,471 2,184 Updated Jun 26, 2025

facebookresearch / multimodal

TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.

Python 1,617 154 Updated Jun 23, 2025

facebookresearch / SONAR

SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.

Python 782 86 Updated Apr 1, 2025

Tencent / KsanaLLM

C++ 428 34 Updated Jun 26, 2025

kakaobrain / trident

A performance library for machine learning applications.

Python 184 13 Updated Oct 12, 2023

jy-yuan / KIVI

[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Python 304 31 Updated Jan 19, 2025

zjhellofss / KuiperInfer

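A great project for campus recruiting (fall and spring hiring) and internships! Walks you through implementing a high-performance deep learning inference library from scratch, supporting inference for models such as llama2, Unet, Yolov5, and Resnet. Implement a high-performance deep learning inference library step by step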

C++ 2,996 333 Updated Jun 22, 2025

karpathy / micrograd

A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API

Jupyter Notebook 12,173 1,792 Updated Aug 8, 2024

xtekky / gpt4free

The official gpt4free repository | various collection of powerful language models | o4, o3 and deepseek r1, gpt-4.1, gemini 2.5

Python 64,504 13,656 Updated Jun 25, 2025

kornia / kornia

🐍 Geometric Computer Vision Library for Spatial AI

Python 10,557 1,028 Updated Jun 25, 2025

mosaicml / streaming

A Data Streaming Library for Efficient Neural Network Training

Python 1,327 164 Updated Jun 25, 2025

iarai / concurrent-dataloader

Profiling and Improving the PyTorch Dataloader for high-latency Storage

Jupyter Notebook 20 5 Updated Apr 18, 2023

CnTransGroup / EffectiveModernCppChinese

Effective Modern C++ (Chinese edition) - translation complete

8,331 1,217 Updated Feb 14, 2025

Asthestarsfalll / MegBox

MegBox is an easy-to-use, well-rounded and safe toolbox for MegEngine. It aims to improve the usage experience and speed up the development process.

Python 6 Updated Apr 9, 2023

facebookincubator / AITemplate

AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python 4,648 382 Updated Apr 1, 2025

flame / how-to-optimize-gemm

C 1,888 357 Updated Jul 29, 2023

MegEngine / MegEngine

MegEngine is a fast, scalable, and easy-to-use deep learning framework with support for automatic differentiation.

C++ 4,796 546 Updated Oct 24, 2024
