Zhiy-Zhang / Starred · GitHub

A lightweight design for computation-communication overlap.

Cuda 143 5 Updated Jun 20, 2025

[USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Parallelism

Python 57 3 Updated Jul 31, 2024

PyTorch distributed training acceleration framework

Python 49 8 Updated Feb 13, 2025

🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

Python 703 32 Updated Mar 19, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA.

Cuda 4,891 536 Updated Jun 21, 2025

My learning notes/codes for ML SYS.

Python 2,647 166 Updated Jun 25, 2025

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

C++ 383 36 Updated Jun 7, 2025

KV cache compression for high-throughput LLM inference

Python 131 5 Updated Feb 5, 2025

iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models

Python 19 2 Updated Jan 29, 2025

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 830 37 Updated Jun 5, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 15,471 2,184 Updated Jun 26, 2025

TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.

Python 1,617 154 Updated Jun 23, 2025

SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.

Python 782 86 Updated Apr 1, 2025

C++ 428 34 Updated Jun 26, 2025

A performance library for machine learning applications.

Python 184 13 Updated Oct 12, 2023

[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Python 304 31 Updated Jan 19, 2025

A good project for campus recruiting (autumn and spring hiring) and internships! Implement a high-performance deep learning inference library from scratch, step by step, supporting inference for models such as llama2, Unet, Yolov5, and Resnet.

C++ 2,996 333 Updated Jun 22, 2025

A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API

Jupyter Notebook 12,173 1,792 Updated Aug 8, 2024

The official gpt4free repository | various collection of powerful language models | o4, o3 and deepseek r1, gpt-4.1, gemini 2.5

Python 64,504 13,656 Updated Jun 25, 2025

🐍 Geometric Computer Vision Library for Spatial AI

Python 10,557 1,028 Updated Jun 25, 2025

A Data Streaming Library for Efficient Neural Network Training

Python 1,327 164 Updated Jun 25, 2025

Profiling and Improving the PyTorch Dataloader for high-latency Storage

Jupyter Notebook 20 5 Updated Apr 18, 2023

Effective Modern C++ (Chinese translation, complete)

8,331 1,217 Updated Feb 14, 2025

MegBox is an easy-to-use, well-rounded, and safe toolbox for MegEngine. It aims to improve the user experience and speed up the development process.

Python 6 Updated Apr 9, 2023

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python 4,648 382 Updated Apr 1, 2025

MegEngine is a fast, scalable, and easy-to-use deep learning framework with support for automatic differentiation.

C++ 4,796 546 Updated Oct 24, 2024