Stars
Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA, and HPU. Seamlessly integrated with Torchao, Transformers, and vLLM.
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
[ICML'21 Oral] I-BERT: Integer-only BERT Quantization
✨✨Latest Advances in Multimodal Large Language Models
Download DeepMind's Kinetics dataset.
[ICLR 2022 Oral] F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization
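The core trick in fixed-point-only arithmetic is easy to show in isolation: values carry an implicit power-of-two scale (a fractional length), products are computed in wide integers, and rescaling is just a bit shift. A minimal sketch below, illustrative only and not F8Net's exact scheme:

```python
# Minimal sketch of fixed-point 8-bit multiplication with power-of-two
# scales (fractional lengths). Illustrative only; not F8Net's exact scheme.
import numpy as np

def to_fixed(x, frac_bits):
    """Quantize a float array to int8 with 2**-frac_bits resolution."""
    q = np.round(x * (1 << frac_bits))
    return np.clip(q, -128, 127).astype(np.int8)

def fixed_mul(a, fa, b, fb, f_out):
    """Multiply two int8 fixed-point arrays; rescale by a right shift."""
    prod = a.astype(np.int32) * b.astype(np.int32)   # exact in int32
    shift = fa + fb - f_out                          # assumed >= 0 here
    return np.clip(prod >> shift, -128, 127).astype(np.int8)

x = to_fixed(np.array([0.5, -0.25]), frac_bits=6)    # Q1.6 format
w = to_fixed(np.array([0.75, 0.5]), frac_bits=6)
y = fixed_mul(x, 6, w, 6, f_out=6)
print(y / (1 << 6))                                  # ~[0.375, -0.125]
```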
Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation (CVPR 2022).
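Quantizers like this train through a non-differentiable rounding step by using a straight-through estimator (STE). A minimal sketch of the vanilla STE follows; N2UQ's *generalized* STE reshapes this backward pass, so treat this only as the base idea:

```python
# Minimal sketch of the (vanilla) straight-through estimator for a uniform
# quantizer: round in the forward pass, pass gradients through in backward.
import torch

class RoundSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out  # identity gradient: d round(x)/dx := 1

def fake_quant(x, bits=2):
    levels = 2 ** bits - 1
    xc = x.clamp(0, 1)                 # assume inputs normalized to [0, 1]
    return RoundSTE.apply(xc * levels) / levels

x = torch.rand(4, requires_grad=True)
fake_quant(x).sum().backward()
print(x.grad)  # ones: the gradient flowed straight through the rounding
```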
An Open-Source Library for Training Binarized Neural Networks
PyTorch implementation of our ICCV 2021 paper "ReCU: Reviving the Dead Weights in Binary Neural Networks" (http://arxiv.org/abs/2103.12369).
An official PyTorch implementation of the paper "Distance-aware Quantization", ICCV 2021.
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
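The shift itself is a cheap tensor operation: a fraction of channels moves one step along the time axis in each direction, with zero padding at clip boundaries. A small sketch (the shapes and fold_div=8 default are illustrative):

```python
# Minimal sketch of the temporal shift at the heart of TSM: a fraction of
# channels is shifted one step forward/backward along the time axis, with
# zero padding at the clip boundaries.
import torch

def temporal_shift(x, fold_div=8):
    """x: (N, T, C, H, W) video features."""
    n, t, c, h, w = x.shape
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                # shift toward the past
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # shift toward the future
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]           # rest untouched
    return out

feats = torch.randn(2, 8, 64, 7, 7)                     # toy clip features
print(temporal_shift(feats).shape)                      # (2, 8, 64, 7, 7)
```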
Official PyTorch implementation of "BiDet: An Efficient Binarized Object Detector" (CVPR 2020).
PyHessian is a PyTorch library for second-order (Hessian-based) analysis and training of neural networks.
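The workhorse behind such second-order analysis is the Hessian-vector product computed by double backpropagation (Pearlmutter's trick), which never materializes the Hessian. A minimal sketch of that building block, not PyHessian's actual API:

```python
# Minimal sketch of the Hessian-vector product (Pearlmutter's trick) that
# second-order analyses like PyHessian's build on; not PyHessian's API.
import torch

def hvp(loss, params, vec):
    """Compute H @ vec without materializing the Hessian."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(dot, params)

w = torch.randn(3, requires_grad=True)
loss = (w ** 2).sum()          # Hessian is 2 * I
v = [torch.ones(3)]
print(hvp(loss, [w], v))       # (tensor([2., 2., 2.]),)
```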
Neural Network Quantization With Fractional Bit-widths
[ICML 2021] "Double-Win Quant: Aggressively Winning Robustness of Quantized DeepNeural Networks via Random Precision Training and Inference" by Yonggan Fu, Qixuan Yu, Meng Li, Vikas Chandra, Yingya…
A PyTorch re-implementation of the official EfficientDet, with real-time SOTA performance and pretrained weights.
PyTorch implementation of Towards Efficient Training for Neural Network Quantization
Using ideas from product quantization for state-of-the-art neural network compression.
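The compression idea is to split each weight matrix into small sub-vectors and replace them with entries from a learned codebook, storing only the codebook and per-block indices. A toy sketch of that basic product-quantization step with plain k-means; the repo's method goes further than this (e.g., fine-tuning codebooks through the network):

```python
# Toy sketch of product quantization for weight compression: split weight
# rows into sub-vectors, learn a small k-means codebook, store codebook +
# indices instead of floats. Illustrative only, not the repo's full method.
import torch

def kmeans(x, k, iters=20):
    centers = x[torch.randperm(len(x))[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(x, centers).argmin(dim=1)
        for j in range(k):
            pts = x[assign == j]
            if len(pts):                       # skip empty clusters
                centers[j] = pts.mean(dim=0)
    return centers, assign

W = torch.randn(64, 32)                    # toy weight matrix
d, k = 8, 16                               # sub-vector dim, codebook size
blocks = W.reshape(-1, d)                  # 256 sub-vectors of length 8
codebook, codes = kmeans(blocks, k)
W_hat = codebook[codes].reshape(W.shape)   # reconstructed weights
print((W - W_hat).pow(2).mean())           # quantization error
```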
An official implementation of "Network Quantization with Element-wise Gradient Scaling" (CVPR 2021) in PyTorch.
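EWGS replaces the identity backward pass of the STE with an element-wise scale based on the discretization error. A minimal sketch, assuming the scaling rule g_x = g_q(1 + δ·sign(g_q)(x − x_q)); δ is a small hyperparameter, and the exact rule should be checked against the paper:

```python
# Sketch of an EWGS-style backward pass: each gradient element is scaled by
# the rounding error instead of passing through unchanged. The rule below is
# an assumption drawn from the paper's published form; verify against source.
import torch

class EWGSQuant(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, delta):
        x_q = torch.round(x)
        ctx.save_for_backward(x - x_q)     # discretization error
        ctx.delta = delta
        return x_q

    @staticmethod
    def backward(ctx, g_q):
        (err,) = ctx.saved_tensors
        g_x = g_q * (1 + ctx.delta * torch.sign(g_q) * err)
        return g_x, None                   # no gradient for delta

x = torch.tensor([0.3, 1.7], requires_grad=True)
EWGSQuant.apply(x, 0.1).sum().backward()
print(x.grad)   # STE gradient (1) scaled per element by the rounding error
```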
PyTorch implementation of FAT: Learning Low-Bitwidth Parametric Representation via Frequency-Aware Transformation.
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research and is continuously improving. Welcome to PR works (papers, repositories) that the repo has missed.
BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization (ICLR 2021)