llm
Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5); an FP6 rounding sketch follows this list.
Quick implementation of nGPT, which learns entirely on the hypersphere, from NvidiaAI; a normalization sketch follows this list.
This is the official repository for the ICML 2024 paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors"; a compression sketch follows this list.
Reverse Engineering: Decompiling Binary Code with Large Language Models
Official inference framework for 1-bit LLMs; a ternary quantization sketch follows this list.
Run PyTorch LLMs locally on servers, desktop and mobile
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization (power-of-two sketch after this list)
Lightweight, standalone C++ inference engine for Google's Gemma models.
fastllm is a high-performance LLM inference library implemented in C++, with no backend dependencies (it requires only CUDA, not PyTorch). It can run the DeepSeek R1 671B INT4 model on a single RTX 4090 at 20+ tokens/s per stream.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs; an API sketch follows below.
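Several entries above revolve around sub-8-bit floating-point formats. As a concrete illustration of the FP6 quantization named in the first entry, here is a minimal sketch that rounds weights to the FP6 grid, assuming the common E3M2 layout (1 sign, 3 exponent, 2 mantissa bits); the helper names are mine, and this is plain nearest-value rounding, not the repo's optimized GPU kernels.

```python
import torch

def fp6_grid(e_bits=3, m_bits=2, bias=3):
    # Enumerate every representable E3M2 value: zero, subnormals, normals.
    vals = {0.0}
    for m in range(1, 2 ** m_bits):                      # subnormals
        vals.add(2.0 ** (1 - bias) * m / 2 ** m_bits)
    for e in range(1, 2 ** e_bits):                      # normals
        for m in range(2 ** m_bits):
            vals.add(2.0 ** (e - bias) * (1 + m / 2 ** m_bits))
    grid = torch.tensor(sorted(vals))
    return torch.cat([(-grid).flip(0)[:-1], grid])       # mirror negatives

def quantize_fp6(w: torch.Tensor) -> torch.Tensor:
    # Round each weight to its nearest FP6 value; magnitudes beyond the
    # largest representable value (28.0 for E3M2) clamp to it.
    grid = fp6_grid().to(w.dtype)
    idx = (w.unsqueeze(-1) - grid).abs().argmin(dim=-1)
    return grid[idx]
```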
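For the nGPT entry, "learning entirely on the hypersphere" means weight vectors and hidden states are kept at unit L2 norm. Below is a minimal sketch of that idea with hypothetical helper names; the actual repo implements full normalized attention/MLP blocks with learnable step sizes.

```python
import torch
import torch.nn.functional as F

def renormalize_(linear: torch.nn.Linear) -> None:
    # After each optimizer step, project every weight vector back onto
    # the unit hypersphere (simplified version of nGPT's weight norm).
    with torch.no_grad():
        linear.weight.div_(linear.weight.norm(dim=-1, keepdim=True))

def sphere_residual(h: torch.Tensor, block_out: torch.Tensor,
                    alpha: float = 0.1) -> torch.Tensor:
    # Replace the usual residual h + f(h) with a normalized step
    # toward the block output, so hidden states stay on the sphere.
    return F.normalize(h + alpha * (block_out - h), dim=-1)
```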
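For the Flora entry, the paper's observation is that a LoRA-style low-rank update acts like compressing the gradient with a random projection that can be regenerated from a seed, so only the projected gradient needs to be stored. A minimal sketch of that compression step; the function names are mine, not the repo's API.

```python
import torch

def compress(grad: torch.Tensor, rank: int, seed: int) -> torch.Tensor:
    # Project an m-by-n gradient down to m-by-rank; storing only the
    # seed lets us regenerate the projection instead of keeping it.
    gen = torch.Generator(device=grad.device).manual_seed(seed)
    p = torch.randn(grad.shape[1], rank, generator=gen,
                    device=grad.device) / rank ** 0.5
    return grad @ p

def decompress(c: torch.Tensor, n: int, rank: int, seed: int) -> torch.Tensor:
    # Unbiased low-rank estimate of the gradient: E[G P P^T] = G when
    # P has i.i.d. N(0, 1/rank) entries.
    gen = torch.Generator(device=c.device).manual_seed(seed)
    p = torch.randn(n, rank, generator=gen, device=c.device) / rank ** 0.5
    return c @ p.T
```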
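For the 1-bit LLM entry, the BitNet b1.58 recipe quantizes weights to the ternary set {-1, 0, +1} with a single absmean scale per tensor, so matrix multiplication needs only additions plus one rescale. A minimal sketch of the quantizer only; the repo's value is its optimized inference kernels, which this does not reproduce.

```python
import torch

def absmean_ternary(w: torch.Tensor):
    # BitNet b1.58-style quantization: scale by the mean absolute
    # value, then round and clip to {-1, 0, +1}.
    scale = w.abs().mean().clamp(min=1e-8)
    wq = (w / scale).round().clamp(-1, 1)
    return wq, scale  # dequantize as wq * scale
```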
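For ShiftAddLLM, "multiplication-less" means weight-activation products are re-expressed with shifts and adds. The paper's actual reparameterization uses binary-coded quantization with several shift terms per weight; the simplified sketch below shows only the core trick, rounding weights to powers of two so a multiply becomes a sign flip plus a bit shift.

```python
import torch

def power_of_two_quantize(w: torch.Tensor) -> torch.Tensor:
    # Snap each weight to its nearest power of two; in fixed-point
    # arithmetic, multiplying by 2^k is a k-bit shift.
    sign = torch.sign(w)
    exp = torch.log2(w.abs().clamp(min=2.0 ** -8)).round()
    return sign * torch.exp2(exp.clamp(-8, 8))  # illustrative exponent range
```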
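Finally, the TensorRT-LLM entry advertises an easy-to-use Python API. A minimal sketch of its high-level LLM API, assuming a recent release; the checkpoint name is only illustrative.

```python
from tensorrt_llm import LLM, SamplingParams

# Build an engine for a Hugging Face checkpoint and run generation.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
outputs = llm.generate(
    ["What does FP6 quantization trade away?"],
    SamplingParams(max_tokens=64, temperature=0.8),
)
print(outputs[0].outputs[0].text)
```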