A Datacenter-Scale Distributed Inference Serving Framework
Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines.
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
OwLite is a low-code AI model compression toolkit.