Stars
An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs
SystemPanic / vllm-windows
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs (Windows build & kernels)
An Extension for Forge Webui that implements Attention Couple
A fast inference library for running LLMs locally on modern consumer-class GPUs
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
LLM UI with advanced features, easy setup, and multiple backend support.
Windows compile of bitsandbytes for use in text-generation-webui.
Karras et al. (2022) diffusion models for PyTorch