vllm-fork Public
Forked from HabanaAI/vllm-fork
A high-throughput and memory-efficient inference and serving engine for LLMs
Python · Apache License 2.0 · Updated May 27, 2025
optimum-habana Public
Forked from huggingface/optimum-habana
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
Python · Apache License 2.0 · Updated May 6, 2025
neural-compressor Public
Forked from intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Python · Apache License 2.0 · Updated Mar 11, 2025
vllm-hpu-extension Public
Forked from HabanaAI/vllm-hpu-extension
Python · Apache License 2.0 · Updated Dec 27, 2024
bitsandbytes Public
Forked from bitsandbytes-foundation/bitsandbytes
8-bit CUDA functions for PyTorch
Python · MIT License · Updated Sep 5, 2023