Stars
llm
5 repositories
Universal LLM Deployment Engine with ML Compilation
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
Chinese LLaMA & Alpaca LLMs + local CPU/GPU training and deployment
A high-throughput and memory-efficient inference and serving engine for LLMs