SGLang is a fast serving framework for large language models and vision language models.
cuda inference pytorch transformer moe llama vlm blackwell llm llm-serving llava deepseek-llm deepseek llama3 llama3-1 deepseek-v3 deepseek-r1 deepseek-r1-zero qwen3 llama4
Updated Jun 19, 2025 - Python