Releases: thu-pacman/chitu
v0.3.1
v0.3.0
Added support for online conversion from FP4 to FP8 and BF16, enabling the FP4-quantized version of DeepSeek-R1 671B to run on non-Blackwell GPUs.
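"Online conversion" here means the 4-bit weights are decoded to a wider dtype on the fly instead of relying on native FP4 hardware support. Below is an illustrative sketch of FP4 (E2M1) decoding with a per-block scale; it is not Chitu's actual implementation, and the packing layout (two codes per byte, low nibble first) is an assumption:

```python
# E2M1 (FP4): 1 sign bit, 2 exponent bits, 1 mantissa bit.
# The eight representable magnitudes:
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def decode_fp4(code: int) -> float:
    """Decode one 4-bit E2M1 code (0..15) to a float."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1[code & 0x7]

def dequantize_block(packed: bytes, scale: float) -> list:
    """Unpack two FP4 codes per byte (low nibble first, an assumed
    layout) and apply a per-block scale to recover wider values."""
    out = []
    for byte in packed:
        out.append(decode_fp4(byte & 0xF) * scale)
        out.append(decode_fp4(byte >> 4) * scale)
    return out
```

For example, `dequantize_block(bytes([0x21]), 2.0)` decodes the low nibble `0x1` (0.5) and the high nibble `0x2` (1.0), then scales both by 2.0, giving `[1.0, 2.0]`.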
v0.2.3
v0.2.2
v0.2.1
What's new:
- [HIGHLIGHT] Hybrid CPU+GPU inference (compatible with multi-GPU and multi-request).
- Support for new models (see below for the full list).
- Multiple optimizations to operator kernels.
Officially supported models:
- [NEW] QwQ-32B-FP8 (https://huggingface.co/qingcheng-ai/QWQ-32B-FP8)
  Usage: Append the `models=QwQ-32B-FP8` command line argument when starting Chitu.
- [NEW] QwQ-32B-AWQ (https://huggingface.co/Qwen/QwQ-32B-AWQ)
  Usage: Append the `models=QwQ-32B-AWQ` command line argument when starting Chitu.
- [NEW] Llama-3.3-70B-Instruct (https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)
  Usage: Append the `models=Llama-3.3-70B-Instruct` command line argument when starting Chitu.
- [NEW] DeepSeek-R1-Distill-Llama-70B (https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B)
  Usage: Append the `models=DeepSeek-R1-Distill-Llama-70B` command line argument when starting Chitu.
- Qwen2.5-32B (https://huggingface.co/Qwen/Qwen2.5-32B)
  Usage: Append the `models=Qwen2.5-32B` command line argument when starting Chitu.
- QwQ-32B (https://huggingface.co/Qwen/QwQ-32B)
  Usage: Append the `models=QwQ-32B` command line argument when starting Chitu.
- Mixtral-8x7B-Instruct-v0.1 (https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
  Usage: Append the `models=Mixtral-8x7B-Instruct-v0.1` command line argument when starting Chitu.
- Qwen2-72B-Instruct (https://huggingface.co/Qwen/Qwen2-72B-Instruct)
  Usage: Append the `models=Qwen2-72B-Instruct` command line argument when starting Chitu.
- Meta-Llama-3-8B-Instruct-original (https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct; please use its "original" checkpoint)
  Usage: Append the `models=Meta-Llama-3-8B-Instruct-original` command line argument when starting Chitu.
- glm-4-9b-chat (https://huggingface.co/THUDM/glm-4-9b-chat)
  Usage: Append the `models=glm-4-9b-chat` command line argument when starting Chitu.
- DeepSeek-R1 (https://huggingface.co/deepseek-ai/DeepSeek-R1)
  Usage: Append the `models=DeepSeek-R1` command line argument when starting Chitu.
- DeepSeek-R1-Distill-Qwen-14B (https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B)
  Usage: Append the `models=DeepSeek-R1-Distill-Qwen-14B` command line argument when starting Chitu.
- Qwen2-7B-Instruct (https://huggingface.co/Qwen/Qwen2-7B-Instruct)
  Usage: Append the `models=Qwen2-7B-Instruct` command line argument when starting Chitu.
- DeepSeek-R1-bf16 (https://huggingface.co/opensourcerelease/DeepSeek-R1-bf16)
  Usage: Append the `models=DeepSeek-R1-bf16` command line argument when starting Chitu.
- DeepSeek-V3 (https://huggingface.co/deepseek-ai/DeepSeek-V3)
  Usage: Append the `models=DeepSeek-V3` command line argument when starting Chitu.
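Every entry above follows the same pattern: the model is selected by appending a `models=<name>` key=value argument to whatever command launches Chitu. A minimal sketch of that pattern; only the `models=` argument comes from these release notes, and the commented launch line is a placeholder, not Chitu's documented entry point:

```shell
# Only the models=<name> argument is from the release notes; the
# launch command below is an illustrative placeholder.
MODEL="QwQ-32B-FP8"
ARGS="models=${MODEL}"
echo "launching Chitu with: ${ARGS}"
# torchrun --nproc_per_node 8 your_chitu_entry_point.py ${ARGS}
```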
v0.2.0
v0.1.2
v0.1.1
NOTE: CUDA graph support in this release is broken. Use v0.1.2 instead.
What's new:
- Support for setting the activation type to `float16` for DeepSeek R1 (by appending `keep_dtype_in_checkpoint=False dtype=float16` to the command line arguments).
- Config file for QwQ-32B.
- A number of bug fixes for running with CUDA graph.
- Further optimizations of operator kernels.
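The float16 activation setting above uses the same key=value argument style as model selection. A minimal sketch; the two flags are from these release notes, while the commented launch line is a placeholder, not Chitu's documented entry point:

```shell
# Flags from the release notes: do not keep the checkpoint dtype,
# and force float16 activations. The launch command is a placeholder.
DTYPE_ARGS="keep_dtype_in_checkpoint=False dtype=float16"
echo "extra args: ${DTYPE_ARGS}"
# torchrun --nproc_per_node 8 your_chitu_entry_point.py models=DeepSeek-R1 ${DTYPE_ARGS}
```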