Releases · thu-pacman/chitu · GitHub

Releases: thu-pacman/chitu

v0.3.1

30 Apr 07:21

Better support for MetaX (沐曦) GPUs:

  • Support for both Llama-like models and DeepSeek models. Tested with DeepSeek-R1-Distill-Llama-70B and DeepSeek-R1-671B using bf16, fp16, and soft fp8 precision.
  • New infer.op_impl=muxi_custom_kernel mode optimized for small batches.
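A minimal launch sketch selecting the new mode, assuming chitu's Hydra-style key=value overrides (the entry script and every override other than infer.op_impl=muxi_custom_kernel are illustrative assumptions, not taken from this release):

```shell
# Hypothetical invocation; only infer.op_impl=muxi_custom_kernel comes from v0.3.1.
# serve.py, models=..., and infer.tp_size=... are placeholder assumptions.
torchrun --nproc_per_node 8 serve.py \
    models=DeepSeek-R1 \
    models.ckpt_dir=/path/to/checkpoint \
    infer.tp_size=8 \
    infer.op_impl=muxi_custom_kernel
```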

v0.3.0

29 Apr 09:59

Added online conversion from FP4 to FP8 and BF16, enabling the FP4-quantized version of DeepSeek-R1 671B to run on non-Blackwell GPUs.
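The online FP4 path can be pictured as table-lookup dequantization. A minimal sketch, assuming the common E2M1 FP4 encoding with two values packed per byte (function names and the packing order are hypothetical, not chitu's actual code):

```python
# Illustrative FP4 (E2M1) -> float dequantization; NOT chitu's implementation.
# Magnitudes of the 8 non-negative E2M1 codes (exponent bias 1, one mantissa bit).
_FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def fp4_to_float(code):
    """Decode one 4-bit E2M1 code (0..15); the top bit is the sign."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * _FP4_E2M1[code & 0x7]

def dequantize_byte(byte, scale=1.0):
    """Unpack two FP4 values from one byte (low nibble first, by assumption)
    and apply a per-block scale, as block-scaled FP4 formats typically do."""
    return (fp4_to_float(byte & 0xF) * scale,
            fp4_to_float(byte >> 4) * scale)
```

Upcasting then amounts to casting the decoded floats to FP8 or BF16; on non-Blackwell GPUs the whole conversion runs in software, since those parts have no native FP4 support.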

v0.2.3

24 Apr 11:04

Multiple bugs fixed.

v0.2.2

22 Apr 13:48

Performance improvements for hybrid CPU+GPU inference.

v0.2.1

20 Apr 08:30

What's new:

  • [HIGHLIGHT] Hybrid CPU+GPU inference (compatible with multi-GPU and multi-request).
  • Support for new models (see below for the full list).
  • Multiple optimizations to operator kernels.

Officially supported models:

v0.2.0

18 Apr 16:57

(This release has been yanked)

v0.1.2

01 Apr 14:01

HOT FIX: Fixed a major performance regression when CUDA graph is enabled (via infer.use_cuda_graph=True).
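Assuming chitu's Hydra-style overrides, enabling CUDA graph is a single appended flag; everything in this sketch other than infer.use_cuda_graph=True is an illustrative assumption:

```shell
# Hypothetical invocation; only infer.use_cuda_graph=True is quoted from this release note.
torchrun --nproc_per_node 1 serve.py \
    models=DeepSeek-R1 \
    infer.use_cuda_graph=True
```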

v0.1.1

28 Mar 15:00
Pre-release

NOTE: CUDA graph support in this release is broken. Use v0.1.2 instead.

What's new:

  • Support for setting the activation type to float16 for DeepSeek-R1 (by appending keep_dtype_in_checkpoint=False dtype=float16 to the command-line arguments).
  • Config file for QwQ-32B.
  • A number of bug fixes for running with CUDA graph.
  • Further optimizations of operator kernels.
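Putting the float16 activation setting into a command line, under the same Hydra-style override assumption (only keep_dtype_in_checkpoint=False dtype=float16 is quoted from this release; the entry script and model override are placeholders):

```shell
# Hypothetical invocation; the two appended overrides are from this release note.
torchrun --nproc_per_node 8 serve.py \
    models=DeepSeek-R1 \
    keep_dtype_in_checkpoint=False \
    dtype=float16
```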

v0.1.0

14 Mar 02:21
aef62b2

Initial release.
