Better support for MetaX (沐曦) GPUs:
- Support for both Llama-like models and DeepSeek models. Tested with DeepSeek-R1-Distill-Llama-70B and DeepSeek-R1-671B using bf16, fp16, and soft fp8 precision.
- New `infer.op_impl=muxi_custom_kernel` mode, optimized for small batches.
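The dotted `infer.op_impl=...` form suggests the option is set as a config override at launch. A minimal sketch, assuming the engine accepts such overrides on the command line; the entrypoint name here is hypothetical and only the `infer.op_impl=muxi_custom_kernel` flag is taken from the notes above:

```shell
# Hypothetical launch command; substitute your actual serve/benchmark entrypoint.
# infer.op_impl=muxi_custom_kernel selects the MetaX custom-kernel path,
# which these notes describe as optimized for small batch sizes.
python -m engine.serve \
    infer.op_impl=muxi_custom_kernel
```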