Stars
My learning notes and code for ML systems.
DeepEP: an efficient expert-parallel communication library
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashMLA: Efficient MLA decoding kernels
Reference implementations of MLPerf™ training benchmarks
Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.
An open-source ebook about the TensorFlow kernel and its implementation mechanisms.
Intel® Extension for TensorFlow*