LuFinch (LuFengqing) / Starred · GitHub

My learning notes and code for ML systems.

Python 2,246 139 Updated May 22, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 7,685 773 Updated May 23, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python 5,369 598 Updated May 20, 2025

FlashMLA: Efficient MLA decoding kernels

Cuda 11,567 835 Updated Apr 29, 2025

Extending JAX with custom C++ and CUDA code

Python 394 23 Updated Aug 18, 2024

Reference implementations of MLPerf™ training benchmarks

Python 1,672 567 Updated May 14, 2025

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.

LLVM 1,337 770 Updated May 23, 2025

An open-source ebook about the TensorFlow kernel and its implementation mechanisms.

TeX 2,894 580 Updated May 5, 2023

TensorFlow source code reading notes

190 42 Updated Sep 18, 2018

Intel® Extension for TensorFlow*

C++ 337 42 Updated Mar 18, 2025