shenh10 (Han Shen) / Starred · GitHub
🏠
Working from home
  • Tsinghua University
  • Beijing

Organizations

@THVi-xTHU

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python · 5,413 stars · 611 forks · Updated May 27, 2025

A tool for bandwidth measurements on NVIDIA GPUs.

C++ · 449 stars · 40 forks · Updated Apr 15, 2025

A collection of industry classics and cutting-edge papers in the field of recommendation/advertising/search.

Python · 1,789 stars · 242 forks · Updated May 27, 2025

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda · 721 stars · 126 forks · Updated Feb 21, 2025

Bring portraits to life!

Python · 16,025 stars · 1,675 forks · Updated Jun 1, 2025

Advanced Profiling and Analytics for AMD Hardware

Python · 156 stars · 58 forks · Updated Jun 6, 2025

Third-party assembler and GEMM library for NVIDIA Kepler GPU

CSS · 81 stars · 20 forks · Updated Oct 8, 2019

Patches to enable PCIe resizable BARs in the Linux NVIDIA kernel driver

Makefile · 16 stars · 4 forks · Updated Apr 22, 2022

NVIDIA Linux open GPU with P2P support

C · 1,160 stars · 114 forks · Updated Jun 6, 2025

A self-learning tutorial for CUDA High Performance Programming.

JavaScript · 637 stars · 69 forks · Updated Apr 12, 2025

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

C++ · 1,724 stars · 193 forks · Updated Apr 9, 2025

Long Range Arena for Benchmarking Efficient Transformers

Python · 757 stars · 85 forks · Updated Dec 16, 2023

Latency and Memory Analysis of Transformer Models for Training and Inference

Python · 425 stars · 49 forks · Updated Apr 19, 2025

CUDA Library Samples

Cuda · 1,966 stars · 393 forks · Updated Jun 6, 2025

Training material for Nsight developer tools

C · 158 stars · 36 forks · Updated Aug 8, 2024

Large Language Model (LLM) Systems Paper List

1,268 stars · 70 forks · Updated Jun 5, 2025

Efficient Top-K implementation on the GPU

Cuda · 179 stars · 21 forks · Updated Apr 9, 2019

The ultimate vim distribution

Vim Script · 15,562 stars · 3,605 forks · Updated Nov 4, 2023

Build system for Tencent's Typhoon cloud computing platform; supports C/C++/protobuf/thrift/lex/yacc/swig.

Python · 1 star · Updated Mar 10, 2016

Pipeline Parallelism for PyTorch

Python · 767 stars · 86 forks · Updated Aug 21, 2024

A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.

C++ · 587 stars · 61 forks · Updated Jun 5, 2025

A model compilation solution for various hardware

MLIR · 437 stars · 48 forks · Updated Jun 4, 2025

ONNX Optimizer

C++ · 718 stars · 96 forks · Updated May 29, 2025

Simplify your ONNX model

C++ · 4,091 stars · 400 forks · Updated Sep 3, 2024

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,582 stars · 310 forks · Updated Oct 19, 2024

C++ · 4,729 stars · 509 forks · Updated Jun 6, 2025

MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels.

C++ · 133 stars · 31 forks · Updated Sep 25, 2023

Stores documents and resources used by the OpenXLA developer community

123 stars · 26 forks · Updated Aug 2, 2024

Fast and memory-efficient exact attention

Python · 17,703 stars · 1,721 forks · Updated Jun 4, 2025