Han Shen shenh10

🏠

Working from home

66 followers · 36 following

Tsinghua University
Beijing

Stars

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python 5,413 611 Updated May 27, 2025

NVIDIA / nvbandwidth

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 449 40 Updated Apr 15, 2025

tangxyw / RecSysPapers

A collection of industry classics and cutting-edge papers in the fields of recommendation, advertising, and search.

Python 1,789 242 Updated May 27, 2025

NVIDIA / multi-gpu-programming-models

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda 721 126 Updated Feb 21, 2025

KwaiVGI / LivePortrait

Bring portraits to life!

Python 16,025 1,675 Updated Jun 1, 2025

ROCm / rocprofiler-compute

Advanced Profiling and Analytics for AMD Hardware

Python 156 58 Updated Jun 6, 2025

PAA-NCIC / PPoPP2017_artifact

Third party assembler and GEMM library for NVIDIA Kepler GPU

CSS 81 20 Updated Oct 8, 2019

JuliaComputing / nvidia-driver-pcie-rebar

Patches to enable PCIe resizable BARs in the Linux NVIDIA kernel driver

Makefile 16 4 Updated Apr 22, 2022

tinygrad / open-gpu-kernel-modules

Forked from NVIDIA/open-gpu-kernel-modules

NVIDIA Linux open GPU with P2P support

C 1,160 114 Updated Jun 6, 2025

PaddleJitLab / CUDATutorial

A self-learning tutorial for CUDA High Performance Programming.

JavaScript 637 69 Updated Apr 12, 2025

AlibabaResearch / AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

C++ 1,724 193 Updated Apr 9, 2025

google-research / long-range-arena

Long Range Arena for Benchmarking Efficient Transformers

Python 757 85 Updated Dec 16, 2023

cli99 / llm-analysis

Latency and Memory Analysis of Transformer Models for Training and Inference

Python 425 49 Updated Apr 19, 2025

NVIDIA / CUDALibrarySamples

CUDA Library Samples

Cuda 1,966 393 Updated Jun 6, 2025

NVIDIA / nsight-training

Training material for Nsight developer tools

C 158 36 Updated Aug 8, 2024

AmberLJC / LLMSys-PaperList

Large Language Model (LLM) Systems Paper List

1,268 70 Updated Jun 5, 2025

anilshanbhag / gpu-topk

Efficient Top-K implementation on the GPU

Cuda 179 21 Updated Apr 9, 2019

spf13 / spf13-vim

The ultimate vim distribution

Vim Script 15,562 3,605 Updated Nov 4, 2023

XiaokunDing / typhoon-blade

Forked from blade-build/blade-build

Build system for Tencent's Typhoon cloud computing platform; supports C/C++/protobuf/thrift/lex/yacc/swig.

Python 1 Updated Mar 10, 2016

pytorch / PiPPy

Pipeline Parallelism for PyTorch

Python 767 86 Updated Aug 21, 2024

google / yggdrasil-decision-forests

A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.

C++ 587 61 Updated Jun 5, 2025

bytedance / byteir

A model compilation solution for various hardware

MLIR 437 48 Updated Jun 4, 2025

onnx / optimizer

ONNX Optimizer

C++ 718 96 Updated May 29, 2025

daquexian / onnx-simplifier

Simplify your onnx model

C++ 4,091 400 Updated Sep 3, 2024

merrymercy / awesome-tensor-compilers

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,582 310 Updated Oct 19, 2024

google / tcmalloc

C++ 4,729 509 Updated Jun 6, 2025

mmperf / mmperf

MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels.

C++ 133 31 Updated Sep 25, 2023

flame / how-to-optimize-gemm

C 1,884 357 Updated Jul 29, 2023

openxla / community

Stores documents and resources used by the OpenXLA developer community

123 26 Updated Aug 2, 2024

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python 17,703 1,721 Updated Jun 4, 2025
