ziyuhuang123

5 followers · 1 following

Stars

KnowingNothing / Ditto

Enhanced compiler frontend. Support Auto Compute + Auto Schedule + Auto Tensorize for tensor compilers.

C 6 1 Updated Dec 19, 2022

summerspringwei / souffle-ae

Jupyter Notebook 18 2 Updated Jan 24, 2024

ByteDance-Seed / Triton-distributed

Distributed Compiler Based on Triton for Parallel Systems

Python 849 67 Updated Jun 18, 2025

ScalingIntelligence / KernelBench

KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems

Python 435 46 Updated Jun 1, 2025

academicpages / academicpages.github.io

Github Pages template based upon HTML and Markdown for personal, portfolio-based websites.

HTML 14,578 43,424 Updated Jun 27, 2025

Sha1rholder / Clash-against-GFW

A foolproof tutorial: how to use Clash to get around the Great Firewall

113 5 Updated Jul 4, 2024

huggingface / transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 146,095 29,468 Updated Jun 26, 2025

ccfddl / ccf-deadlines

⏰ Collaboratively track deadlines of conferences recommended by CCF (Website, Python Cli, Wechat Applet) / If you find it useful, please star this project, thanks~

Vue 7,573 508 Updated Jun 23, 2025

pranjalssh / fast.cu

Fastest kernels written from scratch

Cuda 284 39 Updated Apr 3, 2025

lenLRX / HopperTest

C++ 9 2 Updated Oct 30, 2024

66RING / tiny-flash-attention

flash attention tutorial written in python, triton, cuda, cutlass

Cuda 377 40 Updated May 14, 2025

Ratbuyer / h100-features

Cuda 13 7 Updated Mar 12, 2025

gty111 / GEMM_WMMA

GEMM by WMMA (tensor core)

Cuda 13 9 Updated Jul 31, 2022

meta-llama / llama

Inference code for Llama models

Python 58,427 9,777 Updated Jan 26, 2025

TiledTensor / TiledCUDA

We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstra…

C++ 183 11 Updated Jan 28, 2025

ColfaxResearch / cfx-article-src

C++ 118 26 Updated May 7, 2025

galeselee / microbenchmark

Some microbenchmark practices

Cuda 1 Updated Apr 28, 2023

RRZE-HPC / gpu-benches

collection of benchmarks to measure basic GPU capabilities

C++ 386 55 Updated Feb 11, 2025

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda 2,479 158 Updated Jun 22, 2025

haoliuhl / ringattention

Large Context Attention

Python 716 53 Updated Jan 24, 2025

NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

C++ 7,758 1,286 Updated Jun 26, 2025

google-research / maxvit

[ECCV 2022] Official repository for "MaxViT: Multi-Axis Vision Transformer". SOTA foundation models for classification, detection, segmentation, image quality, and generative modeling...

Jupyter Notebook 476 35 Updated Jun 2, 2023

d2l-ai / d2l-zh

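"Dive into Deep Learning" (《动手学深度学习》): written for Chinese readers, runnable, and open for discussion. The Chinese and English editions are used for teaching at more than 500 universities in over 70 countries.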

Python 70,193 11,695 Updated Jul 30, 2024

microsoft / cusync

C++ 27 5 Updated Feb 20, 2024

nuno-azevedo / floyd-warshall-mpi

Parallel Computing - Floyd-Warshall MPI

TeX 3 4 Updated Jan 22, 2017

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 50,838 8,346 Updated Jun 27, 2025

curtisseizert / CUDASieve

A GPU accelerated implementation of the sieve of Eratosthenes

Cuda 65 16 Updated Dec 18, 2022

KnowingNothing / Domino

Python 12 1 Updated Oct 20, 2023

Accelergy-Project / timeloop-accelergy-exercises

Exercises for exploring the Fibertree, Timeloop and Accelergy tools

Jupyter Notebook 98 32 Updated Apr 9, 2025

pku-liang / TileFlow

TileFlow is a performance analysis tool based on Timeloop for fusion dataflows

C++ 61 9 Updated Apr 12, 2024
