ziyuhuang123 / Starred · GitHub

Enhanced compiler frontend. Support Auto Compute + Auto Schedule + Auto Tensorize for tensor compilers.

C · 6 stars · 1 fork · Updated Dec 19, 2022
Jupyter Notebook · 18 stars · 2 forks · Updated Jan 24, 2024

Distributed Compiler Based on Triton for Parallel Systems

Python · 849 stars · 67 forks · Updated Jun 18, 2025

KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems

Python · 435 stars · 46 forks · Updated Jun 1, 2025

GitHub Pages template based on HTML and Markdown for personal, portfolio-based websites.

HTML · 14,578 stars · 43,424 forks · Updated Jun 27, 2025

A foolproof tutorial: how to use Clash to get around the firewall (傻瓜式教程——如何使用Clash翻墙)

113 stars · 5 forks · Updated Jul 4, 2024

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python · 146,095 stars · 29,468 forks · Updated Jun 26, 2025

⏰ Collaboratively track deadlines of conferences recommended by CCF (Website, Python Cli, Wechat Applet) / If you find it useful, please star this project, thanks~

Vue · 7,573 stars · 508 forks · Updated Jun 23, 2025

Fastest kernels written from scratch

Cuda · 284 stars · 39 forks · Updated Apr 3, 2025
C++ · 9 stars · 2 forks · Updated Oct 30, 2024

FlashAttention tutorial written in Python, Triton, CUDA, and CUTLASS

Cuda · 377 stars · 40 forks · Updated May 14, 2025
Cuda · 13 stars · 7 forks · Updated Mar 12, 2025

GEMM using WMMA (Tensor Cores)

Cuda · 13 stars · 9 forks · Updated Jul 31, 2022

Inference code for Llama models

Python · 58,427 stars · 9,777 forks · Updated Jan 26, 2025

We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstra…

C++ · 183 stars · 11 forks · Updated Jan 28, 2025

Some microbenchmark practices

Cuda · 1 star · Updated Apr 28, 2023

Collection of benchmarks to measure basic GPU capabilities

C++ · 386 stars · 55 forks · Updated Feb 11, 2025

Tile primitives for speedy kernels

Cuda · 2,479 stars · 158 forks · Updated Jun 22, 2025

Large Context Attention

Python · 716 stars · 53 forks · Updated Jan 24, 2025

CUDA Templates for Linear Algebra Subroutines

C++ · 7,758 stars · 1,286 forks · Updated Jun 26, 2025

[ECCV 2022] Official repository for "MaxViT: Multi-Axis Vision Transformer". SOTA foundation models for classification, detection, segmentation, image quality, and generative modeling...

Jupyter Notebook · 476 stars · 35 forks · Updated Jun 2, 2023

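Dive into Deep Learning (《动手学深度学习》): written for Chinese readers, runnable, and open to discussion. The Chinese and English editions are used for teaching at more than 500 universities in over 70 countries.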

Python · 70,193 stars · 11,695 forks · Updated Jul 30, 2024
C++ · 27 stars · 5 forks · Updated Feb 20, 2024

Parallel Computing - Floyd-Warshall MPI

TeX · 3 stars · 4 forks · Updated Jan 22, 2017

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 50,838 stars · 8,346 forks · Updated Jun 27, 2025

A GPU accelerated implementation of the sieve of Eratosthenes

Cuda · 65 stars · 16 forks · Updated Dec 18, 2022
Python · 12 stars · 1 fork · Updated Oct 20, 2023

Exercises for exploring the Fibertree, Timeloop and Accelergy tools

Jupyter Notebook · 98 stars · 32 forks · Updated Apr 9, 2025

TileFlow is a performance analysis tool based on Timeloop for fusion dataflows

C++ · 61 stars · 9 forks · Updated Apr 12, 2024