Stars
Enhanced compiler frontend. Support Auto Compute + Auto Schedule + Auto Tensorize for tensor compilers.
Distributed Compiler Based on Triton for Parallel Systems
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems
GitHub Pages template based upon HTML and Markdown for personal, portfolio-based websites.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
⏰ Collaboratively track deadlines of conferences recommended by CCF (Website, Python CLI, WeChat Applet) / If you find it useful, please star this project, thanks~
Flash Attention tutorial written in Python, Triton, CUDA, and CUTLASS
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstra…
Collection of benchmarks to measure basic GPU capabilities
Tile primitives for speedy kernels
[ECCV 2022] Official repository for "MaxViT: Multi-Axis Vision Transformer". SOTA foundation models for classification, detection, segmentation, image quality, and generative modeling...
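Dive into Deep Learning: aimed at Chinese readers, runnable, and open to discussion. The Chinese and English editions are used for teaching at more than 500 universities in over 70 countries.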
Parallel Computing - Floyd-Warshall MPI
A high-throughput and memory-efficient inference and serving engine for LLMs
A GPU-accelerated implementation of the sieve of Eratosthenes
Exercises for exploring the Fibertree, Timeloop and Accelergy tools
TileFlow is a performance analysis tool based on Timeloop for fusion dataflows