Stars
Distributed Compiler Based on Triton for Parallel Systems
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
Efficient Triton Kernels for LLM Training
Notes for EE290 Mathematics of Data Science at UC Berkeley, taught by Jiantao Jiao in Fall 2019
Flash Attention in ~100 lines of CUDA (forward pass only)
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
What would you do with 1000 H100s...
Puzzles for exploring transformers
Shared Middle-Layer for Triton Compilation
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
A survival guide for new employees and interns. Good Luck and Survive!
FlagTree is a unified compiler for multiple AI chips, forked from triton-lang/triton.
micropuma / torch-mlir
Forked from llvm/torch-mlir. The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
ARIES: An Agile MLIR-Based Compilation Flow for Reconfigurable Devices with AI Engines (FPGA 2025 Best Paper Nominee)
An MLIR Compiler for PyTorch/C/C++ Code into HLS Dataflow Designs
An extremely fast Python package and project manager, written in Rust.
Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch
Optimizing SGEMM kernel functions on NVIDIA GPUs to close-to-cuBLAS performance (a naive baseline is sketched after this list).
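
For context on that last entry, here is a minimal sketch of the naive SGEMM baseline that such optimization work typically starts from. This is not code from that repository; the kernel name, problem sizes, and launch configuration are illustrative, and close-to-cuBLAS kernels add shared-memory tiling, register blocking, and vectorized loads on top of this.

// Naive SGEMM baseline: C = alpha*A*B + beta*C, row-major.
// One thread computes one element of C; no tiling or shared memory.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void sgemm_naive(int M, int N, int K, float alpha,
                            const float* A, const float* B,
                            float beta, float* C) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // row of C
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // column of C
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k)
            acc += A[row * K + k] * B[k * N + col];  // dot(A row, B col)
        C[row * N + col] = alpha * acc + beta * C[row * N + col];
    }
}

int main() {
    const int M = 256, N = 256, K = 256;  // illustrative sizes
    std::vector<float> hA(M * K, 1.0f), hB(K * N, 1.0f), hC(M * N, 0.0f);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, hA.size() * sizeof(float));
    cudaMalloc(&dB, hB.size() * sizeof(float));
    cudaMalloc(&dC, hC.size() * sizeof(float));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dC, hC.data(), hC.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(16, 16);  // 256 threads per block
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    sgemm_naive<<<grid, block>>>(M, N, K, 1.0f, dA, dB, 0.0f, dC);
    cudaDeviceSynchronize();

    cudaMemcpy(hC.data(), dC, hC.size() * sizeof(float), cudaMemcpyDeviceToHost);
    printf("C[0] = %f (expect %d)\n", hC[0], K);  // all-ones inputs sum K terms
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}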