Starred repositories
🚀🚀 [LLM] Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏
GPU programming related news and material links
Research prototype tool for modular formal verification of C, Rust and Java programs
Next-gen language engineering / DSL framework
An educational compiler intermediate representation
The book "Performance Analysis and Tuning on Modern CPU"
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
⚡️HivisionIDPhotos: a lightweight and efficient AI tool for making ID photos.
A collection of out-of-tree Clang plugins for teaching and learning
A Comprehensive Toolkit for High-Quality PDF Content Extraction
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
Exact inference for discrete probabilistic programs. (Research code, more documentation and ergonomics to come)
Neural Turing Machines (NTM) - PyTorch Implementation
An educational resource to help anyone learn deep reinforcement learning.
🎆Interactive Online Platform that Visualizes Algorithms from Code
Deep Learning papers reading roadmap for anyone who is eager to learn this amazing tech!
Development repository for the Triton language and compiler
A framework for testing compilers' type checkers
Code generator and generated types for Language Server Protocol.
VAST is an experimental compiler pipeline designed for program analysis of C and C++. It provides a tower of IRs as MLIR dialects to choose the best fit representations for a program analysis or fu…
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…
Implementation of a Transformer, but completely in Triton