Tsinghua University, Beijing
Stars
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
A tool for bandwidth measurements on NVIDIA GPUs.
A collection of industry classics and cutting-edge papers in the field of recommendation/advertising/search.
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Advanced Profiling and Analytics for AMD Hardware
Third party assembler and GEMM library for NVIDIA Kepler GPU
Patches to enable PCIe resizable BARs in the Linux NVIDIA kernel driver
NVIDIA Linux open GPU with P2P support
A self-learning tutorial for CUDA High Performance Programming.
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Long Range Arena for Benchmarking Efficient Transformers
Latency and Memory Analysis of Transformer Models for Training and Inference
Large Language Model (LLM) Systems Paper List
XiaokunDing / typhoon-blade
Forked from blade-build/blade-build. Build system of the Typhoon cloud computing platform of Tencent; supports C/C++/protobuf/thrift/lex/yacc/swig.
A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.
A model compilation solution for various hardware
A list of awesome compiler projects and papers for tensor computation and deep learning.
MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels.
Stores documents and resources used by the OpenXLA developer community
Fast and memory-efficient exact attention