Starred repositories
Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms.
A tool for bandwidth measurements on NVIDIA GPUs.
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
Simple samples for TensorRT programming
Fast and memory-efficient exact attention
This is a series of GPU optimization topics explaining in detail how to optimize CUDA kernels. It introduces several basic kernel optimizations, including: elementwise, reduce, s…
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥
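Use ChatGPT to summarize arXiv papers. Accelerate the entire research workflow: use ChatGPT for full-paper summarization, professional translation, polishing, reviewing, and drafting review responses.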
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
Multi-GPU dynamic scheduler using PGAS style cross-GPU communication
Enterprise graph machine learning framework for billion-scale graphs for ML scientists and data scientists.
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
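AISystem covers AI systems, meaning the full stack of low-level AI technologies, including AI chips, AI compilers, and AI inference and training frameworks.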
stdgpu: Efficient STL-like Data Structures on the GPU
🎃 GPU load-balancing library for regular and irregular computations.
🔮 ChatGPT Desktop Application (Mac, Windows and Linux)