Weikai Tang yofufufufu

Interested in HPC/DL System

7 followers · 10 following

Jilin University

Lists (3)

GNN

Graph neural network related

6 repositories

Learning

Materials for open-source courses

5 repositories

Interview prep (八股)

Starred repositories

YukeWang96 / MGG_OSDI23

Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms.

Cuda 40 4 Updated Mar 17, 2024

CisMine / Guide-NVIDIA-Tools

NVIDIA tools guide

Cuda 133 5 Updated Jan 7, 2025

NVIDIA / nvbandwidth

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 443 40 Updated Apr 15, 2025

meta-llama / llama

Inference code for Llama models

Python 58,306 9,778 Updated Jan 26, 2025

meta-llama / llama3

The official Meta Llama 3 GitHub site

Python 28,751 3,393 Updated Jan 26, 2025

mlc-ai / mlc-zh

Python 607 66 Updated Jun 4, 2024

apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators

Python 12,326 3,591 Updated May 31, 2025

NVIDIA / trt-samples-for-hackathon-cn

Simple samples for TensorRT programming

Python 1,606 348 Updated May 27, 2025

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python 17,605 1,710 Updated May 22, 2025

siboehm / SGEMM_CUDA

Fast CUDA matrix multiplication from scratch

Cuda 729 112 Updated Dec 28, 2023

Liu-xiandong / How_to_optimize_in_GPU

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 1,054 155 Updated Jul 29, 2023

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥

Cuda 4,552 479 Updated May 28, 2025

lcpu-club / hpcgame_1st_problems

Repository for HPCGame 1st Problems.

Go 62 8 Updated Feb 6, 2024

kaixindelele / ChatPaper

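Use ChatGPT to summarize arXiv papers. Accelerates the whole research workflow: uses ChatGPT for full-paper summarization, professional translation, polishing, peer review, and review responses.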

Python 18,922 1,949 Updated Apr 4, 2024

xai-org / grok-1

Grok open release

Python 50,296 8,352 Updated Aug 30, 2024

lcpu-club / hpc-wiki

Wiki for HPC

Python 112 10 Updated Jan 13, 2025

NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ 11,656 2,197 Updated May 21, 2025

NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

C++ 5,414 639 Updated May 30, 2025

NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ 10,604 1,459 Updated May 31, 2025