Stars
hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionality beyond a traditional BLAS library.
FlagGems is an operator library for large language models implemented in the Triton Language.
Efficient Triton Kernels for LLM Training
Helpful tools and examples for working with flex-attention
Penn CIS 5650 (GPU Programming and Architecture) Final Project
Llama3-Tutorial (XTuner, LMDeploy, OpenCompass)
Supporting PyTorch models with the Google AI Edge TFLite runtime.
fanshiqing / grouped_gemm
Forked from tgale96/grouped_gemm. PyTorch bindings for CUTLASS grouped GEMM.
CUDA tutorials for maths & ML, with examples covering multi-GPU programming, fused attention, Winograd convolution, and reinforcement learning.
llama3 implementation one matrix multiplication at a time
Xiao's CUDA Optimization Guide [actively adding new content]
GPU programming related news and material links
【LLMs Nine-Story Demon Tower】Shares hands-on practice and experience with LLMs in natural language processing (ChatGLM, Chinese-LLaMA-Alpaca, Vicuna, LLaMA, GPT4ALL, etc.), information retrieval (langchain), speech synthesis, speech recognition, multimodal models (Stable Diffusion, MiniGPT-4, VisualGLM-6B, Ziya-Visual, etc.), and more.
PyTorch extensions for high performance and large scale training.