Stars
cjmcv / SageAttention
Forked from thu-ml/SageAttention
Quantized Attention achieves speedups of 2-5x over FlashAttention and 3-11x over xformers, without losing end-to-end metrics across language, image, and video models.
cjmcv / flash-attention
Forked from Dao-AILab/flash-attention
Fast and memory-efficient exact attention
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
CUDA Matrix Multiplication Optimization
A programmer's guide to cooking at home (Simplified Chinese only).
Open-source DeepWiki: an AI-powered wiki generator for GitHub/GitLab/Bitbucket repositories. Join the Discord: https://discord.gg/gMwThUMeme
Ongoing research training transformer models at scale
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
My learning notes/codes for ML SYS.
A CPU tool for benchmarking peak floating-point performance
An NVIDIA-curated collection of educational resources on general-purpose GPU programming.
A high-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
cjmcv / vllm
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
cjmcv / lighteval
Forked from huggingface/lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
cjmcv / cutlass
Forked from NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
A collection of benchmarks to measure basic GPU capabilities
FlashMLA: Efficient MLA decoding kernels
DeepEP: an efficient expert-parallel communication library
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Reading notes on open-source AI infrastructure code (sglang, llm, cutlass, hpc, etc.)
cjmcv / sglang
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.