dutsc (Chen Shen) / Starred · GitHub
  • Tianjin University
  • Tianjin

Highlights

  • Pro

NCCL Tests

Cuda · 1,157 stars · 290 forks · Updated Jun 6, 2025

Run LLMs with MLX

Python · 1,146 stars · 144 forks · Updated Jun 26, 2025

Efficient Triton Kernels for LLM Training

Python · 5,269 stars · 357 forks · Updated Jun 25, 2025

Minimalistic large language model 3D-parallelism training

Python · 1,944 stars · 197 forks · Updated Jun 25, 2025

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python · 38,771 stars · 4,722 forks · Updated Jun 2, 2025

Implementation of FlashAttention in PyTorch

Python · 153 stars · 18 forks · Updated Jan 12, 2025

Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation

24 stars · Updated Mar 24, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust · 4,349 stars · 447 forks · Updated Jun 26, 2025

Competitive programming template library by 灵茶山艾府 💭💡🎈

Go · 6,876 stars · 703 forks · Updated Jun 23, 2025

A minimal cache manager for PagedAttention, built on top of llama3.

Python · 92 stars · 9 forks · Updated Aug 26, 2024

Solutions to the Triton Puzzles by Sasha Rush: https://github.com/srush/Triton-Puzzles

Jupyter Notebook · 3 stars · Updated Aug 10, 2024

High performance Transformer implementation in C++.

C++ · 125 stars · 16 forks · Updated Jan 18, 2025

A curated list of resources dedicated to open source GitHub repositories related to ChatGPT and OpenAI API

2,575 stars · 310 forks · Updated Jun 25, 2025

Dynamic Memory Management for Serving LLMs without PagedAttention

C · 397 stars · 31 forks · Updated May 30, 2025

How to optimize some algorithms in CUDA.

Cuda · 2,280 stars · 205 forks · Updated Jun 26, 2025

Puzzles for learning Triton

Jupyter Notebook · 1,726 stars · 137 forks · Updated Nov 18, 2024

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs o…

Python · 3,882 stars · 751 forks · Updated May 12, 2025

LLM training in simple, raw C/CUDA

Cuda · 26,972 stars · 3,096 forks · Updated Jun 26, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA.

Cuda · 4,897 stars · 536 forks · Updated Jun 21, 2025

A collection of memory efficient attention operators implemented in the Triton language.

Python · 272 stars · 18 forks · Updated Jun 5, 2024

Material for gpu-mode lectures

Jupyter Notebook · 4,637 stars · 467 forks · Updated Jun 18, 2025

REST: Retrieval-Based Speculative Decoding, NAACL 2024

C · 204 stars · 15 forks · Updated Dec 2, 2024

A documentation website built with mkdocs

HTML · 1 star · Updated Jun 12, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 4 stars · Updated Aug 9, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 3 stars · Updated Jul 30, 2024

This repo is used to assess NSL's scientific research assistants.

Python · 12 stars · 11 forks · Updated Jul 5, 2024

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Jupyter Notebook · 2,550 stars · 179 forks · Updated Jun 25, 2024

🦖 Learn about LLMs, LLMOps, and vector DBs for free by designing, training, and deploying a real-time financial advisor LLM system ~ source code + video & reading materials

Jupyter Notebook · 3,291 stars · 529 forks · Updated Dec 9, 2024

📰 Must-read papers and blogs on Speculative Decoding ⚡️

809 stars · 45 forks · Updated Jun 22, 2025

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python · 42,230 stars · 7,043 forks · Updated Dec 9, 2024