Tianjin University, Tianjin
Stars
Efficient Triton Kernels for LLM Training
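For flavor, here is a hedged sketch of the kind of fused Triton kernel such training libraries provide: an RMSNorm over the last dimension with one program per row. It is a generic illustration of the pattern (load a row, reduce, normalize, store), not code from the repository, and it needs a CUDA GPU to run.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def rmsnorm_kernel(x_ptr, w_ptr, out_ptr, n_cols, eps, BLOCK_SIZE: tl.constexpr):
    # one program instance normalizes one row of a contiguous (n_rows, n_cols) tensor
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < n_cols
    x = tl.load(x_ptr + row * n_cols + cols, mask=mask, other=0.0)
    rms = tl.sqrt(tl.sum(x * x, axis=0) / n_cols + eps)
    w = tl.load(w_ptr + cols, mask=mask, other=0.0)
    tl.store(out_ptr + row * n_cols + cols, x / rms * w, mask=mask)

def rmsnorm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    n_rows, n_cols = x.shape
    out = torch.empty_like(x)
    BLOCK_SIZE = triton.next_power_of_2(n_cols)  # whole row fits in one block here
    rmsnorm_kernel[(n_rows,)](x, weight, out, n_cols, eps, BLOCK_SIZE=BLOCK_SIZE)
    return out

# Reference check against an eager PyTorch RMSNorm (requires a CUDA GPU).
x = torch.randn(4, 512, device="cuda")
w = torch.ones(512, device="cuda")
ref = x / torch.sqrt(x.pow(2).mean(-1, keepdim=True) + 1e-6) * w
assert torch.allclose(rmsnorm(x, w), ref, atol=1e-4)
```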
Minimalistic large language model 3D-parallelism training
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Implementation of FlashAttention in PyTorch
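As a reference point, the following is a minimal, unoptimized PyTorch sketch of the tiling and online-softmax idea behind FlashAttention; the real kernels fuse these loops on-chip, and the block size and function name here are illustrative, not taken from the repository.

```python
import torch

def tiled_attention(q, k, v, block_size=64):
    """q, k, v: (seq_len, head_dim). Computes softmax(q k^T / sqrt(d)) v one
    key/value block at a time, keeping running softmax statistics per row."""
    seq_len, head_dim = q.shape
    scale = head_dim ** -0.5
    out = torch.zeros_like(q)
    row_max = torch.full((seq_len, 1), float("-inf"))
    row_sum = torch.zeros(seq_len, 1)

    for start in range(0, seq_len, block_size):
        k_blk = k[start:start + block_size]
        v_blk = v[start:start + block_size]
        scores = (q @ k_blk.T) * scale                       # (seq_len, block)

        new_max = torch.maximum(row_max, scores.max(-1, keepdim=True).values)
        correction = torch.exp(row_max - new_max)            # rescale old statistics
        p = torch.exp(scores - new_max)

        row_sum = row_sum * correction + p.sum(-1, keepdim=True)
        out = out * correction + p @ v_blk
        row_max = new_max

    return out / row_sum                                     # normalize once at the end

# Agreement with the naive formulation.
q, k, v = (torch.randn(128, 32) for _ in range(3))
ref = torch.softmax((q @ k.T) * 32 ** -0.5, dim=-1) @ v
assert torch.allclose(tiled_attention(q, k, v), ref, atol=1e-4)
```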
Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation
A Datacenter Scale Distributed Inference Serving Framework
A minimal cache manager for PagedAttention, built on top of llama3.
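To make the idea concrete, here is a toy block-table allocator in the spirit of PagedAttention: KV-cache memory is split into fixed-size blocks and each sequence maps its tokens to physical blocks on demand. Class and method names are illustrative assumptions, not the repository's API.

```python
class BlockManager:
    """Toy PagedAttention-style allocator: the KV cache is a pool of fixed-size blocks."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables: dict[int, list[int]] = {}      # seq_id -> physical block ids

    def append_token(self, seq_id: int, seq_len: int) -> int:
        """Reserve room for one more token of `seq_id` (which currently holds
        `seq_len` tokens); allocate a fresh block only when the last one is full.
        Returns the physical block that will hold the new token."""
        table = self.block_tables.setdefault(seq_id, [])
        if seq_len % self.block_size == 0:                # last block full, or first token
            if not self.free_blocks:
                raise RuntimeError("out of KV-cache blocks: preempt or evict a sequence")
            table.append(self.free_blocks.pop())
        return table[-1]

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))

# A sequence with 17 generated tokens and block_size=16 owns exactly 2 blocks.
mgr = BlockManager(num_blocks=8, block_size=16)
for pos in range(17):
    mgr.append_token(seq_id=0, seq_len=pos)
assert len(mgr.block_tables[0]) == 2
```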
Solutions to Tensor puzzles by Sasha Rush - https://github.com/srush/Triton-Puzzles
High performance Transformer implementation in C++.
A curated list of resources dedicated to open source GitHub repositories related to ChatGPT and OpenAI API
Dynamic Memory Management for Serving LLMs without PagedAttention
How to optimize various algorithms in CUDA.
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs o…
📚 LeetCUDA: Modern CUDA learning notes with PyTorch for beginners 🐑; 200+ CUDA/Tensor Cores kernels, HGEMM, FA-2 MMA.
A collection of memory efficient attention operators implemented in the Triton language.
REST: Retrieval-Based Speculative Decoding, NAACL 2024
cadedaniel / vllm-public
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs
ymwangg / vllm
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs
This repo is used to assess NSL's scientific research assistants.
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
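The core trick is small enough to sketch: on top of the base model's last hidden state, a few extra lightweight heads each propose the token k steps ahead, so one forward pass drafts several candidates that are then verified together. The layer shapes and names below are assumptions, not the repository's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MedusaHead(nn.Module):
    """One extra decoding head: a residual MLP on the shared hidden state,
    followed by its own vocabulary projection."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return self.lm_head(hidden + F.silu(self.proj(hidden)))

class MedusaHeads(nn.Module):
    def __init__(self, hidden_size: int, vocab_size: int, num_heads: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            MedusaHead(hidden_size, vocab_size) for _ in range(num_heads)
        )

    def forward(self, hidden: torch.Tensor) -> list[torch.Tensor]:
        # head k proposes logits for position t + k + 1
        return [head(hidden) for head in self.heads]

# One last-token hidden state -> four sets of candidate logits.
heads = MedusaHeads(hidden_size=256, vocab_size=32000)
logits = heads(torch.randn(1, 256))
assert len(logits) == 4 and logits[0].shape == (1, 32000)
```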
🦖 Learn about LLMs, LLMOps, and vector DBs for free by designing, training, and deploying a real-time financial advisor LLM system ~ source code + video & reading materials
📰 Must-read papers and blogs on Speculative Decoding ⚡️
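For readers new to the area, here is a deliberately simplified greedy draft-and-verify loop showing the pattern these papers build on: a small draft model proposes a few tokens, the target model scores them in one forward pass, and the longest agreeing prefix is accepted. Production systems use rejection sampling to preserve the target distribution exactly; the model interfaces below are assumptions.

```python
import torch

@torch.no_grad()
def speculative_step(draft_model, target_model, prompt_ids, gamma=4):
    """prompt_ids: (1, t) token ids. Both models map (1, n) ids to (1, n, vocab)
    logits. Returns the prompt extended by the accepted draft tokens plus one
    token taken from the target model."""
    # 1) draft `gamma` tokens autoregressively with the cheap model
    draft_ids = prompt_ids
    for _ in range(gamma):
        logits = draft_model(draft_ids)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        draft_ids = torch.cat([draft_ids, next_id], dim=-1)

    # 2) score the whole drafted sequence with the target model in one pass
    target_logits = target_model(draft_ids)
    target_pred = target_logits[:, :-1].argmax(dim=-1)   # target's choice for each next position

    # 3) accept the longest prefix where draft and target agree
    t = prompt_ids.shape[1]
    accepted = 0
    for k in range(gamma):
        if draft_ids[0, t + k] != target_pred[0, t + k - 1]:
            break
        accepted += 1

    # 4) always append one token chosen by the target model itself
    keep = draft_ids[:, : t + accepted]
    if accepted == gamma:
        bonus = target_logits[:, -1].argmax(dim=-1, keepdim=True)
    else:
        bonus = target_pred[:, t + accepted - 1 : t + accepted]
    return torch.cat([keep, bonus], dim=-1)

# Smoke test with stand-in models that emit random logits.
vocab = 100
fake = lambda ids: torch.randn(1, ids.shape[1], vocab)
out = speculative_step(fake, fake, torch.randint(vocab, (1, 8)))
assert out.shape[1] >= 9
```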
The simplest, fastest repository for training/finetuning medium-sized GPTs.
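In the same minimalist spirit, the entire shape of such a training loop fits in a few lines: next-token cross-entropy with AdamW over shifted token ids. The tiny stand-in model and synthetic batch below are illustrative only, not the repository's GPT.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, dim, seq, batch_size = 256, 64, 32, 8
# stand-in "model": embedding straight into a vocabulary projection
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(10):
    tokens = torch.randint(vocab, (batch_size, seq + 1))   # synthetic token ids
    x, y = tokens[:, :-1], tokens[:, 1:]                   # inputs and next-token targets
    logits = model(x)                                      # (batch, seq, vocab)
    loss = F.cross_entropy(logits.reshape(-1, vocab), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```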