MaverickJune (June) / Starred · GitHub
  • Seoul National University (NXCLAB)


Fast CUDA matrix multiplication from scratch

Cuda 723 109 Updated Dec 28, 2023

Using Tree-of-Thought Prompting to boost ChatGPT's reasoning

764 71 Updated Dec 9, 2023

A dummy's guide to setting up (and using) HPC clusters on Ubuntu 22.04LTS using Slurm and Munge. Created by the Quant Club @ UIowa.

313 27 Updated Apr 3, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda 3,011 310 Updated May 21, 2025

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,317 569 Updated Oct 28, 2024

A low-latency & high-throughput serving engine for LLMs

Python 365 47 Updated Apr 18, 2025

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 592 61 Updated Apr 6, 2025

Prod Env

Python 420 63 Updated Oct 9, 2023

A collection of benchmarks and datasets for evaluating LLMs.

447 29 Updated Jul 13, 2024

Large Language Model (LLM) Systems Paper List

1,230 69 Updated May 17, 2025

Triton implementation of FlashAttention2 that adds Custom Masks.

Python 113 11 Updated Aug 14, 2024

LongBench v2 and LongBench (ACL 2024)

Python 877 83 Updated Jan 15, 2025

This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"

Python 51 4 Updated Jul 16, 2024

Doing simple retrieval from LLM models at various context lengths to measure accuracy

Jupyter Notebook 1,861 192 Updated Aug 17, 2024
Python 49 2 Updated May 13, 2024

The Hugging Face course on Transformers

MDX 2,955 932 Updated May 21, 2025

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python 144,628 29,028 Updated May 21, 2025

The official Meta Llama 3 GitHub site

Python 28,713 3,385 Updated Jan 26, 2025
Python 111 12 Updated Dec 31, 2024

[ICLR 2023] "Learning to Grow Pretrained Models for Efficient Transformer Training" by Peihao Wang, Rameswar Panda, Lucas Torroba Hennigen, Philip Greengard, Leonid Karlinsky, Rogerio Feris, David …

Python 91 10 Updated Feb 26, 2024

MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation

Jupyter Notebook 226 18 Updated Jul 11, 2024

✨✨Latest Advances on Multimodal Large Language Models

15,254 985 Updated May 15, 2025

Reading list for research topics in multimodal machine learning

6,458 881 Updated Aug 20, 2024

A curated list for Efficient Large Language Models

Python 1,665 134 Updated Apr 23, 2025

[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Python 250 17 Updated Aug 31, 2024

Inference code for Llama models

Python 58,256 9,769 Updated Jan 26, 2025

📰 Must-read papers and blogs on Speculative Decoding ⚡️

739 42 Updated May 17, 2025

Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)

Python 103 9 Updated Mar 20, 2025

High-speed Large Language Model Serving for Local Deployment

C++ 8,208 431 Updated Feb 19, 2025

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Python 1,248 75 Updated Mar 6, 2025