Stars
DFloat11: Lossless LLM Compression for Efficient GPU Inference
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
This is an official implementation for "GRIT: Graph Inductive Biases in Transformers without Message Passing".
Awesome-LLM-Tabular: a curated list of Large Language Models applied to Tabular Data
[CIKM2024] Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering
Modeling, training, eval, and inference code for OLMo
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…
Retrieval and Retrieval-augmented LLMs
[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up long-context LLM inference, compute attention with approximate and dynamic sparse methods, which reduces inference latency by up to 10x for pre-filli…
🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton
A curated, but incomplete, list of data-centric AI resources.
Understanding Different Design Choices in Training Large Time Series Models
[ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
Efficient Triton Kernels for LLM Training
Official implementation for Zhong & Le et al., GNNs Also Deserve Editing, and They Need It More Than Once. ICML 2024
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
📰 Must-read papers and blogs on Speculative Decoding ⚡️
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.
Official Code of The Combinatorial Brain Surgeon: Pruning Weights That Cancel One Another in Neural Networks [ICML 2022]
A k-means variation that produces clusters of the same size utilizing the scikit-learn API and related utilities
PyTorch implementation of adversarial attacks [torchattacks]
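Collection of China illegal cases about web crawler. This project collects news, materials, and laws and regulations on litigation and violations involving web crawler developers in mainland China. It aims to help crawler-industry practitioners working in mainland China understand the relevant laws and avoid crossing data-compliance red lines.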