Stars
SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
Development repository for the Triton language and compiler
TritonParse is a tool designed to help developers analyze and debug Triton kernels by visualizing the compilation process and source code mappings.
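To give a concrete sense of the Triton language itself (and of the kind of kernel whose compilation pipeline tools like TritonParse visualize), here is a minimal vector-add kernel following the standard Triton tutorial pattern; treat it as an illustrative sketch, not code taken from either repository.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Launch a 1D grid with one program per BLOCK_SIZE elements.
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```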
Code for the paper "VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use"
A benchmark that challenges language models to code solutions for scientific problems
Examples for Recommenders - easy to train and deploy on accelerated infrastructure.
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
Build and deploy stateful agents across federated resources
GitHub mirror of the triton-lang/triton repo.
[ICML'25] Breaking Silos: Adaptive Model Fusion Unlocks Better Time Series Forecasting
Browser script to share and export ChatGPT chat logs to Markdown, JSON, or as Image (PNG)
NoakLiu / FastCache-xDiT
Forked from xdit-project/xDiT. FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation [Efficient ML Model]
A large-scale simulation framework for LLM inference
Free, simple, fast interactive diagrams for any GitHub repository
Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs"
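As a generic illustration of the core objective such "RL for LLMs" trainers optimize (not the repository's own code), the sketch below shows a token-level REINFORCE-style policy-gradient loss in plain PyTorch; the function name and tensor layout are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def policy_gradient_loss(logits, response_ids, response_mask, advantages):
    """Token-level REINFORCE-style loss: -advantage * log pi(token).

    logits:        (batch, seq_len, vocab) from the policy LM
    response_ids:  (batch, seq_len) sampled token ids
    response_mask: (batch, seq_len) 1.0 for generated tokens, 0.0 for prompt/padding
    advantages:    (batch,) scalar advantage per sampled response
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Log-probability of each sampled token under the current policy.
    token_log_probs = log_probs.gather(-1, response_ids.unsqueeze(-1)).squeeze(-1)
    # Weight each generated token by its response-level advantage.
    weighted = -advantages.unsqueeze(-1) * token_log_probs * response_mask
    return weighted.sum() / response_mask.sum().clamp(min=1.0)
```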
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…
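A minimal sketch of FP8 execution with Transformer Engine's PyTorch API, following its documented quickstart pattern (a te.Linear layer run inside te.fp8_autocast); it assumes an FP8-capable GPU such as Hopper, Ada, or Blackwell, and the layer sizes are arbitrary.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling FP8 recipe; HYBRID uses E4M3 in the forward pass, E5M2 in the backward pass.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

layer = te.Linear(1024, 1024, bias=True).cuda()
x = torch.randn(16, 1024, device="cuda", dtype=torch.bfloat16)

# GEMMs inside this context run in FP8 where the hardware supports it.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
y.sum().backward()
```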
alexngng / CUDA-Learn-Note
Forked from xlite-dev/LeetCUDA. 🎉CUDA notes / common interview questions / C++ notes; personal notes, updated sporadically: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
A Datacenter Scale Distributed Inference Serving Framework
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
FlashInfer: Kernel Library for LLM Serving
A plugin that lets EC2 developers use libfabric as the network provider when running NCCL applications.
FlagGems is an operator library for large language models implemented in the Triton Language.
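As a rough usage sketch of such a Triton-backed operator library: the snippet below assumes FlagGems' global-patching entry point, flag_gems.enable(), so verify the exact import name and API against the project before relying on it.

```python
import torch
import flag_gems  # assumed package name; see the FlagGems project for the exact import

# Assumed entry point: globally route supported ATen ops to FlagGems' Triton kernels.
flag_gems.enable()

x = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
y = torch.nn.functional.gelu(x @ x.T)  # dispatched to Triton-backed operators where supported
```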
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA.🎉
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉