Stars
This repository delivers end-to-end, code-first tutorials covering every layer of production-grade GenAI agents, guiding you from spark to scale with proven patterns and reusable blueprints for re…
Genai-bench is a benchmarking tool for comprehensive token-level performance evaluation of large language model (LLM) serving systems.
An open-source AI agent that brings the power of Gemini directly into your terminal.
An open protocol enabling communication and interoperability between opaque agentic applications.
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
Command-line tools for managing OCI model artifacts, bundled according to the Model Spec.
The Fastest Distributed Database for Transactional, Analytical, and AI Workloads. Welcome to our community: https://discord.gg/74cF8vbNEs
High-performance safetensors model loader
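The safetensors layout is what makes loaders like this fast: an 8-byte little-endian header length, a JSON header mapping tensor names to dtype/shape/byte offsets, then the raw tensor bytes, so metadata can be read without touching the payload. A minimal pure-Python sketch of that layout (the filename and helper names are hypothetical, not any loader's API):

```python
import json
import struct

def write_safetensors(path, header: dict, data: bytes) -> None:
    """Write a minimal safetensors file: u64 LE header length, JSON header, raw data."""
    blob = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(blob)))
        f.write(blob)
        f.write(data)

def read_header(path) -> dict:
    """Read only the JSON header, skipping the tensor payload entirely."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(n))

# One float32 2x2 tensor occupying bytes [0, 16) of the data section.
header = {"w": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}}
write_safetensors("demo.safetensors", header, b"\x00" * 16)
print(read_header("demo.safetensors")["w"]["shape"])  # → [2, 2]
```

Because offsets are explicit, a real loader can memory-map the file and fetch individual tensors lazily.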
Cloud Native Artificial Intelligence Model Format Specification
SGLang is a fast serving framework for large language models and vision language models.
Distributed Compiler based on Triton for Parallel Systems
Ling is a MoE LLM provided and open-sourced by InclusionAI.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
DeepEP: an efficient expert-parallel communication library
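Expert-parallel communication exists to move tokens to the experts a router selects. The gating step that produces those assignments — softmax over expert logits, keep the top-k, renormalise the kept weights — can be sketched in plain Python (a toy router, not DeepEP's or Ling's implementation; names are hypothetical):

```python
import math

def topk_gate(logits, k=2):
    """Toy MoE router: softmax over expert logits, keep top-k, renormalise weights."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # shift by max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    z = sum(probs[i] for i in top)
    return [(i, probs[i] / z) for i in top]  # (expert index, routing weight)

print(topk_gate([2.0, 0.5, 1.0, -1.0], k=2))  # experts 0 and 2; weights sum to 1
```

In a real MoE layer these indices drive the dispatch/combine collectives that libraries like DeepEP accelerate across GPUs.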
AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
A framework for few-shot evaluation of language models.
Supercharge Your LLM with the Fastest KV Cache Layer
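A KV-cache layer like this saves prefill work by reusing attention state for previously seen prompt prefixes. The lookup idea can be sketched with a plain dict (a toy in-memory version with hypothetical names; the real layer stores transformer KV tensors across GPU, CPU, and disk):

```python
class PrefixKVCache:
    """Toy prefix cache: reuse stored 'KV state' for the longest known token prefix."""

    def __init__(self):
        self.store = {}  # tuple of prefix tokens -> opaque KV state

    def put(self, tokens, kv):
        self.store[tuple(tokens)] = kv

    def longest_prefix(self, tokens):
        """Return (matched length, KV state) for the longest cached prefix, or (0, None)."""
        for end in range(len(tokens), 0, -1):
            kv = self.store.get(tuple(tokens[:end]))
            if kv is not None:
                return end, kv
        return 0, None

cache = PrefixKVCache()
cache.put([1, 2, 3], "kv-for-1-2-3")
hit_len, kv = cache.longest_prefix([1, 2, 3, 4, 5])
print(hit_len, kv)  # → 3 kv-for-1-2-3: only tokens 4 and 5 still need prefill
```

Production caches replace the linear prefix scan with hashed chunk keys so lookup cost does not grow with prompt length.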
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
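tiktoken itself ships pretrained merge tables and regex pre-splitting, but the core BPE idea — repeatedly merging the most frequent adjacent token pair — fits in a few lines. A toy single training step, not tiktoken's implementation:

```python
from collections import Counter

def bpe_merge_once(tokens: list[str]) -> list[str]:
    """Merge every occurrence of the single most frequent adjacent pair."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return tokens
    best = max(pairs, key=pairs.get)
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2  # consume both halves of the merged pair
        else:
            merged.append(tokens[i])
            i += 1
    return merged

print(bpe_merge_once(list("aaabdaaabac")))
# → ['aa', 'a', 'b', 'd', 'aa', 'a', 'b', 'a', 'c']
```

Encoding with a trained tokeniser replays merges in rank order; tiktoken's speed comes from doing this over byte sequences with precomputed ranks.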
Optimized primitives for collective multi-GPU communication
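The ring all-reduce that such collective libraries implement for multi-GPU sums can be simulated in a single process: a reduce-scatter phase leaves each rank holding the full sum of one chunk, then an all-gather circulates those sums around the ring. A toy sketch with plain lists standing in for GPU buffers (function name hypothetical, not the NCCL API):

```python
def ring_allreduce_sum(values):
    """Simulate ring all-reduce of per-rank vectors; n ranks, n chunks per vector.

    Returns per-rank results; every rank ends with the element-wise sum.
    """
    n = len(values)
    data = [list(v) for v in values]  # data[r][c]: rank r's copy of chunk c
    # Phase 1: reduce-scatter. At step s, rank r sends chunk (r - s) % n to rank (r + 1) % n.
    for s in range(n - 1):
        sent = [row[:] for row in data]  # snapshot: sends use start-of-step values
        for r in range(n):
            c = (r - s) % n
            data[(r + 1) % n][c] += sent[r][c]
    # Now rank r holds the complete sum of chunk (r + 1) % n.
    # Phase 2: all-gather. At step s, rank r forwards chunk (r + 1 - s) % n.
    for s in range(n - 1):
        sent = [row[:] for row in data]
        for r in range(n):
            c = (r + 1 - s) % n
            data[(r + 1) % n][c] = sent[r][c]
    return data

print(ring_allreduce_sum([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# every rank ends with [12, 15, 18]
```

The appeal of the ring: each rank sends roughly 2x the data size total regardless of rank count, which is why it scales well for large tensors.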
FlashInfer: Kernel Library for LLM Serving
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel platforms ⚡