Stars
Build and publish crates with pyo3, cffi and uniffi bindings as well as Rust binaries as Python packages
FinGPT: Open-Source Financial Large Language Models 🔥 The trained models are released on Hugging Face.
FlashInfer: Kernel Library for LLM Serving
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
A Datacenter Scale Distributed Inference Serving Framework
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
No fortress, purely open ground. OpenManus is Coming.
🚀🚀 Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏
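A 26M parameter budget is easy to sanity-check with the standard transformer parameter formula. The sketch below is a rough estimate only; the function name and the architectural choices (tied input/output embeddings, a 2-matrix MLP, LayerNorm with scale and bias) are assumptions for illustration, not minimind's actual configuration.

```python
def gpt_param_count(vocab: int, d_model: int, n_layers: int, d_ff: int) -> int:
    """Rough parameter count for a GPT-style decoder with tied embeddings."""
    # token embedding, shared with the output head (tied weights)
    emb = vocab * d_model
    # per block: Q/K/V/O projections (4 * d^2), 2-matrix MLP (2 * d * d_ff),
    # and two LayerNorms with scale + bias (2 * 2 * d)
    per_layer = 4 * d_model * d_model + 2 * d_model * d_ff + 4 * d_model
    # final LayerNorm before the output head
    final_ln = 2 * d_model
    return emb + n_layers * per_layer + final_ln
```

Plugging in a plausible small config (e.g. a few hundred hidden dims, ~8 layers) lands in the tens of millions, which is how a "26M" model fits a 2-hour from-scratch training run.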
Elegant reading of real-time, trending news
DeepEP: an efficient expert-parallel communication library
A high-level distributed programming framework for Rust
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
Meet OceanBase, the MySQL-compatible distributed database for your cloud-native apps. High performance, highly available, low cost, multi-cloud. Welcome to our community: https://discord.gg/74cF8vbNEs
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
Optimized primitives for collective multi-GPU communication
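The canonical primitive behind collective-communication libraries like this one is ring all-reduce: a reduce-scatter phase followed by an all-gather phase, so each of n ranks transfers only about 2(n-1)/n of the data instead of broadcasting everything. The sketch below is a single-process simulation of the algorithm's data movement (not NCCL's actual C/CUDA API; the function name is illustrative).

```python
def ring_allreduce(values):
    """Simulate ring all-reduce. values[rank] is a list of n numeric
    chunks (one chunk per rank); returns the state where every rank
    holds the element-wise sum of all inputs."""
    n = len(values)
    data = [list(v) for v in values]

    # Reduce-scatter: in step s, rank r sends chunk (r - s) mod n to its
    # right neighbor, which adds it in. Sends are snapshotted first to
    # model all ranks transmitting simultaneously.
    for step in range(n - 1):
        sends = [(rank, (rank - step) % n, data[rank][(rank - step) % n])
                 for rank in range(n)]
        for rank, idx, val in sends:
            data[(rank + 1) % n][idx] += val

    # After n-1 steps, rank r holds the fully reduced chunk (r + 1) mod n.
    # All-gather: circulate each reduced chunk around the ring.
    for step in range(n - 1):
        sends = [(rank, (rank + 1 - step) % n, data[rank][(rank + 1 - step) % n])
                 for rank in range(n)]
        for rank, idx, val in sends:
            data[(rank + 1) % n][idx] = val
    return data
```

With 3 ranks each contributing `[1,1,1]`, `[2,2,2]`, `[3,3,3]`, every rank ends up with `[6,6,6]` after 2(n-1) = 4 communication steps.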
Library providing helpers for the Linux kernel io_uring support
amd/blis (forked from flame/blis): BLAS-like Library Instantiation Software Framework
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
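LoRA's core idea: keep the pretrained weight W frozen and learn only a low-rank update ΔW = (α/r)·BA, where B is d_out×r, A is r×d_in, and r is much smaller than either dimension. The dependency-free sketch below shows merging an adapter into the base weight; loralib itself ships drop-in torch layers (e.g. `lora.Linear`), so the helper names here are illustrative only.

```python
def matmul(X, Y):
    # naive (rows x inner) @ (inner x cols) multiply
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_merge(W, A, B, alpha):
    """Return W + (alpha / r) * B @ A.

    W: d_out x d_in frozen weight; B: d_out x r; A: r x d_in (r = len(A)).
    """
    r = len(A)
    delta = matmul(B, A)  # low-rank update, d_out x d_in
    return [[w + (alpha / r) * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]
```

Because only A and B are trained, the trainable parameter count drops from d_out·d_in to r·(d_out + d_in), and merging ΔW back in means zero extra inference latency.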
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs. TensorR…
A high-throughput and memory-efficient inference and serving engine for LLMs