Stars
Generate high-definition short videos with one click using AI large language models.
Flexible and powerful tensor operations for readable and reliable code (for PyTorch, JAX, TF and others)
Supercharge Your LLM Application Evaluations 🚀
DeepEP: an efficient expert-parallel communication library
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
SearXNG is a free internet metasearch engine which aggregates results from various search services and databases. Users are neither tracked nor profiled.
The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
LeaderWorkerSet: An API for deploying a group of pods as a unit of replication
FlashMLA: Efficient MLA decoding kernels
FlashInfer: Kernel Library for LLM Serving
A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
SGLang is a fast serving framework for large language models and vision language models.
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 16+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
Underlay and RDMA network solution for Kubernetes, for bare metal, VMs and any public cloud
Fast container image distribution plugin with lazy pulling
Nydus - the Dragonfly image service, providing fast, secure and easy access to container images.
Dragonfly is an open source P2P-based file distribution and image acceleration system. It is hosted by the Cloud Native Computing Foundation (CNCF) as an Incubating Level Project.
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Making large AI models cheaper, faster and more accessible
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
Examples of using sparse attention, as in "Generating Long Sequences with Sparse Transformers"
Heterogeneous AI Computing Virtualization Middleware
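This project aims to share the technical principles behind large models along with hands-on experience (large-model engineering and real-world deployment of large-model applications)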
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations