Stars
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), audio language models, automatic speech recognition (…
Awesome LLMs on Device: A Comprehensive Survey
A multimodal image search engine built on the GME model, capable of handling diverse input types. Whether you're querying with text, images, or both, it provides powerful and flexible image retrieval …
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
A collection of diffusion model papers categorized by their subareas
A Comprehensive Toolkit for High-Quality PDF Content Extraction
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
OCR, layout analysis, reading order, table recognition in 90+ languages
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
My learning notes and code for ML SYS.
Get your documents ready for gen AI
Layout-Conditioned Image Generation, NeurIPS 2024
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Large-scale LLM inference engine
A fast inference library for running LLMs locally on modern consumer-class GPUs
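This project aims to share the technical principles and hands-on experience behind large models (large-model engineering and deploying large-model applications).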
Create 🔥 videos with Stable Diffusion by exploring the latent space and morphing between text prompts
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
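A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and are inexpensive to train, including base models, domain-specific fine-tuning and applications, datasets, and tutorials.
AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks.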
A large-scale dataset for detecting and recognizing tables in camera-taken images.
Several simple examples for popular neural network toolkits calling custom CUDA operators.
A simple tool that can generate TensorRT plugin code quickly.
Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.