Stars
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Real-time webcam demo with SmolVLM and llama.cpp server
🏀 Basketball Video Analysis: Leverage automated detection and tracking of players, ball, and team assignments using advanced object tracking, zero-shot classification, and keypoint detection with Y…
SGLang is a fast serving framework for large language models and vision language models.
[WACV 2025] Implementation of RGB2Point: 3D Point Cloud Generation from Single RGB Images
CAD-Recode: Reverse Engineering CAD Code from Point Clouds
Official repo and evaluation implementation of VSI-Bench
Recipes for shrinking, optimizing, and customizing cutting-edge vision models. 💜
Quickly and securely turn your code projects into LLM prompts, all locally on your own machine!
Object detection in soccer scenes trained only with synthetic data (Blender renders)
A simple screen parsing tool towards pure vision based GUI agent
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement [ICLR 2025 Spotlight]
DATAGEN: AI-driven multi-agent research assistant automating hypothesis generation, data analysis, and report writing. Now expanding into crypto market intelligence. Learn more: https://datagen.dig…
A Rust library integrated with ONNXRuntime, providing a collection of Computer Vision and Vision-Language models.
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Vision infrastructure to turn complex documents into RAG/LLM-ready data
songxxzp / vllm
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
A synthetic data generator for text recognition
Deep Learning based Image Segmentation Model to extract QR code regions from an Image
The official project of the paper "Visual Text Processing: A Comprehensive Review and Unified Evaluation"
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
A comprehensive list [Hi-SAM@TPAMI'24, GoMatching@NeurIPS'24, DeepSolo(++)@ CVPR'23, DPText-DETR@AAAI'23, I3CL@IJCV'22] of our research works related to scene text detection, spotting, etc., includ…
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Awesome multilingual OCR and Document Parsing toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools,…
A packaged and flexible version of the CRAFT text detector and Keras CRNN recognition model.