Stars
An on-premises, OCR-free unstructured data extraction and benchmarking toolkit. (https://idp-leaderboard.org/)
Official repo of the Griffon series, including v1 (ECCV 2024), v2, and G
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.
Efficient vision foundation models for high-resolution generation and perception.
ReadingBank: A Benchmark Dataset for Reading Order Detection
Everything about the SmolLM2 and SmolVLM family of models
OCR, layout analysis, reading order, table recognition in 90+ languages
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding
Compute benchmark of table structure recognition.
A colored formatter for the Python logging module
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
A data annotation toolbox that supports image, audio, and video data.
GraphAgent: Agentic Graph Language Assistant
Convert PDF to markdown + JSON quickly with high accuracy
A toolbox for skeleton-based action recognition.
Solve Visual Understanding with Reinforced VLMs
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
📄 Awesome OCR toolkits for multiple programming languages, based on ONNXRuntime, OpenVINO, PaddlePaddle and PyTorch.
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
UniTable: Towards a Unified Table Foundation Model
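An inference library for sequence-based table recognition, integrating table recognition algorithms such as PP-Structure and modelscope.
Currently covers 232 large models, including commercial models such as chatgpt, gpt-4o, o3-mini, Google gemini, Claude3.5, Zhipu GLM-Zero, ERNIE Bot (文心一言), qwen-max, Baichuan, iFlytek Spark, SenseTime senseChat, and minimax, as well as DeepSeek-R1, qwq-32b, deepseek-v3, qwen2.5, llama3.3, phi-4, glm4, gemma3, mistral, 书生in…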
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception