Stars
[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation
All-in-One Development Tool based on PaddlePaddle
Demonstrates how to use a vision encoder-decoder model
PaddlePaddle Code Convert Toolkit. A deep learning code conversion tool for PaddlePaddle (飞桨)
Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
✨✨Latest Advances on Multimodal Large Language Models
JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
A treasure chest for visual classification and recognition powered by PaddlePaddle
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
A state-of-the-art-level open visual language model | Multimodal pretrained model
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high …
A Semantic Controllable Self-Supervised Learning Framework to learn general human representations from massive unlabeled human images, which can benefit downstream human-centric tasks to the maximu…
Official PyTorch implementation of "ML-Decoder: Scalable and Versatile Classification Head" (2021)
[CVPR 2023 Highlight] This is the official implementation of "Stitchable Neural Networks".
⚡️An Easy-to-use and Fast Deep Learning Model Deployment Toolkit for ☁️Cloud 📱Mobile and 📹Edge. Including Image, Video, Text and Audio 20+ mainstream scenarios and 150+ SOTA models with end-to-end…
This is an official implementation for "ResT: An Efficient Transformer for Visual Recognition".
PaddleClas ShiTu Image Manager: PP-ShiTu library management tool
[ICLR 2022] Official implementation of UniFormer
EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPS 2022]
Download and visualize single or multiple classes from the huge Open Images v4 dataset
🤖 PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+