- Institute of Automation, CAS
- Beijing (UTC +08:00)
- https://mashijie1028.github.io
- https://orcid.org/0009-0005-1131-5686
- https://scholar.google.com/citations?user=pLVzF3cAAAAJ&hl=en
Starred repositories
Long Context Transfer from Language to Vision
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
🎥 Python and OpenCV-based scene cut/transition detection program & library.
MAGI-1: Autoregressive Video Generation at Scale
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
Awesome papers & datasets specifically focused on long-term videos.
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
A Survey of Multimodal Retrieval-Augmented Generation
Official code for TPAMI 2025 paper "ProtoGCD: Unified and Unbiased Prototype Learning for Generalized Category Discovery"
Repository for our paper Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries
[CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
[TMLR 2025🔥] A survey of autoregressive models in vision.
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
Code implementation of our paper: On Large Multimodal Models as Open-World Image Classifiers
[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Fully open reproduction of DeepSeek-R1
Awesome-RAG-Vision: a curated list of advanced retrieval augmented generation (RAG) for Computer Vision
Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
A curated list of retrieval-augmented generation (RAG) in large language models
Emu Series: Generative Multimodal Models from BAAI
A Survey on Multimodal Retrieval-Augmented Generation