mashijie1028 (Shijie Ma) / Starred · GitHub

Starred repositories

Long Context Transfer from Language to Vision

Python 374 18 Updated Mar 18, 2025

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,718 129 Updated Apr 21, 2025

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell 20,732 1,358 Updated May 9, 2025

🎥 Python and OpenCV-based scene cut/transition detection program & library.

Python 3,858 435 Updated May 3, 2025
Python 10 Updated Apr 25, 2025

MAGI-1: Autoregressive Video Generation at Scale

Python 2,968 158 Updated May 8, 2025
Python 43 1 Updated Apr 5, 2025

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

2,266 102 Updated May 4, 2025

Awesome papers & datasets specifically focused on long-term videos.

270 12 Updated Nov 15, 2024

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 5,876 442 Updated Aug 7, 2024

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 10,300 731 Updated May 4, 2025

[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Python 1,030 45 Updated Mar 16, 2025

A Survey of Multimodal Retrieval-Augmented Generation

18 1 Updated Apr 17, 2025

Official code for TPAMI 2025 paper "ProtoGCD: Unified and Unbiased Prototype Learning for Generalized Category Discovery"

Python 19 Updated Apr 9, 2025

Repository for our paper Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries

3 Updated Apr 15, 2025

[CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering

Python 28 Updated Mar 31, 2025

[TMLR 2025🔥] A survey for the autoregressive models in vision.

566 15 Updated Apr 28, 2025

MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models

Python 34 2 Updated Apr 10, 2025

Code implementation of our paper: On Large Multimodal Models as Open-World Image Classifiers

Python 18 Updated Mar 26, 2025
Jupyter Notebook 2,541 346 Updated May 2, 2025

[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant

Python 107 4 Updated Mar 18, 2025

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 2,316 164 Updated May 9, 2025

Fully open reproduction of DeepSeek-R1

Python 24,346 2,237 Updated May 9, 2025

Awesome-RAG-Vision: a curated list of advanced retrieval augmented generation (RAG) for Computer Vision

147 3 Updated Apr 30, 2025

Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".

1,617 109 Updated Aug 20, 2024

A curated list of retrieval-augmented generation (RAG) in large language models

269 20 Updated Feb 14, 2025

Emu Series: Generative Multimodal Models from BAAI

Python 1,719 85 Updated Sep 27, 2024

A Survey on Multimodal Retrieval-Augmented Generation

165 8 Updated Apr 19, 2025