Zhejiang University, China
Starred repositories
Leveraging passage embeddings for efficient listwise reranking with large language models.
Official repository of the ICCV 2023 paper "Global Features are All You Need for Image Retrieval and Reranking".
Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training.
Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval
A programmer's guide to cooking at home (Simplified Chinese only).
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
LLM2CLIP makes state-of-the-art pretrained CLIP models even stronger.
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'.
"MiniRAG: Making RAG Simpler with Small and Free Language Models"
R1V, trained with AI feedback, answers open-ended visual questions.
An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Fully open reproduction of DeepSeek-R1
[ECCV'24] Official Implementation of Autoregressive Visual Entity Recognizer.
Witness the "aha moment" of a VLM for less than $3.
ICCV 2023 (Oral): "Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities".
Janus-Series: Unified Multimodal Understanding and Generation Models
Data and code for NeurIPS 2022 Paper "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering".
Solve Visual Understanding with Reinforced VLMs
Align Anything: Training All-modality Model with Feedback
A very simple GRPO implementation for reproducing R1-like LLM thinking.
This is the first released survey paper on hallucinations of large vision-language models (LVLMs). To keep track of this field and continuously update our survey, we maintain this repository of rel…
A 6-million audio-caption paired dataset built with an LLM- and ALM-based automatic pipeline.
Code for "A Large-scale Dataset for Audio-Language Representation Learning".
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO of 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…