Stars
Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval (CVPR 2023)
ConceptAttention: A method for interpreting multi-modal diffusion transformers.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents that fully preserves the layout; supports services such as Google/DeepL/Ollama/OpenAI; provides CLI/GUI/MCP/Docker/Zotero
Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
LAVIS - A One-stop Library for Language-Vision Intelligence
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Chinese version of CLIP, which supports Chinese cross-modal retrieval and representation generation.