Stars
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei Li, Sishuo Chen, Xu Sun, Lu Hou
Lyricify (/lɪ'rɪsəfaɪ/), a fantastic app that provides scrolling lyrics for Spotify and other apps.
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
Open-source Multi-agent Poster Generation from Papers
Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ
(Siggraph Asia 2023) Code of "IconShop: Text-Guided Vector Icon Synthesis with Autoregressive Transformers"
[CVPR 2025] Official implementation for "Empowering LLMs to Understand and Generate Complex Vector Graphics" https://arxiv.org/abs/2412.11102
(CVPR 2025) Code of "Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models"
[CVPR 2024] Official implementation for "SVGDreamer: Text Guided SVG Generation with Diffusion Model" https://arxiv.org/abs/2312.16476
Official code of "LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer"
🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning
A curated list of free courses with certifications. Also available at https://free-certifications.com/
A Large-scale Dataset for training and evaluating a model's ability on Dense Text Image Generation
OmniSVG is the first family of end-to-end multimodal SVG generators that leverage pre-trained Vision-Language Models (VLMs), capable of generating complex and detailed SVGs, from simple icons to in…
Align Anything: Training All-modality Model with Feedback
StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and te…
V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in MLLMs
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…