- USTC
- Hefei, China
Stars
What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness
Frontier Multimodal Foundation Models for Image and Video Understanding
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
A generative world for general-purpose robotics & embodied AI learning.
The official code for "How Control Information Influences Multilingual Text Image Generation and Editing?"
[IJCAI-2024] The official code of Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition
JingyeChen / LLaVolta
Forked from Beckschen/LLaVolta
Efficient Multi-modal Models via Stage-wise Visual Context Compression
A collection of AIGC-interview/CV-interview/LLMs-interview questions and answers, also covering new ideas, new problems, new resources, and new projects from work and research
A curated list of recent diffusion models for video generation, editing, and various other applications.
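LLMs interview notes and answers: this repository mainly records interview questions and reference answers for large language model (LLM) algorithm engineers
Vary-tiny codebase built upon LAVIS (for training from scratch) and a PDF image-text pair dataset (about 600k, including English/Chinese)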
Code for "MEBOW: Monocular Estimation of Body Orientation In the Wild", CVPR 2020
Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.
HaozheZhao / MIC 9B31
MMICL, a state-of-the-art VLM with in-context learning ability, from ICL, PKU
Data and code for NeurIPS 2021 Paper "IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning".
A Parametric Framework to Generate Visual Illusions using Python
[WIP] Layer Diffusion for WebUI (via Forge)
[ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
My ComfyUI workflows collection
Official code for "FeatUp: A Model-Agnostic Framework for Features at Any Resolution" ICLR 2024
[CVPR 2024] Official implementation of "DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations"
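"Code Thoughts" (代码随想录) LeetCode practice guide: a recommended order for 200 classic problems, 600k characters of detailed illustrated explanations, video walkthroughs of the difficult points, more than 50 mind maps, and versions in C++, Java, Python, Go, JavaScript, and other languages, so that studying algorithms is no longer bewildering! 🔥🔥 Take a look and you will wish you had found it sooner! 🚀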
A collection of resources on controllable generation with text-to-image diffusion models.
3rd Place, Visual Prompt Tuning Challenge @ CVPR 2023 HIT Workshop (2023)