✨✨Latest Advances on Multimodal Large Language Models
[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
[ICML 2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
A paper list covering large multi-modality models, parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching, for preliminary insight.
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Curated papers on Large Language Models in the healthcare and medical domain
A curated list of recent and past chart understanding work based on our survey paper: From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models.
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
Talk2BEV: Language-Enhanced Bird's Eye View Maps (ICRA'24)
This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strategy.
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
An up-to-date curated list of state-of-the-art research work, papers, and resources on hallucinations in large vision-language models
A benchmark for evaluating the capabilities of large vision-language models (LVLMs)
Official Repository of Multi-Object Hallucination in Vision-Language Models (NeurIPS 2024)
Code and data for the ACL 2024 Findings paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning"
This repository is the codebase for TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy