# vqa

Here are 261 public repositories matching this topic...

facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

deep-learning dialog pytorch vqa pretrained-models captioning multimodal multi-tasking textvqa hateful-memes

Updated Nov 15, 2024
Python

OpenGVLab / InternGPT

InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)

Updated Aug 20, 2024
Python

BDBC-KG-NLP / QA-Survey-CN

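A survey of research and applications in intelligent question answering, compiled by the natural language processing research team at Beihang University's Big Data Advanced Innovation Center. It covers knowledge-graph-based question answering (KBQA), text-based question answering (TextQA), table-based question answering (TableQA), vision-based question answering (VisualQA), machine reading comprehension (MRC), and more. For each task type, it summarizes work from both academia and industry.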

nlp qa survey vqa question-answering cqa kbqa qa-survey tqa

Updated Apr 6, 2023

open-compass / VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks

computer-vision evaluation pytorch gemini openai vqa vit gpt multi-modal clip claude openai-api gpt4 large-language-models llm chatgpt llava qwen gpt-4v

Updated Dec 9, 2024
Python

peteanderson80 / bottom-up-attention

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

caffe vqa faster-rcnn image-captioning captioning-images mscoco mscoco-dataset visual-question-answering

Updated Feb 3, 2023
Jupyter Notebook

roboflow / maestro

Streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL

transformers vqa objectdetection captioning fine-tuning multimodal vision-and-language phi-3-vision paligemma florence-2

Updated Dec 9, 2024
Python

NVlabs / prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

vqa image-captioning language-model multi-task-learning vision-and-language multi-modal-learning vision-language-model

Updated Jan 17, 2024
Python

microsoft / Oscar

Oscar and VinVL

vqa image-captioning oscar vision-and-language pre-training image-text-search vinvl

Updated Aug 28, 2023
Python

hila-chefer / Transformer-MM-Explainability

[ICCV 2021 - Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Includes examples for DETR and VQA.

visualization transformers transformer vqa clip interpretability explainable-ai explainability detr lxmert visualbert

Updated Aug 24, 2023
Jupyter Notebook

hengyuan-hu / bottom-up-attention-vqa

An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.

pytorch vqa bottom-up-attention

Updated Mar 10, 2024
Python

Cadene / vqa.pytorch

Visual Question Answering in PyTorch

deep-learning torch pytorch vqa coco resnet skipthoughts clevr vgenome

Updated Dec 11, 2019
Python

jayleicn / ClipBERT

[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.

pytorch vqa vision-and-language video-retrieval video-question-answering cvpr2021

Updated Aug 8, 2023
Python

jokieleung / awesome-visual-question-answering

A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.

vqa awesome-list multi-modal multi-modal-learning attention-networks

Updated Jul 6, 2023

stanfordnlp / mac-network

Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)

tensorflow vqa question-answering attention clevr machine-reasoning compositional-attention-networks

Updated Jul 10, 2021
Python

OpenGVLab / Multi-Modality-Arena

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

chat chatbot vqa gradio multi-modality large-language-models llms chatgpt vision-language-model

Updated Apr 21, 2024
Python

chingyaoc / awesome-vqa

Visual Q&A reading list

vqa arxiv papers

Updated Oct 7, 2018

vacancy / NSCL-PyTorch-Release

PyTorch implementation for the Neuro-Symbolic Concept Learner (NS-CL).

vqa concept-learning neuro-symbolic-learning

Updated Oct 24, 2020
Python

davidmascharka / tbd-nets

PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

visualization machine-learning deep-learning pytorch neural-networks vqa visual-question-answering

Updated Dec 7, 2021
Jupyter Notebook

MILVLG / openvqa

A lightweight, scalable, and general framework for visual question answering research

benchmark deep-learning pytorch vqa visual-question-answering

Updated Sep 3, 2021
Python

FuxiaoLiu / LRV-Instruction

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

evaluation vision vqa llama object-detection gpt evaluation-metrics iclr multimodal vision-and-language hallucination vicuna gpt-4 foundation-models prompt-engineering chatgpt llava iclr2024

Updated Mar 13, 2024
Python
