Manmadhan et al., 2020 - Google Patents

Visual question answering: a state-of-the-art review

Document ID: 13994106084701901519
Author: Manmadhan S; Kovoor B
Publication year: 2020
Publication venue: Artificial Intelligence Review

Snippet

Visual question answering (VQA) is a task that has received immense consideration from two major research communities: computer vision and natural language processing. Recently it has been widely accepted as an AI-complete task which can be used as an …

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
            - G06F17/30634—Querying
              - G06F17/30657—Query processing
                - G06F17/30675—Query execution
                  - G06F17/30684—Query execution using natural language analysis
          - G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
            - G06F17/30023—Querying
              - G06F17/30038—Querying based on information manually generated or based on information not derived from the media content, e.g. tags, keywords, comments, usage information, user ratings
          - G06F17/30244—Information retrieval; Database structures therefor; File system structures therefor in image databases
          - G06F17/30861—Retrieval from the Internet, e.g. browsers
        - G06F17/20—Handling natural language data
          - G06F17/27—Automatic analysis, e.g. parsing
            - G06F17/2705—Parsing
              - G06F17/271—Syntactic parsing, e.g. based on context-free grammar [CFG], unification grammars
      - G06F15/00—Digital computers in general; Data processing equipment in general
        - G06F15/18—Digital computers in general; Data processing equipment in general in which a programme is changed according to experience gained by the computer itself during a complete run; Learning machines
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
        - G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
          - G06K9/46—Extraction of features or characteristics of the image
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N99/00—Subject matter not provided for in other groups of this subclass
        - G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
      - G06N5/00—Computer systems utilising knowledge based models
        - G06N5/02—Knowledge representation
          - G06N5/022—Knowledge engineering, knowledge acquisition
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
    - G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR

Similar Documents

Authors	Year	Title
Manmadhan et al.	2020	Visual question answering: a state-of-the-art review
Uppal et al.	2022	Multimodal research in vision and language: A review of current and emerging trends
Kumar et al.	2020	Hybrid context enriched deep learning model for fine-grained sentiment analysis in textual and visual semiotic modality social data
Samant et al.	2022	Framework for deep learning-based language models using multi-task learning in natural language understanding: A systematic literature review and future directions
Li et al.	2019	Visual to text: Survey of image and video captioning
Kulkarni et al.	2013	Babytalk: Understanding and generating simple image descriptions
Bernardi et al.	2016	Automatic description generation from images: A survey of models, datasets, and evaluation measures
Guo et al.	2020	LD-MAN: Layout-driven multimodal attention network for online news sentiment recognition
Sharma et al.	2023	A comprehensive survey on image captioning: from handcrafted to deep learning-based techniques, a taxonomy and open research issues
Liu et al.	2022	Fact-based visual question answering via dual-process system
Yang et al.	2019	A comprehensive survey on image aesthetic quality assessment
Chhabra et al.	2023	Multimodal hate speech detection via multi-scale visual kernels and knowledge distillation architecture
Sharma et al.	2023	Evolution of visual data captioning Methods, Datasets, and evaluation Metrics: A comprehensive survey
Dai et al.	2020	Visual relationship detection based on bidirectional recurrent neural network
Paul et al.	2024	A context-sensitive multi-tier deep learning framework for multimodal sentiment analysis
Rehman et al.	2020	Deep Learning Techniques for Future Intelligent Cross-Media Retrieval
Park et al.	2024	SAM: cross-modal semantic alignments module for image-text retrieval
Zhou et al.	2022	Multimodal embedding for lifelog retrieval
Jana et al.	2022	Network embeddings from distributional thesauri for improving static word representations
Dey et al.	2024	How Machine Learning is Innovating Today's World: A Concise Technical Guide
Mangalika	2024	Object Recognition to Content Based Image Retrieval: A Study of the Developments and Applications of Computer Vision
Singh et al.	2018	Neural approaches towards text summarization
Nag	2023	Text-based emotion recognition using contextual phrase embedding model
Town	2005	Ontology based visual information processing.
Müller-Budack	2022	Unsupervised quantification of entity consistency between photos and text in real-world news