Uncertainty-based visual question answering: estimating semantic inconsistency between image and knowledge base
Chae et al., 2022
- Document ID
- 3689315751260307147
- Author
- Chae J
- Kim J
- Publication year
- 2022
- Publication venue
- 2022 International Joint Conference on Neural Networks (IJCNN)
Snippet
The knowledge-based visual question answering (KVQA) task aims to answer questions that require external knowledge in addition to an understanding of images and questions. Recent studies on KVQA inject external knowledge in a multi-modal form, and …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/04—Inference methods or devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation, e.g. computer aided management of electronic mail or groupware; Time management, e.g. calendars, reminders, meetings or time accounting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
Similar Documents
Publication | Title |
---|---|
Lu et al. | R-VQA: learning visual relation facts with semantic attention for visual question answering |
Zellers et al. | From recognition to cognition: Visual commonsense reasoning |
Messina et al. | Transformer reasoning network for image-text matching and retrieval |
Zhang et al. | A gated peripheral-foveal convolutional neural network for unified image aesthetic prediction |
Chen et al. | CAAN: Context-aware attention network for visual question answering |
Moayeri et al. | Text-to-concept (and back) via cross-model alignment |
Zablocki et al. | Context-aware zero-shot learning for object recognition |
Zhang et al. | Relational graph learning for grounded video description generation |
Yang et al. | Hierarchical scene graph encoder-decoder for image paragraph captioning |
Xu et al. | Relation-aware compositional zero-shot learning for attribute-object pair recognition |
Zhang et al. | Hierarchical scene parsing by weakly supervised learning with image descriptions |
Wang et al. | Deep multi-person kinship matching and recognition for family photos |
Khan et al. | A deep neural framework for image caption generation using gru-based attention mechanism |
Li et al. | Inner knowledge-based Img2Doc scheme for visual question answering |
CN110659392B (en) | Retrieval method and device, and storage medium |
Lin et al. | Feature Enhancement in Attention for Visual Question Answering. |
Chae et al. | Uncertainty-based visual question answering: estimating semantic inconsistency between image and knowledge base |
CN113158672B (en) | Relationship analysis method and device based on news event |
Wang et al. | Generalised zero-shot learning for entailment-based text classification with external knowledge |
CN116089644A (en) | Event detection method integrating multi-mode features |
Elu et al. | Inferring spatial relations from textual descriptions of images |
Oura et al. | Multimodal Deep Neural Network with Image Sequence Features for Video Captioning |
Bose et al. | Attention-based multimodal deep learning on vision-language data: models, datasets, tasks, evaluation metrics and applications |
Prabhakar et al. | Question relevance in visual question answering |
CN114003708A (en) | Automatic question answering method and device based on artificial intelligence, storage medium and server |