Mao et al., 2014 - Google Patents

Explain images with multimodal recurrent neural networks

Document ID: 17279621765771226004
Author: Mao J; Xu W; Yang Y; Wang J; Yuille A
Publication year: 2014
Publication venue: arXiv preprint arXiv:1410.1090

Snippet

In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel sentence descriptions to explain the content of images. It directly models the probability distribution of generating a word given previous words and the image. Image …
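
As a hedged illustration of the sentence above (the notation is an assumption and does not come from the snippet): writing w_1, ..., w_T for the words of a sentence and I for the image, modeling each word given the previous words and the image corresponds to the standard chain-rule factorization of the sentence probability:

```latex
P(w_1, \dots, w_T \mid I) \;=\; \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1}, I)
```

How the m-RNN parameterizes each factor is not stated in the snippet, so the sketch stops at the factorization itself.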

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/6267—Classification techniques
            - G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
          - G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
          - G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
          - G06K9/6288—Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
        - G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
          - G06K9/46—Extraction of features or characteristics of the image
        - G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
        - G06F17/20—Handling natural language data
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
      - G06N99/00—Subject matter not provided for in other groups of this subclass
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T7/00—Image analysis
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling

Similar Documents

Publication	Publication Date	Title
Mao et al.	2014	Explain images with multimodal recurrent neural networks
US11593612B2 (en)	2023-02-28	Intelligent image captioning
Mao et al.	2014	Deep captioning with multimodal recurrent neural networks (m-RNN)
CN107688821B (en)	2021-08-06	Cross-modal image natural language description method based on visual saliency and semantic attributes
CN110852368B (en)	2022-08-26	Global and local feature embedding and image-text fusion emotion analysis method and system
CN111126069B (en)	2022-03-29	Social media short text named entity identification method based on visual object guidance
Pang et al.	2016	Text matching as image recognition
Kiros et al.	2014	Multimodal neural language models
Mao et al.	2015	Learning like a child: Fast novel visual concept learning from sentence descriptions of images
Fang et al.	2015	From captions to visual concepts and back
Jain et al.	2015	Objects2action: Classifying and localizing actions without any video example
Dong et al.	2016	Word2VisualVec: Image and video to sentence matching by visual feature prediction
Zhang et al.	2017	Discriminative bimodal networks for visual localization and detection with natural language queries
CN108628823A (en)	2018-10-09	In conjunction with the name entity recognition method of attention mechanism and multitask coordinated training
CN111598183B (en)	2023-08-15	Multi-feature fusion image description method
CN108549658A (en)	2018-09-18	A kind of deep learning video answering method and system based on the upper attention mechanism of syntactic analysis tree
CN110163117B (en)	2021-03-05	Pedestrian re-identification method based on self-excitation discriminant feature learning
Gomez et al.	2018	Learning to learn from web data through deep semantic embeddings
WO2018196718A1 (en)	2018-11-01	Image disambiguation method and device, storage medium, and electronic device
Zhang et al.	2020	Image captioning via semantic element embedding
CN106845525A (en)	2017-06-13	A kind of depth confidence network image bracket protocol based on bottom fusion feature
CN113095072B (en)	2024-06-28	Text processing method and device
CN113627151B (en)	2022-02-22	Cross-modal data matching method, device, equipment and medium
CN109684928A (en)	2019-04-26	Chinese document recognition methods based on Internal retrieval
CN110489554B (en)	2021-06-18	Attribute-level sentiment classification method based on location-aware mutual attention network model