Mao et al., 2014 - Google Patents
Explain images with multimodal recurrent neural networksMao et al., 2014
View PDF- Document ID
- 17279621765771226004
- Author
- Mao J
- Xu W
- Yang Y
- Wang J
- Yuille A
- Publication year
- Publication venue
- arXiv preprint arXiv:1410.1090
External Links
Snippet
In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel sentence descriptions to explain the content of images. It directly models the probability distribution of generating a word given previous words and the image. Image …
- 230000000306 recurrent 0 title abstract description 33
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6288—Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Mao et al. | Explain images with multimodal recurrent neural networks | |
US11593612B2 (en) | Intelligent image captioning | |
Mao et al. | Deep captioning with multimodal recurrent neural networks (m-rnn) | |
CN107688821B (en) | Cross-modal image natural language description method based on visual saliency and semantic attributes | |
CN110852368B (en) | Global and local feature embedding and image-text fusion emotion analysis method and system | |
CN111126069B (en) | Social media short text named entity identification method based on visual object guidance | |
Pang et al. | Text matching as image recognition | |
Kiros et al. | Multimodal neural language models | |
Mao et al. | Learning like a child: Fast novel visual concept learning from sentence descriptions of images | |
Fang et al. | From captions to visual concepts and back | |
Jain et al. | Objects2action: Classifying and localizing actions without any video example | |
Dong et al. | Word2visualvec: Image and video to sentence matching by visual feature prediction | |
Zhang et al. | Discriminative bimodal networks for visual localization and detection with natural language queries | |
CN108628823A (en) | In conjunction with the name entity recognition method of attention mechanism and multitask coordinated training | |
CN111598183B (en) | Multi-feature fusion image description method | |
CN108549658A (en) | A kind of deep learning video answering method and system based on the upper attention mechanism of syntactic analysis tree | |
CN110163117B (en) | Pedestrian re-identification method based on self-excitation discriminant feature learning | |
Gomez et al. | Learning to learn from web data through deep semantic embeddings | |
WO2018196718A1 (en) | Image disambiguation method and device, storage medium, and electronic device | |
Zhang et al. | Image captioning via semantic element embedding | |
CN106845525A (en) | A kind of depth confidence network image bracket protocol based on bottom fusion feature | |
CN113095072B (en) | Text processing method and device | |
CN113627151B (en) | Cross-modal data matching method, device, equipment and medium | |
CN109684928A (en) | Chinese document recognition methods based on Internal retrieval | |
CN110489554B (en) | Attribute-level sentiment classification method based on location-aware mutual attention network model |