Guo et al., 2019 - Google Patents
Deep multimodal representation learning: A survey
Guo et al., 2019
- Document ID
- 9278948631551619742
- Author
- Guo W
- Wang J
- Wang S
- Publication year
- 2019
- Publication venue
- IEEE Access
Snippet
Multimodal representation learning, which aims to narrow the heterogeneity gap among different modalities, plays an indispensable role in the utilization of ubiquitous multimodal data. Due to the powerful representation ability with multiple levels of abstraction, deep …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6261—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6279—Classification techniques relating to the number of classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Guo et al. | | Deep multimodal representation learning: A survey |
Zhang et al. | | Multimodal intelligence: Representation learning, information fusion, and applications |
Li et al. | | A survey of multi-view representation learning |
Mu | | A survey of recommender systems based on deep learning |
Baltrušaitis et al. | | Multimodal machine learning: A survey and taxonomy |
Wang et al. | | Image captioning with deep bidirectional LSTMs and multi-task learning |
Dong et al. | | Predicting visual features from text for image and video caption retrieval |
Koohzadi et al. | | Survey on deep learning methods in human action recognition |
Tang et al. | | Graph-based multimodal sequential embedding for sign language translation |
Chen et al. | | CAAN: Context-aware attention network for visual question answering |
Hossain et al. | | Text to image synthesis for improved image captioning |
Zhang et al. | | Temporal sentence grounding in videos: A survey and future directions |
Estevam et al. | | Zero-shot action recognition in videos: A survey |
CN112000818A (en) | | Cross-media retrieval method and electronic device for texts and images |
Sun et al. | | Video question answering: a survey of models and datasets |
Chen et al. | | New ideas and trends in deep multimodal content understanding: A review |
Xu et al. | | Deep image captioning: A review of methods, trends and future challenges |
Cao et al. | | A review on multimodal zero‐shot learning |
Liu et al. | | Multimodal emotion recognition based on cascaded multichannel and hierarchical fusion |
CN117574904A (en) | | Named entity recognition method based on contrast learning and multi-modal semantic interaction |
Abdar et al. | | A review of deep learning for video captioning |
Shou et al. | | Adversarial representation with intra-modal and inter-modal graph contrastive learning for multimodal emotion recognition |
Deng et al. | | Multimodal affective computing with dense fusion transformer for inter-and intra-modality interactions |
Dai et al. | | Visual relationship detection based on bidirectional recurrent neural network |
Zhao et al. | | Toward Label-Efficient Emotion and Sentiment Analysis. |