Learning bimodal structure in audio–visual data
Monaci et al., 2009
- Document ID
- 3046412299507226284
- Author
- Monaci G
- Vandergheynst P
- Sommer F
- Publication year
- 2009
- Publication venue
- IEEE Transactions on Neural Networks
Snippet
A novel model is presented to learn bimodally informative structures from audio-visual signals. The signal is represented as a sparse sum of audio-visual kernels. Each kernel is a bimodal function consisting of synchronous snippets of an audio waveform and a spatio …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
- G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Niu et al. | | Multimodal spatiotemporal representation for automatic depression level detection |
Avila et al. | | Feature pooling of modulation spectrum features for improved speech emotion recognition in the wild |
Norman-Haignere et al. | | Distinct cortical pathways for music and speech revealed by hypothesis-free voxel decomposition |
He et al. | | Multimodal affective dimension prediction using deep bidirectional long short-term memory recurrent neural networks |
Rivet et al. | | Audiovisual speech source separation: An overview of key methodologies |
Sargin et al. | | Audiovisual synchronization and fusion using canonical correlation analysis |
Casanovas et al. | | Blind audiovisual source separation based on sparse redundant representations |
Monaci et al. | | Learning bimodal structure in audio–visual data |
CN111461176A (en) | | Multi-mode fusion method, device, medium and equipment based on normalized mutual information |
CN107427250B (en) | | Method for presuming perception semantic content through brain activity analysis and presumption |
Pu et al. | | Audio-visual object localization and separation using low-rank and sparsity |
Finn et al. | | Automatic optically-based recognition of speech |
JP4606800B2 (en) | | System for detecting non-stationary signal components and method used in a system for detecting non-stationary signal components |
Pfister et al. | | Robustifying independent component analysis by adjusting for group-wise stationary noise |
Feather et al. | | Model metamers illuminate divergences between biological and artificial neural networks |
Gul et al. | | A survey of audio enhancement algorithms for music, speech, bioacoustics, biomedical, industrial and environmental sounds by image U-Net |
Haider et al. | | SAAMEAT: active feature transformation and selection methods for the recognition of user eating conditions |
Roweis | | Data-driven production models for speech processing |
CN117711421A (en) | | Two-stage voice separation method based on coordination simple attention mechanism |
Prasath | | Design of an integrated learning approach to assist real-time deaf application using voice recognition system |
CN112687280B (en) | | Biodiversity monitoring system with frequency spectrum-time space interface |
Shaham et al. | | Discovery of single independent latent variable |
Korats et al. | | Impact of Window Length and Decorrelation Step on ICA Algorithms for EEG Blind Source Separation. |
Maniyar et al. | | Persons facial image synthesis from audio with Generative Adversarial Networks |
Beltrán-González et al. | | Visual attention priming based on crossmodal expectations |