Monaci et al., 2009 - Google Patents

Learning bimodal structure in audio–visual data

Document ID: 3046412299507226284
Authors: Monaci G, Vandergheynst P, Sommer F
Publication year: 2009
Publication venue: IEEE Transactions on Neural Networks

Snippet

A novel model is presented to learn bimodally informative structures from audio-visual signals. The signal is represented as a sparse sum of audio-visual kernels. Each kernel is a bimodal function consisting of synchronous snippets of an audio waveform and a spatio …
Full text: infoscience.epfl.ch (PDF)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06K9/6232 Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
    • G06K9/6247 Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/36 Image preprocessing, i.e. processing the image information without deciding about the identity of the image
    • G06K9/46 Extraction of features or characteristics of the image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00221 Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification

Similar Documents

Publication, Title
Niu et al. Multimodal spatiotemporal representation for automatic depression level detection
Avila et al. Feature pooling of modulation spectrum features for improved speech emotion recognition in the wild
Norman-Haignere et al. Distinct cortical pathways for music and speech revealed by hypothesis-free voxel decomposition
He et al. Multimodal affective dimension prediction using deep bidirectional long short-term memory recurrent neural networks
Rivet et al. Audiovisual speech source separation: An overview of key methodologies
Sargin et al. Audiovisual synchronization and fusion using canonical correlation analysis
Casanovas et al. Blind audiovisual source separation based on sparse redundant representations
Monaci et al. Learning bimodal structure in audio–visual data
CN111461176A (en) Multi-mode fusion method, device, medium and equipment based on normalized mutual information
CN107427250B (en) Method for presuming perception semantic content through brain activity analysis and presumption
Pu et al. Audio-visual object localization and separation using low-rank and sparsity
Finn et al. Automatic optically-based recognition of speech
JP4606800B2 (en) System for detecting non-stationary signal components and method used in a system for detecting non-stationary signal components
Pfister et al. Robustifying independent component analysis by adjusting for group-wise stationary noise
Feather et al. Model metamers illuminate divergences between biological and artificial neural networks
Gul et al. A survey of audio enhancement algorithms for music, speech, bioacoustics, biomedical, industrial and environmental sounds by image U-Net
Haider et al. SAAMEAT: active feature transformation and selection methods for the recognition of user eating conditions
Roweis Data-driven production models for speech processing
CN117711421A (en) Two-stage voice separation method based on coordination simple attention mechanism
Prasath Design of an integrated learning approach to assist real-time deaf application using voice recognition system
CN112687280B (en) Biodiversity monitoring system with frequency spectrum-time space interface
Shaham et al. Discovery of single independent latent variable
Korats et al. Impact of Window Length and Decorrelation Step on ICA Algorithms for EEG Blind Source Separation.
Maniyar et al. Persons facial image synthesis from audio with Generative Adversarial Networks
Beltrán-González et al. Visual attention priming based on crossmodal expectations