Noulas et al., 2007 - Google Patents

On-line multi-modal speaker diarization

Document ID
16024346426770079033
Author
Noulas A
Krose B
Publication year
2007
Publication venue
Proceedings of the 9th international conference on Multimodal interfaces

Snippet

This paper presents a novel framework that utilizes multi-modal information to achieve speaker diarization. We use dynamic Bayesian networks to achieve on-line results. We progress from a simple observation model to a complex multi-modal one as more data …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6267 Classification techniques
    • G06K9/6268 Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00221 Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • G06K9/00288 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00221 Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • G06K9/00268 Feature extraction; Face representation
    • G06K9/00281 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass
    • G06N99/005 Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00335 Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/36 Image preprocessing, i.e. processing the image information without deciding about the identity of the image
    • G06K9/46 Extraction of features or characteristics of the image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30781 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F17/30784 Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems

Similar Documents

Publication Title
Oliver et al. Layered representations for human activity recognition
He et al. Connected component model for multi-object tracking
Zhu et al. Arbitrary talking face generation via attentional audio-visual coherence learning
Wang et al. Greedy batch-based minimum-cost flows for tracking multiple objects
Chiu et al. Gesture generation with low-dimensional embeddings
Stiefelhagen et al. Estimating focus of attention based on gaze and sound
Noulas et al. On-line multi-modal speaker diarization
Bai et al. Predicting the Visual Focus of Attention in Multi-Person Discussion Videos
CN107590432A (en) A kind of gesture identification method based on circulating three-dimensional convolutional neural networks
Pardas et al. Emotion recognition based on MPEG-4 facial animation parameters
Shahid et al. Voice activity detection by upper body motion analysis and unsupervised domain adaptation
Boccignone et al. Give ear to my face: Modelling multimodal attention to social interactions
Xu et al. Reversible graph neural network-based reaction distribution learning for multiple appropriate facial reactions generation
Cheng et al. Audio-driven talking video frame restoration
Cabañas-Molero et al. Multimodal speaker diarization for meetings using volume-evaluated SRP-PHAT and video analysis
Noulas et al. EM detection of common origin of multi-modal cues
Sheikhi et al. Context aware addressee estimation for human robot interaction
Lathoud et al. Short-term spatio–temporal clustering applied to multiple moving speakers
Deotale et al. Optimized hybrid RNN model for human activity recognition in untrimmed video
Zamzami et al. An accurate evaluation of msd log-likelihood and its application in human action recognition
Pereira et al. Cross-layer classification framework for automatic social behavioural analysis in surveillance scenario
Ba et al. Head pose tracking and focus of attention recognition algorithms in meeting rooms
Vrochidis et al. A Deep Learning Framework for Monitoring Audience Engagement in Online Video Events
Pnevmatikakis et al. Robust multimodal audio–visual processing for advanced context awareness in smart spaces
JP5931021B2 (en) Personal recognition tendency model learning device, personal recognition state estimation device, personal recognition tendency model learning method, personal recognition state estimation method, and program