Noulas et al., 2007 - Google Patents
On-line multi-modal speaker diarization
- Document ID: 16024346426770079033
- Authors: Noulas A; Krose B
- Publication year: 2007
- Publication venue: Proceedings of the 9th international conference on Multimodal interfaces
Snippet
This paper presents a novel framework that utilizes multi-modal information to achieve speaker diarization. We use dynamic Bayesian networks to achieve on-line results. We progress from a simple observation model to a complex multi-modal one as more data …
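The snippet describes an on-line approach: a dynamic Bayesian network maintains a belief over who is speaking as audio and video observations arrive, and the observation model fuses the modalities. As a rough illustration of that predict/update structure (not the authors' actual model), the sketch below assumes per-frame, per-speaker log-likelihoods are already available from each modality and that the modalities are conditionally independent given the active speaker; all names (online_diarization_filter, audio_loglik, video_loglik, trans, prior) are hypothetical.

```python
import numpy as np
from scipy.special import logsumexp

def online_diarization_filter(audio_loglik, video_loglik, trans, prior):
    """Forward filtering over a discrete speaker state in a simple DBN.

    audio_loglik, video_loglik: (T, K) per-frame log-likelihoods for K speakers
                                (hypothetical outputs of audio/video front-ends).
    trans:  (K, K) speaker transition matrix, rows sum to 1.
    prior:  (K,) initial speaker distribution.
    Returns (T, K) filtered posteriors p(speaker_t | observations up to frame t).
    """
    T, K = audio_loglik.shape
    log_trans = np.log(trans + 1e-12)
    log_belief = np.log(prior + 1e-12)
    posteriors = np.empty((T, K))
    for t in range(T):
        if t > 0:
            # Predict: propagate the previous frame's belief through the transition model.
            log_belief = logsumexp(log_belief[:, None] + log_trans, axis=0)
        # Update: fuse the two modalities, assuming conditional independence
        # given the active speaker (naive-Bayes style fusion).
        log_belief = log_belief + audio_loglik[t] + video_loglik[t]
        log_belief -= logsumexp(log_belief)  # renormalize in log space
        posteriors[t] = np.exp(log_belief)
    return posteriors

if __name__ == "__main__":
    # Toy usage: two speakers, ten frames of random log-likelihoods.
    rng = np.random.default_rng(0)
    T, K = 10, 2
    audio = rng.normal(size=(T, K))
    video = rng.normal(size=(T, K))
    trans = np.array([[0.9, 0.1], [0.1, 0.9]])   # sticky speaker turns
    prior = np.full(K, 1.0 / K)
    post = online_diarization_filter(audio, video, trans, prior)
    print(np.argmax(post, axis=1))  # most likely speaker per frame
```

Per the snippet, the paper grows the observation model from a simple one to a richer multi-modal one as data accumulates; the sketch above only shows the recursive filtering that makes the approach on-line rather than batch.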
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/6267—Classification techniques
            - G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
          - G06K9/00288—Classification, e.g. identification
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
          - G06K9/00268—Feature extraction; Face representation
            - G06K9/00281—Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N99/00—Subject matter not provided for in other groups of this subclass
        - G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/00335—Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T7/00—Image analysis
        - G06T7/20—Analysis of motion
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
          - G06K9/46—Extraction of features or characteristics of the image
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
            - G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- H—ELECTRICITY
  - H04—ELECTRIC COMMUNICATION TECHNIQUE
    - H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
      - H04N7/00—Television systems
        - H04N7/14—Systems for two-way working
          - H04N7/15—Conference systems
Similar Documents
Publication | Title
---|---
Oliver et al. | Layered representations for human activity recognition
He et al. | Connected component model for multi-object tracking
Zhu et al. | Arbitrary talking face generation via attentional audio-visual coherence learning
Wang et al. | Greedy batch-based minimum-cost flows for tracking multiple objects
Chiu et al. | Gesture generation with low-dimensional embeddings
Stiefelhagen et al. | Estimating focus of attention based on gaze and sound
Noulas et al. | On-line multi-modal speaker diarization
Bai et al. | Predicting the Visual Focus of Attention in Multi-Person Discussion Videos
CN107590432A (en) | A gesture recognition method based on recurrent three-dimensional convolutional neural networks
Pardas et al. | Emotion recognition based on MPEG-4 facial animation parameters
Shahid et al. | Voice activity detection by upper body motion analysis and unsupervised domain adaptation
Boccignone et al. | Give ear to my face: Modelling multimodal attention to social interactions
Xu et al. | Reversible graph neural network-based reaction distribution learning for multiple appropriate facial reactions generation
Cheng et al. | Audio-driven talking video frame restoration
Cabañas-Molero et al. | Multimodal speaker diarization for meetings using volume-evaluated SRP-PHAT and video analysis
Noulas et al. | EM detection of common origin of multi-modal cues
Sheikhi et al. | Context aware addressee estimation for human robot interaction
Lathoud et al. | Short-term spatio-temporal clustering applied to multiple moving speakers
Deotale et al. | Optimized hybrid RNN model for human activity recognition in untrimmed video
Zamzami et al. | An accurate evaluation of MSD log-likelihood and its application in human action recognition
Pereira et al. | Cross-layer classification framework for automatic social behavioural analysis in surveillance scenario
Ba et al. | Head pose tracking and focus of attention recognition algorithms in meeting rooms
Vrochidis et al. | A Deep Learning Framework for Monitoring Audience Engagement in Online Video Events
Pnevmatikakis et al. | Robust multimodal audio-visual processing for advanced context awareness in smart spaces
JP5931021B2 (en) | Personal recognition tendency model learning device, personal recognition state estimation device, personal recognition tendency model learning method, personal recognition state estimation method, and program