Noulas et al., 2007 - Google Patents

On-line multi-modal speaker diarization

Document ID: 16024346426770079033
Author: Noulas A; Krose B
Publication year: 2007
Publication venue: Proceedings of the 9th international conference on Multimodal interfaces

Snippet

This paper presents a novel framework that utilizes multi-modal information to achieve speaker diarization. We use dynamic Bayesian networks to achieve on-line results. We progress from a simple observation model to a complex multi-modal one as more data …

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
          - G06K9/6267—Classification techniques
            - G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
        - G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
          - G06K9/00288—Classification, e.g. identification
          - G06K9/00268—Feature extraction; Face representation
            - G06K9/00281—Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
        - G06K9/00335—Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
        - G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
          - G06K9/46—Extraction of features or characteristics of the image
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N99/00—Subject matter not provided for in other groups of this subclass
        - G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T7/00—Image analysis
        - G06T7/20—Analysis of motion
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
            - G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- H—ELECTRICITY
  - H04—ELECTRIC COMMUNICATION TECHNIQUE
    - H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
      - H04N7/00—Television systems
        - H04N7/14—Systems for two-way working
          - H04N7/15—Conference systems

Similar Documents

Publication	Publication Date	Title
Oliver et al.	2002	Layered representations for human activity recognition
He et al.	2016	Connected component model for multi-object tracking
Zhu et al.	2018	Arbitrary talking face generation via attentional audio-visual coherence learning
Wang et al.	2017	Greedy batch-based minimum-cost flows for tracking multiple objects
Chiu et al.	2014	Gesture generation with low-dimensional embeddings
Stiefelhagen et al.	2001	Estimating focus of attention based on gaze and sound
Noulas et al.	2007	On-line multi-modal speaker diarization
Bai et al.	2019	Predicting the Visual Focus of Attention in Multi-Person Discussion Videos
CN107590432A (en)	2018-01-16	A kind of gesture identification method based on circulating three-dimensional convolutional neural networks
Pardas et al.	2002	Emotion recognition based on MPEG-4 facial animation parameters
Shahid et al.	2019	Voice activity detection by upper body motion analysis and unsupervised domain adaptation
Boccignone et al.	2018	Give ear to my face: Modelling multimodal attention to social interactions
Xu et al.	2023	Reversible graph neural network-based reaction distribution learning for multiple appropriate facial reactions generation
Cheng et al.	2021	Audio-driven talking video frame restoration
Cabañas-Molero et al.	2018	Multimodal speaker diarization for meetings using volume-evaluated SRP-PHAT and video analysis
Noulas et al.	2006	EM detection of common origin of multi-modal cues
Sheikhi et al.	2013	Context aware addressee estimation for human robot interaction
Lathoud et al.	2007	Short-term spatio–temporal clustering applied to multiple moving speakers
Deotale et al.	2022	Optimized hybrid RNN model for human activity recognition in untrimmed video
Zamzami et al.	2019	An accurate evaluation of msd log-likelihood and its application in human action recognition
Pereira et al.	2017	Cross-layer classification framework for automatic social behavioural analysis in surveillance scenario
Ba et al.	2006	Head pose tracking and focus of attention recognition algorithms in meeting rooms
Vrochidis et al.	2024	A Deep Learning Framework for Monitoring Audience Engagement in Online Video Events
Pnevmatikakis et al.	2009	Robust multimodal audio–visual processing for advanced context awareness in smart spaces
JP5931021B2 (en)	2016-06-08	Personal recognition tendency model learning device, personal recognition state estimation device, personal recognition tendency model learning method, personal recognition state estimation method, and program