research-article

Discriminative Analysis of Lip Motion Features for Speaker Identification and Speech-Reading

Published: 01 October 2006

Abstract

Several studies have jointly used audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework. It addresses two important issues: 1) is using explicit lip motion information useful, and 2) if so, what are the best lip motion features for these two applications? The best lip motion features for speaker identification are considered to be those that result in the highest discrimination of individual speakers in a population, whereas for speech-reading, the best features are those providing the highest phoneme/word/phrase recognition rate. Several lip motion feature candidates are considered, including dense motion features within a bounding box about the lip, lip contour motion features, and combinations of these with lip shape features. Furthermore, a novel two-stage discrimination analysis, with a spatial stage and a temporal stage, is introduced to select the best lip motion features for speaker identification and speech-reading applications. Experimental results using a hidden-Markov-model-based recognition system indicate that using explicit lip motion information provides additional performance gains in both applications, and that lip motion features prove more valuable in the speech-reading application.
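As a rough illustration of what "highest discrimination of individual speakers" can mean for feature selection, the sketch below ranks feature dimensions by a generic Fisher-style ratio of between-class to within-class variance. This is not the paper's two-stage analysis, which the abstract does not specify in detail; the function names `fisher_scores` and `select_top_features` and the toy data are hypothetical and used only for illustration.

```python
from statistics import fmean, pvariance


def fisher_scores(samples, labels):
    """Return one between-class / within-class variance ratio per feature.

    Generic Fisher-style criterion, used here as an assumed stand-in for a
    discrimination measure; it is not the method described in the paper.
    """
    classes = sorted(set(labels))
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in samples]
        overall_mean = fmean(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [row[j] for row, lab in zip(samples, labels) if lab == c]
            # Spread of the class mean around the overall mean.
            between += len(values) * (fmean(values) - overall_mean) ** 2
            # Spread of the class members around their own mean.
            within += len(values) * pvariance(values)
        scores.append(between / within if within > 0 else float("inf"))
    return scores


def select_top_features(samples, labels, k):
    """Return the indices of the k features with the highest scores."""
    scores = fisher_scores(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data (hypothetical): 4 observations, 3 features, 2 speakers.
# Only feature 0 separates speaker "A" from speaker "B" cleanly.
samples = [
    [0.1, 5.0, 1.0],
    [0.2, 5.1, 3.0],
    [0.9, 5.0, 1.1],
    [1.0, 4.9, 2.9],
]
labels = ["A", "A", "B", "B"]

print(select_top_features(samples, labels, k=1))
```

On this toy data, feature 0 has class means that are far apart relative to the spread inside each class, so it ranks first. For speech-reading, the same kind of ranking could in principle be run with phoneme, word, or phrase labels in place of speaker labels.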

Published In

IEEE Transactions on Image Processing, Volume 15, Issue 10
October 2006
380 pages

Publisher

IEEE Press

Author Tags

  1. Bayesian discriminative feature selection
  2. lip motion
  3. speaker identification
  4. speech recognition
  5. temporal discriminative feature selection

Qualifiers

  • Research-article

Cited By

  • (2024) Enhancing Arabic Speaker Identification through Lip Movement Analysis and Deep Representation Learning. Proceedings of the Cognitive Models and Artificial Intelligence Conference, pp. 335-340. DOI: 10.1145/3660853.3660938. Online publication date: 25-May-2024
  • (2024) Collaborative Viseme Subword and End-to-End Modeling for Word-Level Lip Reading. IEEE Transactions on Multimedia 26, pp. 9358-9371. DOI: 10.1109/TMM.2024.3390148. Online publication date: 17-Apr-2024
  • (2024) Fine-Grained Lip Image Segmentation Using Fuzzy Logic and Graph Reasoning. IEEE Transactions on Fuzzy Systems 32:2, pp. 349-359. DOI: 10.1109/TFUZZ.2023.3298323. Online publication date: 1-Feb-2024
  • (2024) One-shot lip-based biometric authentication. Image and Vision Computing 142:C. DOI: 10.1016/j.imavis.2024.104900. Online publication date: 16-May-2024
  • (2023) Importance-Aware Information Bottleneck Learning Paradigm for Lip Reading. IEEE Transactions on Multimedia 25, pp. 6563-6574. DOI: 10.1109/TMM.2022.3210761. Online publication date: 1-Jan-2023
  • (2023) A face recognition taxonomy and review framework towards dimensionality, modality and feature quality. Engineering Applications of Artificial Intelligence 126:PC. DOI: 10.1016/j.engappai.2023.107056. Online publication date: 1-Nov-2023
  • (2022) Adaptive Semantic-Spatio-Temporal Graph Convolutional Network for Lip Reading. IEEE Transactions on Multimedia 24, pp. 3545-3557. DOI: 10.1109/TMM.2021.3102433. Online publication date: 1-Jan-2022
  • (2022) Lip as biometric and beyond: a survey. Multimedia Tools and Applications 81:3, pp. 3831-3865. DOI: 10.1007/s11042-021-11613-5. Online publication date: 1-Jan-2022
  • (2021) Preventing DeepFake Attacks on Speaker Authentication by Dynamic Lip Movement Analysis. IEEE Transactions on Information Forensics and Security 16, pp. 1841-1854. DOI: 10.1109/TIFS.2020.3045937. Online publication date: 1-Jan-2021
  • (2020) LVID: A Multimodal Biometrics Authentication System on Smartphones. IEEE Transactions on Information Forensics and Security 15, pp. 1572-1585. DOI: 10.1109/TIFS.2019.2944058. Online publication date: 16-Jan-2020
