research-article

Discriminative Analysis of Lip Motion Features for Speaker Identification and Speech-Reading

Authors: H. E. Cetingul, Y. Yemez, Engin Erzin, A. M. Tekalp

IEEE Transactions on Image Processing, Volume 15, Issue 10

Pages 2879 - 2891

https://doi.org/10.1109/TIP.2006.877528

Published: 01 October 2006

Abstract

Several studies have jointly used audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework. It addresses two questions: 1) is explicit lip motion information useful, and 2) if so, what are the best lip motion features for these two applications? The best lip motion features for speaker identification are taken to be those that yield the highest discrimination of individual speakers in a population, whereas for speech-reading the best features are those that provide the highest phoneme/word/phrase recognition rate. Several candidate lip motion features are considered, including dense motion features within a bounding box around the lips, lip contour motion features, and combinations of these with lip shape features. Furthermore, a novel two-stage spatial and temporal discrimination analysis is introduced to select the best lip motion features for the two applications. Experimental results using a hidden-Markov-model-based recognition system indicate that explicit lip motion information provides additional performance gains in both applications, and that lip motion features prove more valuable for speech-reading.
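
As a rough illustration of what "highest discrimination" can mean when ranking candidate features, the sketch below scores each feature dimension by a Fisher-style ratio of between-class to within-class variance. This is a generic textbook criterion shown under stated assumptions; it is not the two-stage spatial and temporal analysis of the paper, whose details the abstract does not give. The helper names `fisher_scores` and `rank_features` and the toy data are hypothetical.

```python
from statistics import mean, pvariance


def fisher_scores(samples, labels):
    """Return one Fisher-style score per feature dimension.

    score = variance of the class means / mean within-class variance.
    Hypothetical helper: a higher score means the feature separates
    the classes (e.g. speakers) better.
    """
    classes = sorted(set(labels))
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        # Group the values of feature j by class label.
        by_class = {
            c: [x[j] for x, y in zip(samples, labels) if y == c]
            for c in classes
        }
        class_means = [mean(v) for v in by_class.values()]
        between = pvariance(class_means)
        within = mean(pvariance(v) for v in by_class.values())
        scores.append(between / within if within > 0 else float("inf"))
    return scores


def rank_features(samples, labels):
    """Feature indices sorted from most to least discriminative."""
    scores = fisher_scores(samples, labels)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy data (made up): feature 0 separates the two speakers, feature 1 does not.
toy_samples = [[0.0, 1.0], [0.2, 3.0], [5.0, 1.1], [5.2, 2.9]]
toy_labels = ["spk_a", "spk_a", "spk_b", "spk_b"]
ranking = rank_features(toy_samples, toy_labels)
```

On the toy data, the feature whose class means differ is ranked ahead of the one whose class means coincide. For speech-reading, the same scoring could in principle be applied with word or phrase labels in place of speaker labels.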

Recommendations

Multimodal speaker/speech recognition using lip motion, lip texture and audio
Special section: Multimodal human-computer interfaces

We present a new multimodal speaker/speech recognition system that integrates audio, lip texture and lip motion modalities. Fusion of audio and face texture modalities has been investigated in the literature before. The emphasis of this work is to ...

Synergy of Lip-Motion and Acoustic Features in Biometric Speech and Speaker Recognition

This paper presents the scheme and evaluation of a robust audio-visual digit-and-speaker-recognition system using lip motion and speech biometrics. Moreover, a liveness verification barrier based on a person's lip movement is added to the system to ...

Speech fragment decoding techniques for simultaneous speaker identification and speech recognition

This paper addresses the problem of recognising speech in the presence of a competing speaker. We review a speech fragment decoding technique that treats segregation and recognition as coupled problems. Data-driven techniques are used to segment a ...

Information & Contributors

Information

Published In

IEEE Transactions on Image Processing Volume 15, Issue 10

October 2006

380 pages

ISSN:1057-7149

Publisher

IEEE Press

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 23
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 29 Jan 2025

Citations

Cited By

Ashour M, Abbas A, Ahmed S, Haydar N, Ghaffoori A, Hussain A, Kurdi N, Al-Sarem M, Tawfeq J (2024). Enhancing Arabic Speaker Identification through Lip Movement Analysis and Deep Representation Learning. Proceedings of the Cognitive Models and Artificial Intelligence Conference, pp. 335-340. DOI: 10.1145/3660853.3660938. Online publication date: 25-May-2024.
https://dl.acm.org/doi/10.1145/3660853.3660938
Chen H, Wang Q, Du J, Wan G, Xiong S, Yin B, Pan J, Lee C (2024). Collaborative Viseme Subword and End-to-End Modeling for Word-Level Lip Reading. IEEE Transactions on Multimedia 26, pp. 9358-9371. DOI: 10.1109/TMM.2024.3390148. Online publication date: 17-Apr-2024.
https://dl.acm.org/doi/10.1109/TMM.2024.3390148
Yang L, Wang S, Liew A (2024). Fine-Grained Lip Image Segmentation Using Fuzzy Logic and Graph Reasoning. IEEE Transactions on Fuzzy Systems 32:2, pp. 349-359. DOI: 10.1109/TFUZZ.2023.3298323. Online publication date: 1-Feb-2024.
https://dl.acm.org/doi/10.1109/TFUZZ.2023.3298323
Koch B, Grbić R (2024). One-shot lip-based biometric authentication. Image and Vision Computing 142:C. DOI: 10.1016/j.imavis.2024.104900. Online publication date: 16-May-2024.
https://dl.acm.org/doi/10.1016/j.imavis.2024.104900
Sheng C, Liu L, Deng W, Bai L, Liu Z, Lao S, Kuang G, Pietikäinen M (2023). Importance-Aware Information Bottleneck Learning Paradigm for Lip Reading. IEEE Transactions on Multimedia 25, pp. 6563-6574. DOI: 10.1109/TMM.2022.3210761. Online publication date: 1-Jan-2023.
https://dl.acm.org/doi/10.1109/TMM.2022.3210761
Pattnaik I, Dev A, Mohapatra A (2023). A face recognition taxonomy and review framework towards dimensionality, modality and feature quality. Engineering Applications of Artificial Intelligence 126:PC. DOI: 10.1016/j.engappai.2023.107056. Online publication date: 1-Nov-2023.
https://dl.acm.org/doi/10.1016/j.engappai.2023.107056
Sheng C, Zhu X, Xu H, Pietikäinen M, Liu L (2022). Adaptive Semantic-Spatio-Temporal Graph Convolutional Network for Lip Reading. IEEE Transactions on Multimedia 24, pp. 3545-3557. DOI: 10.1109/TMM.2021.3102433. Online publication date: 1-Jan-2022.
https://dl.acm.org/doi/10.1109/TMM.2021.3102433
Chowdhury D, Kumari R, Bakshi S, Sahoo M, Das A (2022). Lip as biometric and beyond: a survey. Multimedia Tools and Applications 81:3, pp. 3831-3865. DOI: 10.1007/s11042-021-11613-5. Online publication date: 1-Jan-2022.
https://dl.acm.org/doi/10.1007/s11042-021-11613-5
Yang C, Ma J, Wang S, Liew A (2021). Preventing DeepFake Attacks on Speaker Authentication by Dynamic Lip Movement Analysis. IEEE Transactions on Information Forensics and Security 16, pp. 1841-1854. DOI: 10.1109/TIFS.2020.3045937. Online publication date: 1-Jan-2021.
https://dl.acm.org/doi/10.1109/TIFS.2020.3045937
Wu L, Yang J, Zhou M, Chen Y, Wang Q (2020). LVID: A Multimodal Biometrics Authentication System on Smartphones. IEEE Transactions on Information Forensics and Security 15, pp. 1572-1585. DOI: 10.1109/TIFS.2019.2944058. Online publication date: 16-Jan-2020.
https://dl.acm.org/doi/10.1109/TIFS.2019.2944058