A Temporal Network of Support Vector Machine Classifiers for the Recognition of Visual Speech

Mihaela Gordan, Constantine Kotropoulos & Ioannis Pitas

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2308)

Included in the following conference series:

Hellenic Conference on Artificial Intelligence


Abstract

Speech recognition based on visual information is an emerging research field. We propose a new visual speech recognition system based on support vector machines (SVMs), which have proved to be powerful classifiers in other visual tasks. We use SVMs to recognize the mouth shapes corresponding to the different phones produced. To model the temporal character of speech, we employ Viterbi decoding in a network of SVMs. The recognition rate obtained is higher than those reported earlier when the same features were used. Because it relies on viseme models rather than whole-word models, the proposed solution generalizes easily to large-vocabulary recognition tasks.
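The abstract does not spell out how SVM outputs enter the Viterbi decoder. The following is a minimal sketch under stated assumptions, not the authors' implementation: each viseme state is scored per frame by its own binary SVM; the raw decision values are mapped to pseudo-probabilities with a Platt-style sigmoid (an assumed mapping, with hypothetical parameters `a` and `b`); and a word is modeled as a left-to-right chain of viseme states. The function names `score_to_prob` and `viterbi_decode`, as well as all numbers in the usage example, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def score_to_prob(score: float, a: float = -1.0, b: float = 0.0) -> float:
    """Map an SVM decision value to a pseudo-probability with a Platt-style
    sigmoid 1 / (1 + exp(a * score + b)). Here a and b are hypothetical
    defaults; in practice they would be fitted on held-out data."""
    return 1.0 / (1.0 + math.exp(a * score + b))


def viterbi_decode(
    svm_scores: Sequence[Sequence[float]],  # svm_scores[t][s]: SVM output for state s at frame t
    log_trans: Sequence[Sequence[float]],   # log_trans[i][j]: log P(state j at t+1 | state i at t)
    log_init: Sequence[float],              # log_init[s]: log P(state s at frame 0)
) -> Tuple[List[int], float]:
    """Return the most likely state sequence and its log score (log domain)."""
    n_states = len(log_init)

    def log_emit(t: int, s: int) -> float:
        # Emission term: log of the sigmoid-mapped SVM output.
        return math.log(score_to_prob(svm_scores[t][s]))

    # delta[s]: best log score of any path ending in state s at the current frame.
    delta = [log_init[s] + log_emit(0, s) for s in range(n_states)]
    backptr: List[List[int]] = []
    for t in range(1, len(svm_scores)):
        new_delta, ptrs = [], []
        for j in range(n_states):
            # Choose the best predecessor i for state j at frame t.
            best_i = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            new_delta.append(delta[best_i] + log_trans[best_i][j] + log_emit(t, j))
            ptrs.append(best_i)
        delta = new_delta
        backptr.append(ptrs)

    # Backtrack from the best final state.
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for ptrs in reversed(backptr):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, delta[last]


# Usage: a hypothetical 3-viseme left-to-right word model observed over 4 frames.
log_trans = [
    [math.log(0.6), math.log(0.4), NEG_INF],  # state 0: stay or advance to 1
    [NEG_INF, math.log(0.6), math.log(0.4)],  # state 1: stay or advance to 2
    [NEG_INF, NEG_INF, 0.0],                  # state 2: final, self-loop only
]
log_init = [0.0, NEG_INF, NEG_INF]            # decoding must start in state 0
svm_scores = [
    [2.0, -1.0, -2.0],
    [1.0, 0.5, -1.5],
    [-1.0, 1.5, -0.5],
    [-2.0, -0.5, 2.0],
]
path, score = viterbi_decode(svm_scores, log_trans, log_init)
print(path)  # → [0, 0, 1, 2]
```

Working in the log domain avoids numerical underflow on long frame sequences, and disallowed transitions are encoded as negative infinity so that they can never lie on the best path. For word recognition, one such chain would be decoded per vocabulary word and the word with the highest final score selected. Because these chains are assembled from shared viseme models, new words can be added without training new classifiers.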



Author information

Authors and Affiliations

Faculty of Electronics and Telecommunications, Technical University of Cluj-Napoca, 15 C. Daicoviciu, 3400, Cluj-Napoca, Romania
Mihaela Gordan
Artificial Intelligence and Information Analysis Laboratory, Department of Informatics, Aristotle University of Thessaloniki, Box 451, 54006, Thessaloniki, Greece
Constantine Kotropoulos & Ioannis Pitas

Editor information

Editors and Affiliations

Dept. of Informatics, Aristotle University of Thessaloniki, 54006, Thessaloniki, Greece
Ioannis P. Vlahavas
N.C.S.R. “Demokritos”, Inst. of Informatics & Telecommunications, Software and Knowledge Engineering Lab, 15310, Aghia Paraskevi, Greece
Constantine D. Spyropoulos

About this paper

Cite this paper

Gordan, M., Kotropoulos, C., Pitas, I. (2002). A Temporal Network of Support Vector Machine Classifiers for the Recognition of Visual Speech. In: Vlahavas, I.P., Spyropoulos, C.D. (eds) Methods and Applications of Artificial Intelligence. SETN 2002. Lecture Notes in Computer Science, vol 2308. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46014-4_32

DOI: https://doi.org/10.1007/3-540-46014-4_32
Published: 19 March 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43472-6
Online ISBN: 978-3-540-46014-5
eBook Packages: Springer Book Archive
