Abstract
Speech recognition based on visual information is an emerging research field. We propose a new system for visual speech recognition based on support vector machines (SVMs), which have proven to be powerful classifiers in other visual tasks. We use SVMs to recognize the mouth shapes corresponding to the different phones produced. To model the temporal character of speech, we employ Viterbi decoding in a network of SVMs. The recognition rate obtained is higher than the rates previously reported for the same features. Because it relies on viseme models rather than whole-word models, the proposed solution generalizes easily to large-vocabulary recognition tasks.
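The abstract does not spell out how the SVM outputs enter the Viterbi search. The following is a minimal sketch under stated assumptions, not the authors' implementation: each viseme SVM's output is assumed to have already been mapped to a per-frame log-domain score, each word is assumed to be a left-to-right chain of viseme states, and the self-loop and advance probabilities are hypothetical constants. The names `viterbi_word_score`, `recognize`, and the viseme labels are illustrative only. The sketch shows why viseme models generalize: a new word needs only a new viseme sequence in the lexicon, not a new classifier.

```python
import math
from typing import Dict, List, Sequence

# Assumption: frame_scores[t][v] is a log-domain score that frame t shows
# viseme v (e.g. an SVM output mapped to a log-probability beforehand).
FrameScores = List[Dict[str, float]]


def viterbi_word_score(frame_scores: FrameScores,
                       visemes: Sequence[str],
                       log_stay: float = math.log(0.6),
                       log_next: float = math.log(0.4)) -> float:
    """Best log score for aligning all frames to a left-to-right viseme chain.

    Each state may repeat (self-loop, hypothetical prob. 0.6) or advance to
    the next state (hypothetical prob. 0.4). The path must start in the first
    state and end in the last one.
    """
    n_states, n_frames = len(visemes), len(frame_scores)
    if n_frames < n_states:
        return -math.inf  # too few frames to visit every viseme state
    # delta[s]: best log score of any path ending in state s at the current frame
    delta = [-math.inf] * n_states
    delta[0] = frame_scores[0][visemes[0]]
    for t in range(1, n_frames):
        new_delta = [-math.inf] * n_states
        for s in range(n_states):
            stay = delta[s] + log_stay
            advance = delta[s - 1] + log_next if s > 0 else -math.inf
            best_prev = max(stay, advance)
            if best_prev > -math.inf:
                new_delta[s] = best_prev + frame_scores[t][visemes[s]]
        delta = new_delta
    return delta[-1]


def recognize(frame_scores: FrameScores,
              lexicon: Dict[str, Sequence[str]]) -> str:
    """Return the word whose viseme chain yields the highest Viterbi score."""
    return max(lexicon, key=lambda w: viterbi_word_score(frame_scores, lexicon[w]))


if __name__ == "__main__":
    # Hypothetical per-frame viseme probabilities for a 4-frame utterance.
    probs = [{"A": 0.8, "B": 0.1, "C": 0.1},
             {"A": 0.7, "B": 0.2, "C": 0.1},
             {"A": 0.1, "B": 0.8, "C": 0.1},
             {"A": 0.1, "B": 0.8, "C": 0.1}]
    frames = [{v: math.log(p) for v, p in f.items()} for f in probs]
    lexicon = {"ab": ["A", "B"], "cb": ["C", "B"]}  # words as viseme sequences
    print(recognize(frames, lexicon))
```

With these made-up scores, the best path for "ab" stays in A for two frames and in B for two frames, so "ab" is chosen over "cb". Working in the log domain turns products of probabilities into sums and avoids numerical underflow on long utterances.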
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gordan, M., Kotropoulos, C., Pitas, I. (2002). A Temporal Network of Support Vector Machine Classifiers for the Recognition of Visual Speech. In: Vlahavas, I.P., Spyropoulos, C.D. (eds) Methods and Applications of Artificial Intelligence. SETN 2002. Lecture Notes in Computer Science, vol 2308. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46014-4_32
DOI: https://doi.org/10.1007/3-540-46014-4_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43472-6
Online ISBN: 978-3-540-46014-5
eBook Packages: Springer Book Archive