A Temporal Network of Support Vector Machine Classifiers for the Recognition of Visual Speech

  • Conference paper
Methods and Applications of Artificial Intelligence (SETN 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2308)

Abstract

Speech recognition based on visual information is an emerging research field. We propose a new system for the recognition of visual speech based on support vector machines, which have proved to be powerful classifiers in other visual tasks. We use support vector machines to recognize the mouth shapes corresponding to the different phones produced. To model the temporal character of speech, we employ Viterbi decoding in a network of support vector machines. The recognition rate obtained is higher than those reported earlier when the same features were used. Because it uses viseme models rather than entire-word models, the proposed solution generalizes easily to large-vocabulary recognition tasks.
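
The temporal decoding step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each viseme classifier's output has already been mapped to a per-frame probability-like score, and the three-state left-to-right network, the transition values, and the score table below are all hypothetical numbers chosen for illustration.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Log of a probability, with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(log_emit, log_trans, log_start):
    """Return the best state path and its log score.

    log_emit[t][s]  : log score of state s at frame t (assumed to come
                      from the viseme classifier for state s)
    log_trans[p][s] : log probability of moving from state p to state s
    log_start[s]    : log probability of starting in state s
    """
    n_states = len(log_start)
    score = [log_start[s] + log_emit[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(log_emit)):
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor of state s at frame t.
            best = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best] + log_trans[best][s] + log_emit[t][s])
            ptr.append(best)
        score = new_score
        backpointers.append(ptr)
    # Trace back from the best final state.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


# Hypothetical per-frame scores for three viseme states over five frames.
frame_scores = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.7, 0.2],
    [0.1, 0.2, 0.7],
]
# Hypothetical left-to-right network: stay in a state or advance by one.
transitions = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
start = [1.0, 0.0, 0.0]

log_emit = [[safe_log(p) for p in row] for row in frame_scores]
log_trans = [[safe_log(p) for p in row] for row in transitions]
log_start = [safe_log(p) for p in start]

best_path, best_score = viterbi(log_emit, log_trans, log_start)
print(best_path)
```

Working in the log domain keeps the scores numerically stable over long frame sequences, and forbidden transitions are represented as negative infinity so they can never lie on the best path.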

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gordan, M., Kotropoulos, C., Pitas, I. (2002). A Temporal Network of Support Vector Machine Classifiers for the Recognition of Visual Speech. In: Vlahavas, I.P., Spyropoulos, C.D. (eds) Methods and Applications of Artificial Intelligence. SETN 2002. Lecture Notes in Computer Science, vol 2308. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46014-4_32

  • DOI: https://doi.org/10.1007/3-540-46014-4_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43472-6

  • Online ISBN: 978-3-540-46014-5

  • eBook Packages: Springer Book Archive