Abstract
We have previously developed a method for recognizing the facial expression of a speaker. For facial expression recognition, we previously selected three images: (i) just before speaking, (ii) while speaking the first vowel, and (iii) while speaking the last vowel of an utterance. Using the speech recognition system Julius, static thermal images are saved at three timing positions: just before speaking, and while the speaker utters the phonemes of the first and last vowels. To implement our method, we recorded three subjects who spoke 25 Japanese first names, which together provided all combinations of first and last vowels. These recordings were used to prepare first the training data and then the test data. Julius sometimes misrecognizes the first and/or last vowel(s); for example, /a/ as the first vowel is sometimes misrecognized as /i/. We corrected such misrecognitions in the training data; however, this correction cannot be carried out for the test data. In the implementation of our method, the facial expressions of the three subjects were distinguished with a mean accuracy of 79.8% when they exhibited one of the intentional facial expressions "angry," "happy," "neutral," "sad," and "surprised." The mean accuracy of vowel recognition by Julius was 84.1%.
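The frame-selection step described above can be illustrated with a minimal sketch. It assumes that the speech recognizer's output has already been converted into a list of timed phoneme segments; the `Segment` type, the `pick_frame_times` and `first_last_vowels` helpers, the 0.125 s pre-speech offset, and the choice of the midpoint of each vowel are all hypothetical and are not taken from the paper or from the Julius API.

```python
from typing import List, NamedTuple, Optional, Tuple

# The five Japanese vowels; 5 first vowels x 5 last vowels = 25 combinations.
VOWELS = {"a", "i", "u", "e", "o"}


class Segment(NamedTuple):
    """Hypothetical timed phoneme segment (not Julius's native format)."""
    phoneme: str
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording


def first_last_vowels(segments: List[Segment]) -> Optional[Tuple[Segment, Segment]]:
    """Return the first and last vowel segments, or None if there is no vowel."""
    vowels = [s for s in segments if s.phoneme in VOWELS]
    if not vowels:
        return None
    return vowels[0], vowels[-1]


def pick_frame_times(
    segments: List[Segment],
    pre_speech_offset: float = 0.125,  # assumed value, for illustration only
) -> Optional[Tuple[float, float, float]]:
    """Return three timestamps: just before speaking, first vowel, last vowel.

    Assumption: the vowel frames are taken at the midpoint of each vowel
    segment; the source does not state the exact timing rule.
    """
    pair = first_last_vowels(segments)
    if pair is None:
        return None
    first, last = pair
    # "Just before speaking": a short interval before the first phoneme starts.
    before = max(0.0, segments[0].start - pre_speech_offset)
    first_mid = (first.start + first.end) / 2
    last_mid = (last.start + last.end) / 2
    return before, first_mid, last_mid


# Illustrative alignment for the name "taro" (timings are made up).
taro = [
    Segment("t", 0.5, 0.625),
    Segment("a", 0.625, 0.875),
    Segment("r", 0.875, 1.0),
    Segment("o", 1.0, 1.5),
]
print(pick_frame_times(taro))
```

The returned timestamps would then be used to pick the corresponding thermal frames from the recording. If the recognizer mislabels a vowel, the timing may still be usable, but the vowel label attached to the frame would be wrong, which is the situation the abstract says can be corrected only in the training data.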
Additional information
This work was presented in part at the 16th International Symposium on Artificial Life and Robotics, Oita, Japan, January 27–29, 2011
About this article
Cite this article
Yoshitomi, Y., Asada, T., Shimada, K. et al. Facial expression recognition of a speaker using vowel judgment and thermal image processing. Artif Life Robotics 16, 318–323 (2011). https://doi.org/10.1007/s10015-011-0939-3
DOI: https://doi.org/10.1007/s10015-011-0939-3