Abstract
The mouth region of the human face carries highly discriminative information about facial expressions. Inferring a user's emotional state from facial expressions becomes particularly challenging when the user is talking, because many of the mouth movements made while uttering words resemble the mouth shapes of emotional expressions. We introduce a novel unsupervised method that temporally segments talking faces from faces displaying only emotions, and uses knowledge of the talking-face segments to improve emotion recognition. The proposed method represents mouth features with an integrated gradient histogram of local binary patterns and identifies temporal segments of talking faces online by estimating the uncertainty of mouth movements over time. The algorithm accurately identifies talking-face segments on a real-world database in which talking and emotional expression occur naturally, and the emotion recognition system shows a considerable improvement in accuracy when the talking-face cues are used.
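The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general idea rather than the paper's exact method: a plain 8-neighbour LBP histogram stands in for the integrated gradient histogram of LBP, and a windowed chi-square distance between consecutive mouth-region histograms stands in for the uncertainty estimate of mouth movements. The function names, window length, and threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

def lbp_histogram(gray_roi, bins=256):
    """Basic 8-neighbour LBP histogram of a grayscale mouth ROI.
    Simplified stand-in for the paper's integrated gradient
    histogram of LBP, which the abstract does not fully specify."""
    c = gray_roi[1:-1, 1:-1]
    # The 8 neighbours of each interior pixel, clockwise from top-left.
    neighbours = [gray_roi[0:-2, 0:-2], gray_roi[0:-2, 1:-1], gray_roi[0:-2, 2:],
                  gray_roi[1:-1, 2:],   gray_roi[2:, 2:],     gray_roi[2:, 1:-1],
                  gray_roi[2:, 0:-2],   gray_roi[1:-1, 0:-2]]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(neighbours):
        # Set one bit of the LBP code per neighbour-vs-centre comparison.
        codes |= (n >= c).astype(np.uint8) << bit
    hist, _ = np.histogram(codes, bins=bins, range=(0, bins))
    return hist / max(hist.sum(), 1)

def chi_square(h1, h2, eps=1e-10):
    """Chi-square distance between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def talking_segments(frames, window=15, threshold=0.05):
    """Label each frame as talking (True) or not by thresholding the
    mean frame-to-frame histogram change over a sliding window -- a
    rough proxy for the 'uncertainty of mouth movements' mentioned
    in the abstract. Window and threshold are illustrative only."""
    hists = [lbp_histogram(f) for f in frames]
    if len(hists) < 2:
        return [False] * len(hists)
    # Frame-to-frame change, padded so labels align with frames.
    dists = [chi_square(hists[i], hists[i - 1]) for i in range(1, len(hists))]
    dists = [dists[0]] + dists
    labels = []
    for i in range(len(frames)):
        lo = max(0, i - window + 1)
        labels.append(float(np.mean(dists[lo:i + 1])) > threshold)
    return labels

# Usage (hypothetical): frames is a list of grayscale uint8 mouth crops,
# e.g. produced by a face tracker; labels = talking_segments(frames)
```

Sustained, high-variation mouth motion (speech) keeps the windowed distance above the threshold, while the brief onset and hold of an expression does not, which is the intuition behind separating the two kinds of segments.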
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Velusamy, S., Gopalakrishnan, V., Navathe, B., Kannan, H., Anand, B., Sharma, A.: Unsupervised Temporal Segmentation of Talking Faces Using Visual Cues to Improve Emotion Recognition. In: D'Mello, S., Graesser, A., Schuller, B., Martin, J.-C. (eds.) Affective Computing and Intelligent Interaction, ACII 2011. LNCS, vol. 6974. Springer, Berlin, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24600-5_45
DOI: https://doi.org/10.1007/978-3-642-24600-5_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24599-2
Online ISBN: 978-3-642-24600-5