Abstract
The bimodal acoustic-visual nature of speech establishes solid correlations between its audio component and the corresponding articulatory information associated with the time-varying geometry of the vocal tract. In this paper we propose an estimation structure consisting of a simplified Time-Delay Neural Network (TDNN) working on 4- to 5-dimensional cepstrum trajectories provided by a preceding clustering layer based on a Self-Organizing Map (SOM). This pre-processing layer allows an effective non-linear clustering of cepstrum vectors, reducing the complexity of the resulting system by one order of magnitude while leaving the overall estimation performance unchanged. The results are reported in terms of estimation precision and robustness, with reference to previously published results.
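The two-stage structure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 12-dimensional random "cepstrum" frames, the 4-dimensional 3x3x3x3 SOM grid, the training schedule, the 5-frame delay window, and all function names (`train_som`, `project`, `delay_windows`) are assumptions chosen only for illustration. The sketch trains a small SOM, replaces each frame by the grid coordinates of its best-matching unit to obtain a low-dimensional trajectory, and stacks delayed frames into the kind of input a time-delay network would consume. The network itself is not trained here.

```python
import numpy as np

def train_som(data, grid_shape, epochs=3, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM; returns (unit weights, unit grid coordinates)."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of every unit, shape (n_units, grid_dim).
    axes = [np.arange(s) for s in grid_shape]
    coords = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    coords = coords.reshape(-1, len(grid_shape)).astype(float)
    weights = rng.normal(size=(coords.shape[0], data.shape[1]))
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.1     # shrinking neighbourhood
            # Best-matching unit: closest weight vector to the sample.
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            # Gaussian neighbourhood measured on the grid, not in data space.
            grid_d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights, coords

def project(frames, weights, coords):
    """Map each frame to the grid coordinates of its best-matching unit."""
    d2 = ((frames[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return coords[np.argmin(d2, axis=1)]

def delay_windows(traj, n_delays):
    """Stack n_delays consecutive frames into one time-delay input vector."""
    n_windows = len(traj) - n_delays + 1
    return np.stack([traj[t:t + n_delays].ravel() for t in range(n_windows)])

# Hypothetical data: 200 frames of 12 cepstral coefficients (random stand-in).
rng = np.random.default_rng(1)
cepstra = rng.normal(size=(200, 12))

weights, coords = train_som(cepstra, grid_shape=(3, 3, 3, 3))
traj = project(cepstra, weights, coords)        # 4-dimensional trajectory
tdnn_input = delay_windows(traj, n_delays=5)    # one row per delay window
print(traj.shape, tdnn_input.shape)
```

In this sketch the network input shrinks from 12 to 4 values per frame, which is the kind of simplification the clustering layer is meant to provide; the actual reduction achieved in the paper depends on details not given in the abstract.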
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vignoli, F., Curinga, S., Lavagetto, F. (1996). A neural clustering algorithm for estimating visible articulatory trajectory. In: von der Malsburg, C., von Seelen, W., Vorbrüggen, J.C., Sendhoff, B. (eds) Artificial Neural Networks — ICANN 96. ICANN 1996. Lecture Notes in Computer Science, vol 1112. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61510-5_145
DOI: https://doi.org/10.1007/3-540-61510-5_145
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61510-1
Online ISBN: 978-3-540-68684-2
eBook Packages: Springer Book Archive