Abstract
Mel frequency cepstral coefficients (MFCCs) are a standard tool for automatic speech recognition (ASR), but they fail to capture part of the dynamics of speech. Because speech is nonlinear in nature, the extra information provided by nonlinear features could be especially useful when training data are scarce or when the ASR task is very complex. In this paper, the fractal dimension of the observed time series is combined with the traditional MFCCs in the feature vector in order to enhance the performance of two different ASR systems. The first is a simple Chinese digit recognition system trained on very few examples, and the second is a large-vocabulary ASR system for Broadcast News in Spanish.
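As a rough illustration of what combining a fractal dimension with MFCCs in the feature vector can look like, the sketch below computes one fractal dimension value per frame and appends it to that frame's MFCC vector. The abstract does not say which estimator is used, so Higuchi's method is assumed here as one common choice. The function names (`higuchi_fd`, `frame_signal`, `augment_features`), the default `k_max=8`, and the convention of returning 1.0 for a flat frame are all hypothetical, and the MFCCs are assumed to be computed elsewhere.

```python
import math


def higuchi_fd(x, k_max=8):
    """Estimate the fractal dimension of a 1-D sequence with Higuchi's method.

    Assumption: k_max=8 is an illustrative default, not a value from the paper.
    """
    n = len(x)
    log_inv_k, log_len = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):  # m is the 0-based starting offset
            steps = (n - 1 - m) // k
            if steps < 1:
                continue
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                       for i in range(1, steps + 1))
            # Normalised curve length for delay k and offset m.
            lengths.append(dist * (n - 1) / (steps * k) / k)
        if not lengths:
            continue
        mean_len = sum(lengths) / len(lengths)
        if mean_len > 0:
            log_inv_k.append(math.log(1.0 / k))
            log_len.append(math.log(mean_len))
    if len(log_len) < 2:
        return 1.0  # assumed convention for flat or too-short frames
    # Least-squares slope of log L(k) against log(1/k).
    mx = sum(log_inv_k) / len(log_inv_k)
    my = sum(log_len) / len(log_len)
    num = sum((a - mx) * (b - my) for a, b in zip(log_inv_k, log_len))
    den = sum((a - mx) ** 2 for a in log_inv_k)
    return num / den if den > 0 else 1.0


def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames, dropping the incomplete tail."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def augment_features(mfcc_frames, signal_frames, k_max=8):
    """Append one fractal dimension value to each frame's MFCC vector."""
    return [list(mfcc) + [higuchi_fd(frame, k_max)]
            for mfcc, frame in zip(mfcc_frames, signal_frames)]
```

With 13 MFCCs per frame, each augmented vector would have 14 components. The frames passed to `augment_features` should use the same frame length and hop as the MFCC analysis so that the two feature streams stay aligned.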
Acknowledgments
The authors thank Roger Jang and Infozazpi irratia for providing the basic resources for this work.
About this article
Cite this article
Ezeiza, A., López de Ipiña, K., Hernández, C. et al. Enhancing the Feature Extraction Process for Automatic Speech Recognition with Fractal Dimensions. Cogn Comput 5, 545–550 (2013). https://doi.org/10.1007/s12559-012-9165-0
DOI: https://doi.org/10.1007/s12559-012-9165-0