Enhancing the Feature Extraction Process for Automatic Speech Recognition with Fractal Dimensions

Aitzol Ezeiza¹, Karmele López de Ipiña¹, Carmen Hernández² & Nora Barroso¹

Abstract

Mel frequency cepstral coefficients (MFCCs) are a standard tool for automatic speech recognition (ASR), but they fail to capture part of the dynamics of speech. The nonlinear nature of speech suggests that the extra information provided by nonlinear features could be especially useful when training data are scarce or when the ASR task is very complex. In this paper, the fractal dimension of the observed time series is combined with the traditional MFCCs in the feature vector in order to enhance the performance of two different ASR systems. The first is a simple Chinese digit recognition system with very few training examples, and the second is a large-vocabulary ASR system for Broadcast News in Spanish.
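The idea of appending a fractal dimension to each MFCC frame can be illustrated with a minimal sketch. The abstract does not state which estimator, frame size, or toolkit the authors use, so everything below is an assumption made for illustration: Higuchi's method is taken as one common waveform fractal dimension estimator, and the names `higuchi_fd`, `frame_signal`, `append_fd`, and the parameter `k_max` are hypothetical. The MFCC matrix is treated as already computed by some front end.

```python
import numpy as np

def higuchi_fd(x, k_max=8):
    """Estimate the fractal dimension of a 1-D series with Higuchi's method.

    Assumption: k_max=8 is an illustrative choice, not a value from the paper.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n <= 2 * k_max:
        raise ValueError("series too short for the chosen k_max")
    ks = np.arange(1, k_max + 1)
    lengths = np.empty(k_max)
    for i, k in enumerate(ks):
        per_offset = []
        for m in range(k):
            sub = x[m::k]                      # samples m, m+k, m+2k, ...
            n_steps = sub.size - 1             # number of increments
            dist = np.abs(np.diff(sub)).sum()  # raw curve length at scale k
            norm = (n - 1) / (n_steps * k)     # Higuchi's normalisation
            per_offset.append(dist * norm / k)
        lengths[i] = np.mean(per_offset)       # average over the k offsets
    if np.any(lengths <= 0):
        # Assumption: a constant (e.g. silent) frame is treated as a
        # smooth curve of dimension 1 to avoid log(0).
        return 1.0
    # The dimension is the slope of log L(k) against log(1/k).
    slope, _ = np.polyfit(np.log(1.0 / ks), np.log(lengths), 1)
    return float(slope)

def frame_signal(signal, frame_len, hop):
    """Cut a 1-D signal into overlapping frames (one frame per row)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (signal.size - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def append_fd(mfcc, frames, k_max=8):
    """Append one fractal dimension value per frame as an extra column."""
    fd = np.array([higuchi_fd(f, k_max) for f in frames])
    return np.hstack([mfcc, fd[:, None]])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(16000)       # stand-in for 1 s at 16 kHz
    frames = frame_signal(signal, frame_len=400, hop=160)
    mfcc = np.zeros((frames.shape[0], 13))    # placeholder for real MFCCs
    features = append_fd(mfcc, frames)
    print(features.shape)
```

In this sketch the frames used for the fractal dimension must be aligned with those used for the MFCCs, so that row i of both arrays describes the same stretch of signal. Real systems would replace the zero matrix with MFCCs from an actual front end.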

Acknowledgments

The authors thank Roger Jang and Infozazpi irratia for providing the basic resources for this work.

Author information

Authors and Affiliations

Department of Systems Engineering and Automation, University of the Basque Country UPV/EHU, Europa plaza 1, 20018, Donostia, Spain
Aitzol Ezeiza, Karmele López de Ipiña & Nora Barroso
Department of Computer Science and Artificial Intelligence, University of the Basque Country UPV/EHU, Manuel Lardizabal 1, 20018, Donostia, Spain
Carmen Hernández

Corresponding author

Correspondence to Aitzol Ezeiza.

About this article

Cite this article

Ezeiza, A., López de Ipiña, K., Hernández, C. et al. Enhancing the Feature Extraction Process for Automatic Speech Recognition with Fractal Dimensions. Cogn Comput 5, 545–550 (2013). https://doi.org/10.1007/s12559-012-9165-0

Received: 23 January 2012
Accepted: 09 July 2012
Published: 24 July 2012
Issue Date: December 2013
DOI: https://doi.org/10.1007/s12559-012-9165-0
