Enhancing the Feature Extraction Process for Automatic Speech Recognition with Fractal Dimensions

Cognitive Computation

Abstract

Mel frequency cepstral coefficients (MFCCs) are a standard tool for automatic speech recognition (ASR), but they fail to capture part of the dynamics of speech. The nonlinear nature of speech suggests that extra information provided by some nonlinear features could be especially useful when training data are scarce or when the ASR task is very complex. In this paper, the Fractal Dimension of the observed time series is combined with the traditional MFCCs in the feature vector in order to enhance the performance of two different ASR systems. The first is a simple system of digit recognition in Chinese, with very few training examples, and the second is a large vocabulary ASR system for Broadcast News in Spanish.
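
As an illustration of the idea described in the abstract, the following is a minimal sketch of appending a per-frame fractal dimension to an MFCC matrix. The abstract does not state which estimator, frame length, or hop size the authors use; Higuchi's method is assumed here as one common choice, and the names `higuchi_fd`, `frame_signal`, `append_fd`, and the parameters `k_max`, `frame_len`, and `hop` are hypothetical. The MFCC matrix is taken as given (one row per frame) and is not computed here.

```python
import numpy as np

def higuchi_fd(x, k_max=8):
    """Estimate the fractal dimension of a 1-D series with Higuchi's method.

    k_max is an assumed setting, not a value taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    log_inv_k, log_len = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):
            sub = x[m::k]                 # subsampled curve starting at offset m
            n_int = sub.size - 1          # number of intervals in that curve
            if n_int < 1:
                continue
            dist = np.abs(np.diff(sub)).sum()
            # Normalised curve length for this offset and scale.
            lengths.append(dist * (n - 1) / (n_int * k) / k)
        mean_len = np.mean(lengths) if lengths else 0.0
        if mean_len > 0:
            log_inv_k.append(np.log(1.0 / k))
            log_len.append(np.log(mean_len))
    if len(log_len) < 2:
        # Assumption: treat a flat (e.g. silent) frame as a smooth curve.
        return 1.0
    slope, _ = np.polyfit(log_inv_k, log_len, 1)
    return float(slope)

def frame_signal(signal, frame_len, hop):
    """Cut a signal into overlapping frames (no padding)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len]
                     for i in range(n_frames)])

def append_fd(mfcc, signal, frame_len=400, hop=160, k_max=8):
    """Append one fractal-dimension column to an (n_frames, n_coeffs) MFCC matrix.

    frame_len and hop are illustrative (25 ms / 10 ms at 16 kHz) and must
    match the framing used to compute the MFCCs.
    """
    frames = frame_signal(np.asarray(signal, dtype=float), frame_len, hop)
    fd = np.array([higuchi_fd(f, k_max) for f in frames])
    n = min(len(mfcc), len(fd))           # guard against off-by-one framing
    return np.hstack([np.asarray(mfcc)[:n], fd[:n, None]])
```

In this sketch the fractal dimension simply becomes one extra column of the feature vector, so a downstream recogniser sees it as an additional coefficient alongside the MFCCs. Whether delta or acceleration terms are also derived from it is a separate choice that the abstract does not address.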

Acknowledgments

The authors thank Roger Jang and Infozazpi irratia for providing the basic resources for this work.

Author information

Corresponding author

Correspondence to Aitzol Ezeiza.

About this article

Cite this article

Ezeiza, A., López de Ipiña, K., Hernández, C. et al. Enhancing the Feature Extraction Process for Automatic Speech Recognition with Fractal Dimensions. Cogn Comput 5, 545–550 (2013). https://doi.org/10.1007/s12559-012-9165-0

  • DOI: https://doi.org/10.1007/s12559-012-9165-0
