
Improvement of Speech Detection Using ERB Feature Extraction

Published in Wireless Personal Communications.

Abstract

A range of speech extraction techniques have been applied to improve speech recognition when signals are mixed with noise. Recognition performance degrades when the model training environment differs from the recognition environment, because voice versus non-voice classification becomes inaccurate at low signal-to-noise ratios (SNRs). Voice activity detection is likewise unreliable when the noise arises from inconsistent changes between the recognition environment and the learning model. One remedy is to extract a speech feature that is robust to noise by removing that noise. This study extracted such a feature using an equivalent rectangular bandwidth (ERB) filter bank cepstrum, examined within a computational auditory scene analysis (CASA) system that analyzes the properties of the speech signal, and constructed a learning model with the acoustic model to improve the speech recognition rate. The proposed model was evaluated using train and train-station noises. Distortion was measured after noise reduction at SNRs of \(-10\) and \(-5\) dB, showing respective improvements of 1.67 and 1.74 dB.
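The abstract's central idea, an ERB filter bank cepstrum, can be illustrated with a minimal sketch. This is not the authors' exact front end: the filter shape (triangular on the ERB-rate scale), the number of filters, and the frequency range are all assumed parameters chosen for illustration; only the ERB-rate warping formula (Glasberg and Moore's) is standard.

```python
import numpy as np

# Hypothetical ERB filter-bank cepstrum front end, a sketch only:
# ERB-spaced triangular filters over an FFT power spectrum, log
# compression, then a DCT-II to produce cepstral coefficients.

def erb_center_freqs(n_freqs, f_low, f_high):
    """Frequencies equally spaced on the ERB-rate scale."""
    def hz_to_erb_rate(f):
        return 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    def erb_rate_to_hz(e):
        return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    e_low, e_high = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    return erb_rate_to_hz(np.linspace(e_low, e_high, n_freqs))

def erb_cepstrum(frame, sr, n_filters=32, n_ceps=13):
    """Log ERB filter-bank energies of one frame, decorrelated by a DCT-II."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    # n_filters triangles need n_filters + 2 edge/center frequencies.
    centers = erb_center_freqs(n_filters + 2, 50.0, sr / 2.0)
    energies = np.zeros(n_filters)
    for i in range(n_filters):
        lo, c, hi = centers[i], centers[i + 1], centers[i + 2]
        rising = np.clip((freqs - lo) / (c - lo), 0.0, 1.0)
        falling = np.clip((hi - freqs) / (hi - c), 0.0, 1.0)
        energies[i] = np.sum(spectrum * np.minimum(rising, falling))
    log_e = np.log(energies + 1e-10)  # floor avoids log(0) in silent bands
    # DCT-II basis, unnormalized: keeps only the first n_ceps coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_filters)
    return dct @ log_e
```

Because the filter centers are spaced on the ERB-rate scale rather than linearly, low frequencies, where most speech energy and voicing cues lie, receive proportionally more filters, which is the usual motivation for auditory-scale features in noisy-speech front ends.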


References

  1. Lee, Y.-K., & Kwon, O.-W. (2008). Application of shape analysis techniques for improved CASA-based speech separation. The Korean Society of Phonetic Sciences and Speech Technology: MALSORI., 65, 153–168.

    Google Scholar 

  2. Choi, T. & Kim, S.-H. (2013). Target speech segregation using non-parametric correlation feature extraction in CASA system. The Journal of the Acoustical Society of Korea, 32(1), 79–85.

    Google Scholar 

  3. Jin, Z. & Wang, D. L. (2011). Reverberant speech segregation based on multipitch tracki ng and classification. IEEE Transactions on Audio, Speech, and Language Processing, 19(8), 2328–2337.

    Google Scholar 

  4. Wu, B. F., & Wang, K. C. (2005). Robust endpoint detection algorithm based on the adaptive band-partitioning spectral entropy in adverse environments. IEEE Transactions on Speech Audio Processing, 13(5), 762–775.

    Article  Google Scholar 

  5. Elmezain, M., Al-Hamadi, A., Appenrodt, J., & Michaelis, B. (2008). A hidden markov model-based continuous gesture recognition system for hand motion trajectory. ICPR, 2008, 1–4.

    Google Scholar 

  6. Homer, J., & Mareels, I. (2004). LS detection guided NLMS estimation of sparse system. Proceedings of the IEEE 2004 international conference on acoustic. Speech and signal processing (ICASSP). Montreal, Quebec, Canada.

  7. Li, Q., Zheng, J., Tsai, A., & Zhou, Q. (2002). Robust endpoint detection and energy normalization for real-time speech and speaker recognition. IEEE Transactions on Speech Audio Processing, 10(3), 146–157.

    Article  Google Scholar 

  8. Ahmed, B., & Holmes, P. H. (2004). A voice activity detector using the Chi square test. In Proceedings of the international conference on acoustics, speech, and signal processing, 2004 (pp. I-625–I-628). Royal Melbourne Institute of Technology, Victoria.

  9. Oh, S. Y., & Chung, K. Y. (2013). Target speech feature extraction using non-parametric correlation coefficient. Cluster Comput. doi:10.1007/s10586-013-0284-5.

  10. Ko, J. W., Chung, K. Y., & Han, J. S. (2013). Model transformation verification using similarity and graph comparison algorithm. Multimedia Tools and Applications. doi:10.1007/s11042-013-1581-y.

  11. ETSI Standard Document. (2003). Speech processing, transmission and quality aspects (STQ); Distributed speech recognition; Advanced front-end feature extraction algorithm; Compression algorithms. ETSI ES 202 050 v. 1.1.3 (2003–11).

  12. Kozel, D., & Apostoaia, C. (2007) Colored noise reduction using bark scale spectral subtraction, statistics, and multiple time frames. IEEE EIT proceedings 2007, Chicago USA, pp. 416–421.

  13. Wang, K. C., & Tsai, Y. H. (2008). Voice activity detection algorithm with low signal-to-noise ratios based on spectrum entropy. Second International Symposium on Universal Communication, 2008, 423–428.

    Article  Google Scholar 

  14. Kim, S. H., & Chung, K. Y. (2013). Medical information service system based on human 3D anatomical model. Multimedia Tools and Applications. doi:10.1007/s11042-013-1584-8.

  15. Naqvi, S. M., Yu, M., & Chamber, J. A. (2010). A multimodal approach to blind source separation of moving sources. IEEE Transactions on Signal Processing, 4(5), 895–910.

    Google Scholar 

  16. Kang, S. K., Chung, K. Y., & Lee, J. H. (2013). Development of head detection and tracking systems for visual surveillance. Personal and Ubiquitous Computing. doi:10.1007/s00779-013-0668-9.

  17. Jung, H., & Chung, K. Y. (2013). Discovery of automotive design paradigm using relevance feedback. Personal and Ubiquitous Computing. doi:10.1007/s00779-013-0738-z.

  18. Kim, S. H., & Chung, K. Y. (2013). 3D simulator for stability analysis of finite slope causing plane activity. Multimedia Tools and Applications. doi:10.1007/s11042-013-1356-5.

  19. Shao, Y., Srinivasan, S., Jin, Z., & Wang, D. (2010). A computational auditory scene analysis system for robust speech recognition. Computer Speech & Language, 24(1), 77–93.

    Article  Google Scholar 

  20. Li, P., Guan, Y., Xu, B., & Liu, W. (2006). Monaural speech separation based on computational auditory analysis and objective quality assessment of speech. IEEE Transactions on Audio, Speech and Language Processing, 14(6), 2014–2022.

    Article  Google Scholar 

  21. Klapuri, A. P. (2008). Multipitch analysis of polyphonic music and speech signals using an auditory model. IEEE Transactions on Audio, Speech and Language Processing, 16(2), 255–266.

    Article  Google Scholar 

  22. Ahn, C.-S., & Oh, S.-Y. (2012). Echo noise robust HMM learning model using average estimator LMS algorithm. The Journal of Digital Policy and Management, 10(10), 277–282.

    Google Scholar 

  23. Yamagishi, J., Kobayashi, T., Nakano, Y., Ogata, K., & Isogai, J. (2009). Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm. IEEE Transactions on Audio Speech Lang Processing, 17(1), 66–83.

    Article  Google Scholar 

  24. Nose, T., Yamagishi, J., & Kobayashi, T. (2007). A style control technique for HMM-based expressive speech synthesis. IEICE Transactions on Information and System, E90–D(9), 1406–1413.

    Article  Google Scholar 

  25. Yamagishi, J., Nose, T., Zen, H., Toda, T., Ling, Z.-H., Toda, T., et al. (2009). A robust speaker-adaptive HMM based text-to-speech synthesis. IEEE Transactions on Audio Speech Lang Processing, 17(6), 1208–1230.

    Google Scholar 

  26. Hu, G., & Wang, D. (2010). A tandem algorithm for pitch estimation and voiced speech segregation. IEEE Transactions on Audio, Speech and Language Processing, 18(8), 2067–2079.

    Article  Google Scholar 

  27. Cho, S. Y., Sun, D. M., & Qiu, Z. D. (2011). A spearman correlation coefficient ranking for matching-score fusion on speaker recognition. Proceedings of the TENCON conference, pp. 736–741.

  28. Kim, B., Choi, T., & Kim, S. (2012). Colored noise cancellation algorithm using average estimator. Proceedings of the Acoustical Society of Korea Conference, 29(1), 71–74.

  29. Zen, H., Tokuda, K., Masuko, T., Kobayashi, T., & Kitamura, T. (2007). A hidden semi-Markov model-based speech synthesis system. IEICE Transactions on Information and System, E90–D(5), 825–834.

    Article  Google Scholar 

  30. Tuske, Z., Mihajlik, P., Tobler, Z., & Fegyo, T. (2005). Robust voice activity detection based on the entropy of noise suppressed spectrum, interspeech 2005. Lisbon Portugal, pp. 245–248.

Download references

Acknowledgments

This work was supported by the Gachon University research fund of 2013 (GCU-2013-R366).

Author information

Correspondence to Sang-Yeob Oh.


About this article

Cite this article

Oh, SY., Chung, K. Improvement of Speech Detection Using ERB Feature Extraction. Wireless Pers Commun 79, 2439–2451 (2014). https://doi.org/10.1007/s11277-014-1752-9
