Abstract
A range of speech extraction techniques has been applied to improve speech recognition when signals are mixed with noise. Speech recognition performance degrades when the recognition environment differs from the model training environment, because voice versus non-voice classification becomes inaccurate at low signal-to-noise ratios (SNRs); voice activity detection likewise becomes unreliable when the noise conditions change inconsistently between the recognition environment and the learning model. One remedy is to extract a noise-robust speech feature, removing the noise before recognition. This study extracted such a feature using an equivalent rectangular bandwidth (ERB) filter bank cepstrum, examined within a computational auditory scene analysis (CASA) system that analyzes the properties of the speech signal, and constructed a learning model on the acoustic model to improve the speech recognition rate. The proposed model was evaluated using train and train-station noises. Distortion was measured after noise reduction at SNRs of \(-10\) and \(-5\) dB, showing improvements of 1.67 and 1.74 dB, respectively.
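To make the front end concrete, the following is a minimal sketch of an ERB filter bank cepstrum. The abstract does not specify the paper's exact configuration, so this sketch assumes the standard Glasberg-Moore ERB-rate scale and uses ERB-spaced triangular filters on the power spectrum (a CASA front end would more typically use a gammatone filterbank); the function names, frame length, and filter/coefficient counts are illustrative, not the authors' implementation.

```python
# Illustrative ERB filter bank cepstrum (NOT the paper's exact method):
# ERB-spaced triangular filters on the power spectrum, log energies, then DCT.
import numpy as np
from scipy.fft import dct

def hz_to_erb(f):
    # Glasberg & Moore (1990) ERB-rate scale
    return 21.4 * np.log10(1.0 + 0.00437 * f)

def erb_to_hz(e):
    # Inverse of the ERB-rate scale
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def erb_cepstrum(frame, fs, n_filters=32, n_ceps=13, n_fft=512):
    """Cepstral coefficients from ERB-spaced triangular filters (illustrative)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
    # Filter edges equally spaced on the ERB-rate scale, mapped back to Hz
    erb_pts = np.linspace(hz_to_erb(50.0), hz_to_erb(fs / 2.0), n_filters + 2)
    hz_pts = erb_to_hz(erb_pts)
    energies = np.empty(n_filters)
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        # Triangular weighting between adjacent center frequencies
        w = np.maximum(0.0, np.minimum((freqs - lo) / (mid - lo),
                                       (hi - freqs) / (hi - mid)))
        energies[i] = np.log(w @ spectrum + 1e-10)
    # DCT decorrelates the log filter bank energies into cepstral coefficients
    return dct(energies, type=2, norm='ortho')[:n_ceps]

# Example: one 25 ms frame of white noise at 16 kHz
fs = 16000
coeffs = erb_cepstrum(np.random.randn(int(0.025 * fs)), fs)
```

Spacing the filters on the ERB scale concentrates resolution at low frequencies, matching auditory frequency selectivity, which is the property the CASA analysis exploits for noise-robust features.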
Acknowledgments
This work was supported by the Gachon University research fund of 2013 (GCU-2013-R366).
Cite this article
Oh, S.-Y., & Chung, K. (2014). Improvement of speech detection using ERB feature extraction. Wireless Personal Communications, 79, 2439–2451. https://doi.org/10.1007/s11277-014-1752-9