Speech enhancement using Teager energy operated ERB-like perceptual wavelet packet decomposition

International Journal of Speech Technology

Abstract

In the recent past, wavelet packet (WP) based speech enhancement techniques have gained popularity because of their inherent noise-minimizing nature, and they have appeared more robust and efficient than short-time Fourier transform based methods. This work proposes a speech enhancement method that uses a Teager energy operated equivalent rectangular bandwidth (ERB)-like WP decomposition. A twenty-four sub-band perceptual wavelet packet decomposition (PWPD) structure is implemented according to the auditory ERB scale. The ERB-scale based decomposition structure is used because the distribution of centre frequencies on the ERB scale is similar to the frequency response of the human cochlea. The Teager energy operator is applied to estimate the threshold value for the PWPD coefficients. Finally, Wiener filtering is applied to remove low-frequency noise before the final reconstruction stage. The proposed method is evaluated on a database of Hindi sentences corrupted under six noise conditions. Its performance is analysed with respect to several speech quality parameters and output signal-to-noise ratio (SNR) levels. The results indicate that the proposed technique outperforms some traditional speech enhancement algorithms at all SNR levels.
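
The building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the threshold rule, the wavelet, or the tree structure, so the code only shows the standard discrete Teager energy operator, a commonly used approximation for the ERB at a given centre frequency (assumed here to be the Glasberg and Moore form, which the abstract does not specify), and generic soft thresholding. The function names and the threshold value `thr` are hypothetical placeholders.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator on interior samples:
    psi[n] = x[n]^2 - x[n-1] * x[n+1].
    The output is two samples shorter than the input."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def erb_bandwidth_hz(fc_hz):
    """Approximate ERB (in Hz) at centre frequency fc_hz.
    Assumption: Glasberg-Moore style formula, not taken from the article."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def soft_threshold(coeffs, thr):
    """Generic soft thresholding: shrink each coefficient towards zero by thr.
    In the article the threshold is estimated from Teager energy; here thr is
    simply supplied by the caller (hypothetical placeholder)."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


if __name__ == "__main__":
    # For a pure sinusoid A*cos(w*n + phi), the Teager energy is the
    # constant A^2 * sin(w)^2, which makes a convenient sanity check.
    amplitude, omega = 2.0, 0.3
    signal = [amplitude * math.cos(omega * n + 0.1) for n in range(50)]
    psi = teager_energy(signal)
    print(min(psi), max(psi), amplitude ** 2 * math.sin(omega) ** 2)

    print(erb_bandwidth_hz(1000.0))
    print(soft_threshold([3.0, -0.5, -2.0], 1.0))
```

In a full pipeline of the kind the abstract describes, the Teager energy would be computed on the coefficients of each PWPD sub-band and used to derive a sub-band threshold before reconstruction; those steps are omitted here because the abstract does not specify them.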

Corresponding author

Correspondence to Anirban Bhowmick.

Cite this article

Bhowmick, A., Chandra, M. & Biswas, A. Speech enhancement using Teager energy operated ERB-like perceptual wavelet packet decomposition. Int J Speech Technol 20, 813–827 (2017). https://doi.org/10.1007/s10772-017-9448-7

  • DOI: https://doi.org/10.1007/s10772-017-9448-7