Abstract
In recent years, wavelet packet (WP) based speech enhancement techniques have gained popularity because of their inherent noise-suppression properties. WP-based techniques have appeared more robust and efficient than short-time Fourier transform based methods. In the present work, a speech enhancement method using Teager energy operated equivalent rectangular bandwidth (ERB)-like WP decomposition is proposed. A twenty-four sub-band perceptual wavelet packet decomposition (PWPD) structure is implemented according to the auditory ERB scale. The ERB scale based decomposition structure is used because the distribution of centre frequencies on the ERB scale is similar to the frequency response of the human cochlea. The Teager energy operator is applied to estimate the threshold value for the PWPD coefficients. Lastly, Wiener filtering is applied to remove low-frequency noise before the final reconstruction stage. The proposed method is evaluated on a Hindi sentence database corrupted under six noise conditions. Its performance is analysed with respect to several speech quality parameters and output signal-to-noise ratio (SNR) levels. The results indicate that the proposed technique outperforms some traditional speech enhancement algorithms at all SNR levels.
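As an illustration of the operator named in the abstract, the following is a minimal sketch of the discrete Teager energy operator in the form given by Kaiser (1993), psi[n] = x[n]^2 - x[n-1]*x[n+1]. The function names, the test tone, and the Glasberg-Moore ERB approximation are assumptions made for this example; the abstract does not state the exact ERB formula, threshold rule, or decomposition tree used in the paper, so none of those are reproduced here.

```python
import math


def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Only interior samples are computed, so the result has len(x) - 2 values.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def erb_bandwidth_hz(f_hz):
    """ERB in Hz at centre frequency f_hz (Glasberg-Moore approximation).

    Assumption: this common formula is used for illustration only.
    """
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


# Hypothetical test signal: a pure tone amp * cos(omega * n).
# For such a tone the operator is constant and equals (amp * sin(omega))^2.
amp, omega = 0.5, 0.3
tone = [amp * math.cos(omega * n) for n in range(64)]
psi = teager_energy(tone)
```

Because the operator responds to both amplitude and frequency, it tends to emphasise speech-dominated coefficients over noise-like ones, which is the usual motivation for applying it before threshold estimation.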
Cite this article
Bhowmick, A., Chandra, M. & Biswas, A. Speech enhancement using Teager energy operated ERB-like perceptual wavelet packet decomposition. Int J Speech Technol 20, 813–827 (2017). https://doi.org/10.1007/s10772-017-9448-7
DOI: https://doi.org/10.1007/s10772-017-9448-7