Abstract
A novel single-channel speech enhancement technique is proposed, based on a minimum mean square error (MMSE) estimator of the short-time spectral amplitude (STSA) in the discrete Fourier transform (DFT) domain. A Weibull distribution was used to model the DFT magnitudes of clean speech signals under the additive Gaussian noise assumption. The enhancement procedure was carried out both with speech presence uncertainty (WSPU) and without it (WoSPU). The theoretical spectral gain function was obtained as a weighted geometric mean of the hypothetical gains associated with signal presence and absence. Extensive experiments were conducted on clean speech signals from the TIMIT database that had been degraded by various additive non-stationary noise sources, and the enhanced signals were then evaluated. The results showed that the proposed method outperformed estimators based on Rayleigh and Gamma probability density functions (PDFs) in terms of segmental signal-to-noise ratio (segSNR), general SNR, and perceptual evaluation of speech quality (PESQ). With Weibull speech priors in the MMSE-STSA based speech enhancement algorithm, performance in the WSPU case was also significantly better than in the WoSPU case.
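The weighted geometric mean mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the weight is a speech presence probability `presence_prob` in [0, 1] and that both hypothetical gains are strictly positive; the function name and parameter names are hypothetical, and the paper's actual gain expressions are not reproduced here.

```python
import math


def combined_gain(gain_presence, gain_absence, presence_prob):
    """Weighted geometric mean of two gains: G = G1**p * G0**(1 - p).

    Assumption (not from the source): p is a speech presence
    probability and both gains are strictly positive.
    """
    if not 0.0 <= presence_prob <= 1.0:
        raise ValueError("presence_prob must lie in [0, 1]")
    if gain_presence <= 0.0 or gain_absence <= 0.0:
        raise ValueError("gains must be strictly positive")
    return (gain_presence ** presence_prob) * (gain_absence ** (1.0 - presence_prob))


# Illustrative values only: equal weighting of gains 0.4 and 0.1.
example = combined_gain(0.4, 0.1, 0.5)
```

With `presence_prob` equal to 1 the result reduces to the presence gain, and with 0 it reduces to the absence gain, so the weight interpolates between the two hypotheses on a logarithmic scale.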
References
Andrianakis, I., & White, P. R. (2006). MMSE speech spectral amplitude estimators with chi and gamma speech priors. In IEEE international conference on acoustics speech and signal processing proceedings (Vol. 3, pp. 1068–1071).
Bahrami, M., & Faraji, N. (2017). Speech enhancement by minimum mean-square error spectral amplitude estimation assuming Weibull speech priors. In Artificial intelligence and signal processing conference (AISP) (pp. 190–194). IEEE.
Bahrami, M., & Seyedin, S. (2018). MMSE log-spectral amplitude estimation for single channel speech enhancement under speech presence uncertainty by Weibull speech priors. In Iranian conference on electrical engineering (ICEE) (pp. 749–754). IEEE.
Barnwell, T. P., III, Clements, M. A., & Quackenbush, S. R. (1988). Objective measures for speech quality testing. Englewood Cliffs, NJ: Prentice-Hall.
Brillinger, D. R. (2001). Time series: data analysis and theory. Philadelphia, PA: Society for Industrial Mathematics.
Chehrehsa, S., & Moir, T. J. (2016). Speech enhancement using maximum a posteriori and Gaussian mixture models for speech and noise periodogram estimation. Computer Speech & Language, 36, 58–71.
Chen, B., & Loizou, P. C. (2007). A Laplacian-based MMSE estimator for speech enhancement. Speech Communication, 49(2), 134–143.
Cover, T. M., & Thomas, J. A. (2012). Elements of information theory. New York: Wiley.
Demsar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1–30.
El-Fattah, M. A. A., Dessouky, M. I., Abbas, A. M., Diab, S. M., El-Rabaie, E. S. M., Al-Nuaimy, W., et al. (2014). Speech enhancement with an adaptive Wiener filter. International Journal of Speech Technology, 17(1), 53–64.
Ephraim, Y., & Malah, D. (1984). Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(6), 1109–1121.
Ephraim, Y., & Malah, D. (1985). Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(2), 443–445.
Erkelens, J. S., Hendriks, R. C., Heusdens, R., & Jensen, J. (2007). Minimum mean-square error estimation of discrete Fourier coefficients with generalized Gamma priors. IEEE Transactions on Audio, Speech and Language Processing, 15(6), 1741–1752.
Faraji, N., & Kohansal, A. (2018). MMSE and maximum a posteriori estimators for speech enhancement in additive noise assuming a t-location-scale clean speech prior. IET Signal Processing, 12(4), 532–543.
Fisher, W. M. (1986). The DARPA speech recognition research database: Specifications and status. In Proceedings of the DARPA workshop on speech recognition (pp. 93–99).
Fodor, B., & Fingscheidt, T. (2012). MMSE speech enhancement under speech presence uncertainty assuming (generalized) gamma speech priors throughout. In IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 4033–4036).
Gerkmann, T., Breithaupt, C., & Martin, R. (2008). Improved a posteriori speech presence probability estimation based on a likelihood ratio with fixed priors. IEEE Transactions on Audio, Speech and Language Processing, 16(5), 910–919.
Gradshteyn, I. S., & Ryzhik, I. M. (2014). Table of integrals, series, and products. New York: Academic Press.
Hendriks, R. C., Heusdens, R., & Jensen, J. (2009). Log-spectral magnitude MMSE estimators under super-Gaussian densities. In 10th annual conference of the international speech communication association (pp. 1319–1322).
Hendriks, R. C., Heusdens, R., & Jensen, J. (2010). MMSE based noise PSD tracking with low complexity. In IEEE international conference on acoustics, speech and signal processing (pp. 4266–4269).
Hirsch, H. G., & Pearce, D. (2000). The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In ASR-automatic speech recognition: Challenges for the new millenium ISCA tutorial and research workshop (ITRW).
Kayser, H., & Anemueller, J. (2016). Probabilistic spatial filter estimation for multi-channel signal enhancement in hearing aids. In Speech communication; 12. ITG symposium (pp. 1–5).
Kumar, B. (2018). Comparative performance evaluation of MMSE-based speech enhancement techniques through simulation and real-time implementation. International Journal of Speech Technology, 21(4), 1033–1044.
Liu, J., Zhou, Y., Ma, Y., & Liu, H. (2016). MMSE estimation of speech power spectral density under speech presence uncertainty for automatic speech recognition. In IEEE international conference on digital signal processing (DSP) (pp. 412–416).
Loizou, P. C. (2013). Speech enhancement: Theory and practice. Boca Raton: CRC Press.
Lotter, T., & Vary, P. (2005). Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model. EURASIP Journal on Applied Signal Processing, 7, 1110–1126.
Mahmmod, B. M., Ramli, A. R., Abdulhussian, S. H., Al-Haddad, S. A. R., & Jassim, W. A. (2017). Low-distortion MMSE speech enhancement estimator based on Laplacian prior. IEEE Access, 5, 9866–9881.
Malah, D., Cox, R. V., & Accardi, A. J. (1999). Tracking speech-presence uncertainty to improve speech enhancement in non-stationary noise environments. In IEEE international conference on acoustics, speech, and signal processing. proceedings. ICASSP99 (Vol. 2, pp. 789–792).
Martin, R. (2005). Speech enhancement based on minimum mean-square error estimation and super gaussian priors. IEEE Transactions on Speech and Audio Processing, 13(5), 845–856.
McAulay, R., & Malpass, M. (1980). Speech enhancement using a soft-decision noise suppression filter. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(2), 137–145.
McCallum, M., & Guillemin, B. (2013). Stochastic-deterministic MMSE STFT speech enhancement with general a priori information. IEEE Transactions on Audio, Speech and Language Processing, 21(7), 1445–1457.
Modhave, N., Karuna, Y., & Tonde, S. (2016). Design of multichannel wiener filter for speech enhancement in hearing aids and noise reduction technique. In IEEE online international conference on green engineering and technologies (IC-GET) (pp. 1–4).
Olver, F. W., Lozier, D. W., Boisvert, R. F., & Clark, C. W. (Eds.). (2010). NIST handbook of mathematical functions hardback and CD-ROM. New York: Cambridge University Press.
Paliwal, K., Wójcicki, K., & Schwerin, B. (2010). Single-channel speech enhancement using spectral subtraction in the short-time modulation domain. Speech Communication, 52(5), 450–475.
Papoulis, A., & Pillai, S. U. (2002). Probability, random variables, and stochastic processes. New York: McGraw-Hill.
Rix, A. W., Beerends, J. G., Hollier, M. P., & Hekstra, A. P. (2001). Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs. In IEEE international conference on acoustics, speech, and signal processing. proceedings (Vol. 2, pp. 749–752).
Sheskin, D. J. (2003). Handbook of parametric and nonparametric statistical procedures. Boca Raton: Chapman & Hall/CRC Press.
Souden, M., Araki, S., Kinoshita, K., Nakatani, T., & Sawada, H. (2013). A multichannel MMSE-based framework for speech source separation and noise reduction. IEEE Transactions on Audio, Speech and Language Processing, 21(9), 1913–1928.
Tashev, I., & Acero, A. (2010). Statistical modeling of the speech signal. In International workshop on acoustic, echo, and noise control (IWAENC).
Tong, R., Bao, G., & Ye, Z. (2015). A higher order subspace algorithm for multichannel speech enhancement. IEEE Signal Processing Letters, 22(11), 2004–2008.
Tribolet, J. M., Noll, P., McDermott, B., & Crochiere, R. (1978). A study of complexity and quality of speech waveform coders. In IEEE international conference on acoustics, speech, and signal processing (Vol. 3, pp. 586–590).
Wei, Q., Xia, Y., & Jiang, S. (2013). A novel prewhitening subspace method for enhancing speech corrupted by colored noise. In IEEE 6th international congress on image and signal processing (CISP) (Vol. 3, pp. 1282–1286).
Appendices
Appendix 1
Following Papoulis and Pillai (2002), the univariate distribution is obtained by integrating Eq. (23) in polar coordinates
where \( R = \left| Y \right| = \sqrt {Y_{\text{Re}}^{2} + Y_{\text{Im}}^{2} } , \) and \( Y_{\text{Re}} \) and \( Y_{\text{Im}} \) denote the real and imaginary parts of the noisy speech spectrum, respectively. Given the Weibull PDF in Eq. (1), we use Eq. (37) to solve the integral in Eq. (36)
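The phase marginalization described in this appendix can be checked numerically. The sketch below does not implement Eq. (23) or Eq. (36), which are not reproduced here; as a stand-in it assumes a circularly symmetric complex Gaussian joint density, for which the amplitude is known to follow a Rayleigh distribution. The function names and the variance parameter `sigma2` are hypothetical.

```python
import math


def amplitude_pdf(joint_pdf, r, n_theta=720):
    """Approximate p(r) = integral over theta in [0, 2*pi) of
    joint_pdf(r*cos(theta), r*sin(theta)) * r, using a rectangle rule."""
    dtheta = 2.0 * math.pi / n_theta
    total = 0.0
    for k in range(n_theta):
        theta = k * dtheta
        total += joint_pdf(r * math.cos(theta), r * math.sin(theta))
    # The factor r is the Jacobian of the Cartesian-to-polar change of variables.
    return total * r * dtheta


def circular_gaussian_pdf(x, y, sigma2=1.0):
    """Stand-in joint density: complex Gaussian with total variance sigma2."""
    return math.exp(-(x * x + y * y) / sigma2) / (math.pi * sigma2)


def rayleigh_pdf(r, sigma2=1.0):
    """Closed-form amplitude density for the stand-in joint density."""
    return 2.0 * r / sigma2 * math.exp(-r * r / sigma2)


# Compare the numerical marginal with the closed form on a small grid.
grid = [0.1 * i for i in range(41)]
max_err = max(abs(amplitude_pdf(circular_gaussian_pdf, r) - rayleigh_pdf(r)) for r in grid)
```

Any other joint density of the real and imaginary parts can be passed as `joint_pdf`; the Gaussian case is used only because its amplitude distribution is available in closed form for comparison.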
Appendix 2
The 2-D convolution formula in polar coordinates is defined as
where r and θ are the amplitude and phase variables, respectively.
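For orientation, one standard way to write such a convolution is sketched below. It assumes that the noisy coefficient is the sum of independent speech and noise coefficients whose joint densities \( f_S \) and \( f_N \) are expressed in Cartesian coordinates. These symbols, and the output variables \( R \) and \( \phi \), are illustrative and may not match the notation of the original equation.

```latex
f_Y(R\cos\phi,\, R\sin\phi)
  = \int_{0}^{\infty} \int_{0}^{2\pi}
      f_S(r\cos\theta,\, r\sin\theta)\,
      f_N(R\cos\phi - r\cos\theta,\; R\sin\phi - r\sin\theta)\,
      r \, d\theta \, dr
```

The factor \( r \) is the Jacobian of the change from Cartesian to polar integration variables.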
Cite this article
Bahrami, M., Faraji, N. Minimum mean square error estimator for speech enhancement in additive noise assuming Weibull speech priors and speech presence uncertainty. Int J Speech Technol 24, 97–108 (2021). https://doi.org/10.1007/s10772-020-09767-y
DOI: https://doi.org/10.1007/s10772-020-09767-y