Minimum mean square error estimator for speech enhancement in additive noise assuming Weibull speech priors and speech presence uncertainty

International Journal of Speech Technology

Abstract

A novel single-channel speech enhancement technique was proposed, based on a minimum mean square error (MMSE) estimator of the short-time spectral amplitude (STSA) in the discrete Fourier transform (DFT) domain. A Weibull distribution was used to model the DFT magnitudes of clean speech signals under an additive Gaussian noise assumption. The enhancement procedure was carried out both with speech presence uncertainty (WSPU) and without it (WoSPU). The theoretical spectral gain function was obtained as a weighted geometric mean of the hypothetical gains associated with signal presence and absence. Extensive experiments were conducted on clean speech signals from the TIMIT database that had been degraded by various additive non-stationary noise sources, and the enhanced signals were then evaluated. The results showed that the proposed method outperformed its counterparts that assume Rayleigh and Gamma probability density functions (PDFs), in terms of segmental signal-to-noise ratio (segSNR), general SNR, and perceptual evaluation of speech quality (PESQ). With Weibull speech priors in the MMSE-STSA based speech enhancement algorithm, performance in the WSPU case was also significantly better than in the WoSPU case.
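
The combination of gains described above can be illustrated with a minimal Python sketch. It assumes that the weights of the geometric mean are a speech presence probability and its complement, which is a common choice but is not stated in the abstract. The function name `combine_gains`, its argument names, and the numeric values are hypothetical and are not the paper's notation or settings.

```python
import math


def combine_gains(gain_present: float, gain_absent: float, presence_prob: float) -> float:
    """Weighted geometric mean of two spectral gains.

    Assumption: the weights are presence_prob and 1 - presence_prob.
    """
    if not 0.0 <= presence_prob <= 1.0:
        raise ValueError("presence_prob must lie in [0, 1]")
    if gain_present <= 0.0 or gain_absent <= 0.0:
        raise ValueError("gains must be positive")
    # Evaluate the geometric mean in the log domain:
    # G = G_present ** p * G_absent ** (1 - p)
    log_gain = (presence_prob * math.log(gain_present)
                + (1.0 - presence_prob) * math.log(gain_absent))
    return math.exp(log_gain)


# Hypothetical values for a single time-frequency bin.
combined = combine_gains(gain_present=0.8, gain_absent=0.1, presence_prob=0.5)
print(combined)
```

With a presence probability of 1 the result reduces to the gain under speech presence, and with a probability of 0 it reduces to the gain under speech absence, so the WoSPU case corresponds to always using the first gain.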

References

  • Andrianakis, I., & White, P. R. (2006). MMSE speech spectral amplitude estimators with chi and gamma speech priors. In IEEE international conference on acoustics speech and signal processing proceedings (Vol. 3, pp. 1068–1071).

  • Bahrami, M., & Faraji, N. (2017). Speech enhancement by minimum mean-square error spectral amplitude estimation assuming Weibull speech priors. In Artificial intelligence and signal processing conference (AISP) (pp. 190–194). IEEE.

  • Bahrami, M., & Seyedin, S. (2018). MMSE log-spectral amplitude estimation for single channel speech enhancement under speech presence uncertainty by Weibull speech priors. In Iranian conference on electrical engineering (ICEE) (pp. 749–754). IEEE.

  • Barnwell, T. P., III, Clements, M. A., & Quackenbush, S. R. (1988). Objective measures for speech quality testing. Englewood Cliffs, NJ: Prentice-Hall.

  • Brillinger, D. R. (2001). Time series: data analysis and theory. Philadelphia, PA: Society for Industrial Mathematics.

  • Chehrehsa, S., & Moir, T. J. (2016). Speech enhancement using maximum a posteriori and Gaussian mixture models for speech and noise periodogram estimation. Computer Speech & Language, 36, 58–71.

  • Chen, B., & Loizou, P. C. (2007). A Laplacian-based MMSE estimator for speech enhancement. Speech Communication, 49(2), 134–143.

  • Cover, T. M., & Thomas, J. A. (2012). Elements of information theory. New York: Wiley.

  • Demsar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1–30.

  • El-Fattah, M. A. A., Dessouky, M. I., Abbas, A. M., Diab, S. M., El-Rabaie, E. S. M., Al-Nuaimy, W., et al. (2014). Speech enhancement with an adaptive Wiener filter. International Journal of Speech Technology, 17(1), 53–64.

  • Ephraim, Y., & Malah, D. (1984). Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(6), 1109–1121.

  • Ephraim, Y., & Malah, D. (1985). Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(2), 443–445.

  • Erkelens, J. S., Hendriks, R. C., Heusdens, R., & Jensen, J. (2007). Minimum mean-square error estimation of discrete Fourier coefficients with generalized Gamma priors. IEEE Transactions on Audio, Speech and Language Processing, 15(6), 1741–1752.

  • Faraji, N., & Kohansal, A. (2018). MMSE and maximum a posteriori estimators for speech enhancement in additive noise assuming a t-location-scale clean speech prior. IET Signal Processing, 12(4), 532–543.

  • Fisher, W. M. (1986). The DARPA speech recognition research database: Specifications and status. In Proceedings of the DARPA workshop on speech recognition (pp. 93–99).

  • Fodor, B., & Fingscheidt, T. (2012). MMSE speech enhancement under speech presence uncertainty assuming (generalized) gamma speech priors throughout. In IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 4033–4036).

  • Gerkmann, T., Breithaupt, C., & Martin, R. (2008). Improved a posteriori speech presence probability estimation based on a likelihood ratio with fixed priors. IEEE Transactions on Audio, Speech and Language Processing, 16(5), 910–919.

  • Gradshteyn, I. S., & Ryzhik, I. M. (2014). Table of integrals, series, and products. New York: Academic Press.

  • Hendriks, R. C., Heusdens, R., & Jensen, J. (2009). Log-spectral magnitude MMSE estimators under super-Gaussian densities. In 10th annual conference of the international speech communication association (pp. 1319–1322).

  • Hendriks, R. C., Heusdens, R., & Jensen, J. (2010). MMSE based noise PSD tracking with low complexity. In IEEE international conference on acoustics, speech and signal processing (pp. 4266–4269).

  • Hirsch, H. G., & Pearce, D. (2000). The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In ASR-automatic speech recognition: Challenges for the new millenium ISCA tutorial and research workshop (ITRW).

  • Kayser, H., & Anemueller, J. (2016). Probabilistic spatial filter estimation for multi-channel signal enhancement in hearing aids. In Speech communication; 12. ITG symposium (pp. 1–5).

  • Kumar, B. (2018). Comparative performance evaluation of MMSE-based speech enhancement techniques through simulation and real-time implementation. International Journal of Speech Technology, 21(4), 1033–1044.

  • Liu, J., Zhou, Y., Ma, Y., & Liu, H. (2016). MMSE estimation of speech power spectral density under speech presence uncertainty for automatic speech recognition. In IEEE international conference on digital signal processing (DSP) (pp. 412–416).

  • Loizou, P. C. (2013). Speech enhancement: Theory and practice. Boca Raton: CRC Press.

  • Lotter, T., & Vary, P. (2005). Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model. EURASIP Journal on Applied Signal Processing, 7, 1110–1126.

  • Mahmmod, B. M., Ramli, A. R., Abdulhussain, S. H., Al-Haddad, S. A. R., & Jassim, W. A. (2017). Low-distortion MMSE speech enhancement estimator based on Laplacian prior. IEEE Access, 5, 9866–9881.

  • Malah, D., Cox, R. V., & Accardi, A. J. (1999). Tracking speech-presence uncertainty to improve speech enhancement in non-stationary noise environments. In IEEE international conference on acoustics, speech, and signal processing. proceedings. ICASSP99 (Vol. 2, pp. 789–792).

  • Martin, R. (2005). Speech enhancement based on minimum mean-square error estimation and super gaussian priors. IEEE Transactions on Speech and Audio Processing, 13(5), 845–856.

  • McAulay, R., & Malpass, M. (1980). Speech enhancement using a soft-decision noise suppression filter. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(2), 137–145.

  • McCallum, M., & Guillemin, B. (2013). Stochastic-deterministic MMSE STFT speech enhancement with general a priori information. IEEE Transactions on Audio, Speech and Language Processing, 21(7), 1445–1457.

  • Modhave, N., Karuna, Y., & Tonde, S. (2016). Design of multichannel wiener filter for speech enhancement in hearing aids and noise reduction technique. In IEEE online international conference on green engineering and technologies (IC-GET) (pp. 1–4).

  • Olver, F. W., Lozier, D. W., Boisvert, R. F., & Clark, C. W. (Eds.). (2010). NIST handbook of mathematical functions. New York: Cambridge University Press.

  • Paliwal, K., Wójcicki, K., & Schwerin, B. (2010). Single-channel speech enhancement using spectral subtraction in the short-time modulation domain. Speech Communication, 52(5), 450–475.

  • Papoulis, A., & Pillai, S. U. (2002). Probability, random variables, and stochastic processes. New York: McGraw-Hill.

  • Rix, A. W., Beerends, J. G., Hollier, M. P., & Hekstra, A. P. (2001). Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs. In IEEE international conference on acoustics, speech, and signal processing. proceedings (Vol. 2, pp. 749–752).

  • Sheskin, D. J. (2003). Handbook of parametric and nonparametric statistical procedures. Boca Raton: Chapman & Hall/CRC Press.

  • Souden, M., Araki, S., Kinoshita, K., Nakatani, T., & Sawada, H. (2013). A multichannel MMSE-based framework for speech source separation and noise reduction. IEEE Transactions on Audio, Speech and Language Processing, 21(9), 1913–1928.

  • Tashev, I., & Acero, A. (2010). Statistical modeling of the speech signal. In International workshop on acoustic, echo, and noise control (IWAENC).

  • Tong, R., Bao, G., & Ye, Z. (2015). A higher order subspace algorithm for multichannel speech enhancement. IEEE Signal Processing Letters, 22(11), 2004–2008.

  • Tribolet, J. M., Noll, P., McDermott, B., & Crochiere, R. (1978). A study of complexity and quality of speech waveform coders. In IEEE international conference on acoustics, speech, and signal processing (Vol. 3, pp. 586–590).

  • Wei, Q., Xia, Y., & Jiang, S. (2013). A novel prewhitening subspace method for enhancing speech corrupted by colored noise. In IEEE 6th international congress on image and signal processing (CISP) (Vol. 3, pp. 1282–1286).

Author information

Corresponding author

Correspondence to Neda Faraji.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

Following Papoulis and Pillai (2002), the univariate distribution is obtained by integrating Eq. (23) in polar coordinates:

$$ p(R) = \frac{\partial }{\partial R}\int\limits_{0}^{R} {\int\limits_{0}^{2\pi } {\frac{1}{2\pi }} } p(Y \equiv X)rd\varphi dr, $$
(36)

where \( R = \left| Y \right| = \sqrt {Y_{\text{Re}}^{2} + Y_{\text{Im}}^{2} } \), and \( Y_{\text{Re}} \) and \( Y_{\text{Im}} \) denote the real and imaginary parts of the noisy speech spectrum, respectively. With the Weibull PDF of Eq. (1), the integral in Eq. (36) is solved using the identity in Eq. (37):

$$ \frac{\partial }{\partial w}\int\limits_{0}^{w} {z(v)dv} = z(w). $$
(37)
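
As a quick numerical check of Eq. (37), the sketch below takes z to be a Weibull PDF, so that the integral from 0 to w is the Weibull CDF, and compares a finite-difference derivative of the CDF with the PDF. The two-parameter form with shape `k` and scale `lam` is the standard textbook parameterization and is assumed here; it may differ from the notation of Eq. (1). The function names and parameter values are arbitrary illustrations.

```python
import math


def weibull_pdf(v: float, k: float, lam: float) -> float:
    """Standard two-parameter Weibull PDF (assumed form); returns 0 for v <= 0."""
    if v <= 0.0:
        return 0.0
    return (k / lam) * (v / lam) ** (k - 1.0) * math.exp(-((v / lam) ** k))


def weibull_cdf(w: float, k: float, lam: float) -> float:
    """Closed form of the integral of the Weibull PDF from 0 to w."""
    if w <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((w / lam) ** k))


def central_difference(f, w: float, h: float = 1e-5) -> float:
    """Second-order finite-difference approximation of df/dw at w."""
    return (f(w + h) - f(w - h)) / (2.0 * h)


k, lam, w = 1.5, 2.0, 1.2  # arbitrary illustrative values
lhs = central_difference(lambda x: weibull_cdf(x, k, lam), w)  # d/dw of the integral
rhs = weibull_pdf(w, k, lam)                                   # z(w)
print(abs(lhs - rhs) < 1e-8)
```

The left-hand side differentiates the accumulated integral and the right-hand side evaluates the integrand at the upper limit, which is exactly the relation that Eq. (37) states.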

Appendix 2

The 2-D convolution in polar coordinates is defined as

$$ f(R,\theta ) = g(R,\theta )\, * \,h(R,\theta ) = \int\limits_{0}^{\infty } {\int\limits_{0}^{2\pi } {g(R - t,\theta - \alpha )h(t,\alpha )td\alpha dt} } $$
(38)

where \( R \) and \( \theta \) are the amplitude and phase variables, respectively.

Cite this article

Bahrami, M., Faraji, N. Minimum mean square error estimator for speech enhancement in additive noise assuming Weibull speech priors and speech presence uncertainty. Int J Speech Technol 24, 97–108 (2021). https://doi.org/10.1007/s10772-020-09767-y


  • DOI: https://doi.org/10.1007/s10772-020-09767-y
