Abstract
In this paper, STFT-based speech enhancement algorithms that estimate short-time spectral amplitudes are proposed. These algorithms use maximum likelihood, maximum a posteriori (MAP) and minimum mean square error (MMSE) estimators, which respectively use Laplace, Gamma and exponential probability density functions as noise spectral amplitude priors and a Nakagami distribution as the speech spectral amplitude prior. The phase of noisy speech carries significant information that can be retrieved and utilized. However, the undesired artifacts that result from phase reconstruction pose many challenges. In this paper, the reconstructed phase is therefore treated as uncertain prior knowledge when deriving a joint MMSE estimate of the (C)omplex speech coefficients given (U)ncertain (P)hase information. The proposed phase reconstruction algorithm assists in generating a clean speech phase. The proposed estimator reduces undesired artifacts and strikes a satisfactory balance between the phase of the noisy signal and the prior phase estimate, and hence yields superior performance in instrumental measures, informal listening and speech quality.
References
Andrianakis, I., & White, P. R. (2006). MMSE speech spectral amplitude estimators with Chi and Gamma speech priors. In Proceedings of international conference on acoustics, speech and signal processing (ICASSP 2006), Toulouse, France, pp. 1068–1071. doi:10.1109/ICASSP.2006.1660842
Breithaupt, C., Gerkmann, T., & Martin, R. (2008a). A novel a priori SNR estimation approach based on selective cepstro-temporal smoothing. In Proceedings of international conference on acoustics, speech and signal processing (ICASSP 2008), Las Vegas, NV, USA, pp. 4897–4900. doi:10.1109/ICASSP.2008.4518755
Breithaupt, C., Krawczyk, M., & Martin, R. (2008b). Parameterized MMSE spectral magnitude estimation for the enhancement of noisy speech. In Proceedings of international conference on acoustics, speech and signal processing (ICASSP 2008), Las Vegas, NV, USA, pp. 4037–4040. doi:10.1109/ICASSP.2008.4518540
Ephraim, Y., & Malah, D. (1984). Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(6), 1109–1121. doi:10.1109/TASSP.1984.1164453.
Ephraim, Y., & Malah, D. (1985). Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(2), 443–445. doi:10.1109/TASSP.1985.1164550.
Erkelens, J. S., Hendriks, R. C., Heusdens, R., & Jensen, J. (2007). Minimum mean-square error estimation of discrete Fourier coefficients with generalized gamma priors. IEEE Transactions on Audio, Speech, and Language Processing, 15(6), 1741–1752. doi:10.1109/TASL.2007.899233.
Evans, M., Hastings, N., & Peacock, B. (2000). von Mises distribution. In Statistical distributions (ch. 45, pp. 191–192), 4th ed. New York: Wiley.
Gerkmann, T. (2014). MMSE-optimal enhancement of complex speech coefficients with uncertain prior knowledge of the clean speech phase. In Proceedings of international conference on acoustics, speech and signal processing (ICASSP 2014), Florence, Italy, pp. 4478–4482. doi:10.1109/ICASSP.2014.6854449
Gerkmann, T., & Hendriks, R. C. (2012). Unbiased MMSE-based noise power estimation with low complexity and low tracking delay. IEEE Transactions on Audio, Speech, and Language Processing, 20(4), 1383–1393. doi:10.1109/TASL.2011.2180896.
Gerkmann, T., & Krawczyk, M. (2013). MMSE-optimal spectral amplitude estimation given the STFT-phase. IEEE Signal Processing Letters, 20(2), 129–132. doi:10.1109/LSP.2012.2233470.
Gerkmann, T., & Martin, R. (2009). On the statistics of spectral amplitudes after variance reduction by temporal cepstrum smoothing and cepstral nulling. IEEE Transactions on Signal Processing, 57(11), 4165–4174. doi:10.1109/TSP.2009.2025795.
Gonzalez, S., & Brookes, M. (2014). PEFAC - a pitch estimation algorithm robust to high levels of noise. IEEE Transactions on Audio, Speech, and Language Processing, 22(2), 518–530. doi:10.1109/TASLP.2013.2295918.
Gradshteyn, I. S., & Ryzhik, I. M. (2007). Table of integrals, series, and products (7th ed.). San Diego, CA: Academic Press.
Griffin, D., & Lim, J. S. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2), 236–243. doi:10.1109/TASSP.1984.1164317.
Hendriks, R. C., Gerkmann, T., & Jensen, J. (2013). DFT-domain based single-microphone noise reduction for speech enhancement: A survey of the state of the art. Synthesis Lectures on Speech and Audio Processing, 9(1), 1–80. doi:10.2200/S00473ED1V01Y201301SAP011.
Krawczyk, M., & Gerkmann, T. (2012). STFT phase improvement for single channel speech enhancement. In Proceedings of international workshop on acoustic signal enhancement (IWAENC 2012), VDE, Aachen, Germany, pp. 1–4. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6309424.
Krawczyk, M., & Gerkmann, T. (2014). STFT phase reconstruction in voiced speech for an improved single channel speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 22(12), 1931–1940. doi:10.1109/TASLP.2014.2354236.
Krawczyk, M., Rehr, R., & Gerkmann, T. (2013). Phase-sensitive real-time capable speech enhancement under voiced-unvoiced uncertainty. In Proceedings of European signal processing conference (EUSIPCO 2013), Morocco, pp. 1–5. http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6811648&url=http%3A%2F%2Fieeexplore.ieee.org%2Fstamp%2Fstamp.jsp%3Ftp%3D%26arnumber%3D6811648.
Le Roux, J., & Vincent, E. (2013). Consistent Wiener filtering for audio source separation. IEEE Signal Processing Letters, 20(3), 217–220. doi:10.1109/LSP.2012.2225617.
Loizou, P. C. (2007). Speech enhancement: Theory and practice. Boca Raton, FL: CRC Press, Taylor & Francis Group.
Lotter, T., & Vary, P. (2005). Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model. EURASIP Journal on Applied Signal Processing, 7, 1110–1126. http://www.ind.rwth-aachen.de/fileadmin/publications/lotter05a.pdf
Mardia, K. V., & Jupp, P. E. (2000). Directional statistics. Chichester: Wiley.
Mowlaee, P., & Saeidi, R. (2013). Iterative closed-loop phase aware single-channel speech enhancement. IEEE Signal Processing Letters, 20(12), 1235–1239. doi:10.1109/LSP.2013.2286748.
Paliwal, K., Wójcicki, K., & Shannon, B. (2011). The importance of phase in speech enhancement. Speech Communication, 53(4), 465–494. doi:10.1016/j.specom.2010.12.003.
Sturmel, N., & Daudet, L. (2011). Signal reconstruction from STFT magnitude: A state of the art. In International conference on digital audio effects (DAFx), Paris, France, pp. 375–386. http://recherche.ircam.fr/pub/dafx11/Papers/27_e.pdf
Sturmel, N., & Daudet, L. (2012). Iterative phase reconstruction of Wiener filtered signals. In Proceedings of international conference on acoustics, speech and signal processing (ICASSP 2012), Kyoto, Japan, pp. 101–104. doi:10.1109/ICASSP.2012.6287827
Taal, C. H., Hendriks, R. C., Heusdens, R., & Jensen, J. (2011). An algorithm for intelligibility prediction of time-frequency weighted noisy speech. IEEE Transactions on Audio, Speech, and Language Processing, 19(7), 2125–2136. doi:10.1109/TASL.2011.2114881.
Vary, P. (1985). Noise suppression by spectral magnitude estimation - mechanism and theoretical limits. Signal Processing, 8(4), 387–400. doi:10.1016/0165-1684(85)90002-7.
Wang, D., & Lim, J. (1982). The unimportance of phase in speech enhancement. IEEE Transactions on Acoustics, Speech, and Signal Processing, 30(4), 679–681. doi:10.1109/TASSP.1982.1163920.
You, C. H., Koh, S. N., & Rahardja, S. (2005). β-order MMSE spectral amplitude estimation for speech enhancement. IEEE Transactions on Speech and Audio Processing, 13(4), 475–486. doi:10.1109/TSA.2005.848883.
Appendix: MMSE CUP Estimator (Speech as Nakagami PDF and noise as exponential PDF)
Assume that the speech coefficients follow a Nakagami PDF,
and that the noise coefficients follow an exponential PDF.
von Mises distribution with concentration
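For orientation, the textbook forms of two of the densities named above can be written as follows. This is only a sketch under assumed notation and is not necessarily the parameterization used in (44) to (46): \( A \) denotes the speech spectral amplitude, \( \mu \) the Nakagami shape parameter, \( \sigma_{S}^{2} \) the speech power, \( \phi_{S} \) the clean speech phase, \( \tilde{\phi}_{S} \) the prior phase estimate, \( \kappa \) the concentration, and \( I_{0} \) the modified Bessel function of the first kind of order zero.

```latex
% Nakagami density of the speech amplitude (assumed notation)
p(A) = \frac{2\mu^{\mu}}{\Gamma(\mu)\,\sigma_{S}^{2\mu}}\,
       A^{2\mu-1} \exp\left( -\frac{\mu}{\sigma_{S}^{2}} A^{2} \right),
       \qquad A \ge 0

% von Mises density of the phase around the prior estimate (assumed notation)
p(\phi_{S} \mid \tilde{\phi}_{S}) =
       \frac{\exp\left( \kappa \cos( \phi_{S} - \tilde{\phi}_{S} ) \right)}
            {2\pi I_{0}(\kappa)}
```

A larger \( \kappa \) concentrates the phase density around \( \tilde{\phi}_{S} \), while \( \kappa = 0 \) reduces it to a uniform density on the circle.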
To determine the posterior, substitute (44), (45) and (46) into (43):
From Gradshteyn and Ryzhik (2007), Eq. 3.462.1
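The table entry cited here is commonly stated as \( \int_{0}^{\infty} x^{\nu-1} e^{-\beta x^{2} - \gamma x}\,dx = (2\beta)^{-\nu/2}\,\Gamma(\nu)\,\exp\left(\gamma^{2}/(8\beta)\right) D_{-\nu}\left(\gamma/\sqrt{2\beta}\right) \), with \( D \) the parabolic cylinder function. The following is a minimal numerical sanity check of that identity, not the derivation itself. The symbols `nu`, `beta` and `gamma` are the generic table symbols rather than the quantities of this appendix, the function names are hypothetical, and the restriction to integer `nu` is a simplifying assumption that lets the check run with the standard library alone.

```python
import math

def parabolic_cylinder_neg_int(n, z):
    """D_{-n}(z) for integer n >= 0, built up from D_0 and D_{-1}."""
    d_prev = math.exp(-z * z / 4.0)  # D_0(z)
    if n == 0:
        return d_prev
    # D_{-1}(z) = sqrt(pi/2) * exp(z^2/4) * erfc(z / sqrt(2))
    d_curr = (math.sqrt(math.pi / 2.0) * math.exp(z * z / 4.0)
              * math.erfc(z / math.sqrt(2.0)))
    for k in range(1, n):
        # Recurrence: D_{-(k+1)}(z) = (D_{-(k-1)}(z) - z * D_{-k}(z)) / k
        d_prev, d_curr = d_curr, (d_prev - z * d_curr) / k
    return d_curr

def closed_form(nu, beta, gamma):
    """Right-hand side of the identity (integer nu >= 1, beta > 0)."""
    z = gamma / math.sqrt(2.0 * beta)
    return ((2.0 * beta) ** (-nu / 2.0) * math.gamma(nu)
            * math.exp(gamma * gamma / (8.0 * beta))
            * parabolic_cylinder_neg_int(nu, z))

def simpson_integral(nu, beta, gamma, upper=20.0, steps=20000):
    """Left-hand side by composite Simpson's rule on [0, upper]."""
    def f(x):
        return x ** (nu - 1) * math.exp(-beta * x * x - gamma * x)
    h = upper / steps  # steps must be even for Simpson's rule
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0

if __name__ == "__main__":
    for nu, beta, gamma in [(1, 0.5, 0.0), (2, 1.5, -0.7), (3, 0.8, 1.2)]:
        print(nu, beta, gamma,
              closed_form(nu, beta, gamma),
              simpson_integral(nu, beta, gamma))
```

The forward recurrence is adequate for the small orders used in this check. For non-integer or large orders, a dedicated special-function routine would be the safer choice.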
Multiplying and dividing by \( \sigma_{N}^{2} \) inside the integral gives
where \( \xi = \sigma_{S}^{2} / \sigma_{N}^{2} \) and \( \upsilon = \sqrt{\frac{\xi}{4\mu}} \cos\left( \phi_{S} - \phi_{Y} \right) \).
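As a small illustration of these two definitions, the sketch below evaluates \( \xi \) and \( \upsilon \) for a single time-frequency bin. The function and argument names are hypothetical and chosen only for readability. Phases are assumed to be in radians.

```python
import math

def power_ratio(speech_power, noise_power):
    """xi = sigma_S^2 / sigma_N^2 (hypothetical helper name)."""
    return speech_power / noise_power

def upsilon(xi, mu, phase_speech, phase_noisy):
    """upsilon = sqrt(xi / (4 * mu)) * cos(phi_S - phi_Y)."""
    return math.sqrt(xi / (4.0 * mu)) * math.cos(phase_speech - phase_noisy)

if __name__ == "__main__":
    xi = power_ratio(4.0, 1.0)
    # Identical phases give cos(0) = 1, so upsilon = sqrt(xi / (4 * mu))
    print(upsilon(xi, 1.0, 0.3, 0.3))
```

Because of the cosine factor, \( \upsilon \) shrinks as the candidate speech phase moves away from the noisy phase and changes sign once the difference exceeds \( \pi/2 \).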
Cite this article
Sunnydayal, V., Kumar, T.K. Bayesian estimation for speech enhancement given a priori knowledge of clean speech phase. Int J Speech Technol 18, 593–607 (2015). https://doi.org/10.1007/s10772-015-9306-4
DOI: https://doi.org/10.1007/s10772-015-9306-4