A novel BNMF-DNN based speech reconstruction method for speech quality evaluation under complex environments

Weili Zhou¹ & Zhen Zhu¹


Abstract

Speech quality evaluation (SQE) in complex noisy environments is important for audio processing systems and for quality of service. Recently, non-intrusive SQE has attracted increasing attention because of its efficiency and ease of use. However, non-intrusive SQE methods are generally expected to underperform intrusive ones, because they have no prior knowledge of the clean speech. In this paper, a novel quasi-clean speech reconstruction method for non-intrusive SQE is proposed. The method combines Bayesian non-negative matrix factorization (BNMF) with a deep neural network (DNN), taking advantage of both NMF and DNN. BNMF is used to compute the basis spectro-temporal matrices of the target speech, and the obtained matrices are integrated into the DNN model as an individual layer. The DNN is then trained to learn the complex mapping between the target source and the mixture signal, and to reconstruct the magnitude spectrograms of the quasi-clean speech. Finally, the reconstructed speech serves as the reference for a perceptual model that estimates the mean opinion score (MOS) of the tested noisy sample. Experimental results show that the proposed method outperforms the comparison non-intrusive SQE algorithms under challenging conditions in terms of objective measures.
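The first two steps described in the abstract (learning speech bases, then embedding them as a layer of a DNN that reconstructs quasi-clean magnitude spectrograms) can be illustrated with a minimal NumPy sketch. It rests on several assumptions that are not the authors' method: plain KL-divergence NMF with Lee-Seung multiplicative updates is swapped in for Bayesian NMF (which additionally places priors on the factors); the bases are kept fixed inside the network (the paper may fine-tune them); and the two-layer network, its sizes, the softplus activation, the learning rate, and the synthetic "speech" and "noise" matrices are all hypothetical choices. The final step, feeding the reconstructed speech as the reference to a perceptual model (for example ITU-T P.862 PESQ) to estimate MOS, needs time-domain signals and is not shown.

```python
import numpy as np


def nmf_kl(V, rank, n_iter=200, eps=1e-10, seed=0):
    """KL-divergence NMF (Lee-Seung multiplicative updates), a stand-in for BNMF.
    V (freq x frames) ~= W (freq x rank) @ H (rank x frames)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / (W.T @ ones + eps)
        W *= ((V / (W @ H + eps)) @ H.T) / (ones @ H.T + eps)
    return W, H


def softplus(x):
    return np.logaddexp(0.0, x)


def sigmoid(x):
    # Numerically stable logistic function; it is the derivative of softplus.
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def init_params(n_freq, n_hidden, rank, seed=0):
    # Hypothetical two-layer network: noisy frame -> hidden (ReLU) -> activations.
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_freq), (n_freq, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, rank)),
        "b2": np.zeros(rank),
    }


def forward(X, p, W_speech):
    """X: noisy magnitudes (frames x freq). Returns intermediates and the estimate."""
    z1 = X @ p["W1"] + p["b1"]
    h = np.maximum(0.0, z1)          # ReLU hidden layer
    z2 = h @ p["W2"] + p["b2"]
    act = softplus(z2)               # nonnegative NMF activations
    Y = act @ W_speech.T             # fixed "NMF layer": activations times speech bases
    return (z1, h, z2), Y


def train_step(X, S, p, W_speech, lr=0.05):
    """One full-batch gradient step on the MSE to the clean magnitudes S (frames x freq).
    Only the network weights are updated; W_speech stays fixed."""
    (z1, h, z2), Y = forward(X, p, W_speech)
    loss = np.mean((Y - S) ** 2)
    dY = 2.0 * (Y - S) / Y.size
    dz2 = (dY @ W_speech) * sigmoid(z2)
    dz1 = (dz2 @ p["W2"].T) * (z1 > 0)
    grads = {"W2": h.T @ dz2, "b2": dz2.sum(axis=0), "W1": X.T @ dz1, "b1": dz1.sum(axis=0)}
    for k in p:
        p[k] -= lr * grads[k]
    return loss


# Toy data (hypothetical): low-rank nonnegative "speech" plus additive noise.
rng = np.random.default_rng(1)
n_freq, n_frames, rank = 64, 400, 8
clean = rng.random((n_freq, rank)) @ rng.gamma(1.0, 1.0, (rank, n_frames))
clean /= clean.max()
noisy = clean + 0.3 * rng.random((n_freq, n_frames))

W_speech, _ = nmf_kl(clean, rank)                              # step 1: speech bases
W_speech /= np.linalg.norm(W_speech, axis=0, keepdims=True)    # unit-norm columns
p = init_params(n_freq, 32, rank)
X, S = noisy.T, clean.T                                        # frames x freq
losses = [train_step(X, S, p, W_speech) for _ in range(300)]   # step 2: train the DNN
_, quasi_clean = forward(X, p, W_speech)                       # quasi-clean magnitudes
print(f"training MSE: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Because both the softplus activations and the NMF bases are nonnegative, the network output is a valid magnitude spectrogram by construction, and the network only has to predict a low-dimensional activation code (8 values per frame here) rather than every frequency bin.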



Acknowledgements

This work was supported by the Foshan University Research Foundation for Advanced Talents (GG07005), the Natural Science Foundation of Guangdong Province (2019A1515111148), and the Guangdong Province Colleges and Universities Young Innovative Talent Project (2019KQNCX168).

Author information

Authors and Affiliations

School of Electronic and Information Engineering, Foshan University, Foshan, People’s Republic of China
Weili Zhou & Zhen Zhu


Corresponding author

Correspondence to Weili Zhou.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Zhou, W., Zhu, Z. A novel BNMF-DNN based speech reconstruction method for speech quality evaluation under complex environments. Int. J. Mach. Learn. & Cyber. 12, 959–972 (2021). https://doi.org/10.1007/s13042-020-01214-3


Received: 29 September 2019
Accepted: 20 September 2020
Published: 06 October 2020
Issue Date: April 2021
DOI: https://doi.org/10.1007/s13042-020-01214-3
