
A novel BNMF-DNN based speech reconstruction method for speech quality evaluation under complex environments

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Speech quality evaluation (SQE) in complex noisy environments is important for audio processing systems and for quality of service. Non-intrusive SQE has recently attracted growing attention because it is efficient and easy to use. However, non-intrusive methods are expected to underperform intrusive ones because they have no prior knowledge of the clean speech. This paper proposes a novel quasi-clean speech reconstruction method for non-intrusive SQE. The method combines Bayesian non-negative matrix factorization (BNMF) with a deep neural network (DNN) to exploit the advantages of both NMF and DNN. BNMF is used to compute the basic spectro-temporal matrices of the target speech, and the obtained matrices are integrated into the DNN model as an individual layer. The DNN is then trained to learn the complex mapping between the target source and the mixture signal and to reconstruct the magnitude spectrograms of the quasi-clean speech. Finally, the reconstructed speech serves as the reference for the perceptual model, which estimates the mean opinion score (MOS) of the noisy sample under test. Experimental results show that the proposed method outperforms the compared non-intrusive SQE algorithms under challenging conditions in terms of objective measurement.
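The idea of learning spectral bases from the target speech and reusing them as a fixed layer that maps activations to a magnitude spectrogram can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes plain (non-Bayesian) NMF with the Euclidean multiplicative updates of Lee and Seung in place of BNMF, and it replaces the trained DNN with an NMF activation estimate computed against fixed bases. All function names (`nmf`, `estimate_activations`, `basis_layer`) and the toy matrices are hypothetical.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5

def update_activations(V, W, H, eps=1e-9):
    """Multiplicative update H <- H * (W^T V) / (W^T W H), elementwise."""
    Wt = transpose(W)
    num = matmul(Wt, V)
    den = matmul(matmul(Wt, W), H)
    return [[h * n / (d + eps) for h, n, d in zip(rh, rn, rd)]
            for rh, rn, rd in zip(H, num, den)]

def update_bases(V, W, H, eps=1e-9):
    """Multiplicative update W <- W * (V H^T) / (W H H^T), elementwise."""
    Ht = transpose(H)
    num = matmul(V, Ht)
    den = matmul(matmul(W, H), Ht)
    return [[w * n / (d + eps) for w, n, d in zip(rw, rn, rd)]
            for rw, rn, rd in zip(W, num, den)]

def nmf(V, rank, iters=200, seed=0):
    """Factor a nonnegative matrix V (frequency x frames) as W H."""
    rng = random.Random(seed)
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in V]
    H = [[rng.random() + 0.1 for _ in V[0]] for _ in range(rank)]
    for _ in range(iters):
        H = update_activations(V, W, H)
        W = update_bases(V, W, H)
    return W, H

def estimate_activations(V_mix, W, iters=200, seed=1):
    """Stand-in for the DNN (assumption): fit activations with W held fixed."""
    rng = random.Random(seed)
    H = [[rng.random() + 0.1 for _ in V_mix[0]] for _ in range(len(W[0]))]
    for _ in range(iters):
        H = update_activations(V_mix, W, H)
    return H

def basis_layer(W, H):
    """Fixed linear layer: activations -> reconstructed magnitude spectrogram."""
    return matmul(W, H)

# Toy magnitude "spectrogram" (4 frequency bins x 3 frames), invented for illustration.
clean = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.0, 1.0], [0.5, 0.0, 2.0]]
W, _ = nmf(clean, rank=2)                       # bases learned from clean speech
noisy = [[v + 0.3 for v in row] for row in clean]
H_hat = estimate_activations(noisy, W)          # activations for the noisy mixture
quasi_clean = basis_layer(W, H_hat)             # reconstruction constrained to speech bases
```

Because the reconstruction is restricted to combinations of bases learned from clean speech, components of the mixture that those bases cannot represent are attenuated. In the paper's setting a trained DNN, rather than the iterative update used here, would supply the activations.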

Acknowledgements

This work is supported by the Foshan University Research Foundation for Advanced Talents (GG07005), the Natural Science Foundation of Guangdong Province (2019A1515111148), and the Guangdong Province Colleges and Universities Young Innovative Talent Project (2019KQNCX168).

Author information

Corresponding author

Correspondence to Weili Zhou.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhou, W., Zhu, Z. A novel BNMF-DNN based speech reconstruction method for speech quality evaluation under complex environments. Int. J. Mach. Learn. & Cyber. 12, 959–972 (2021). https://doi.org/10.1007/s13042-020-01214-3

  • DOI: https://doi.org/10.1007/s13042-020-01214-3
