Abstract
Most existing deep learning-based speech denoising methods rely heavily on clean speech data. The conventional view holds that a large number of paired noisy and clean speech samples is required to achieve good denoising performance. However, collecting such data is a practical barrier, particularly in economically challenged regions and for low-resource languages. Training deep denoising networks with only noisy speech samples is a viable way to avoid this dependence. In this study, both the input and the target of a deep complex U-Net (DCU-Net) were constructed from noisy speech alone. Experimental results demonstrate that, compared with conventional speech denoising techniques, the proposed approach reduces not only the dependence on clean targets but also the dependence on large training sets.
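To make the training setup concrete, the following is a minimal sketch of noisy-target training in the spirit described above: the network input is a noisy utterance with additional noise mixed in, and the training target is the original noisy utterance itself, so no clean speech is ever used. The dataset class `NoisyPairDataset`, the placeholder convolutional denoiser, and the SNR and segment-length settings are illustrative assumptions, not the DCU-Net configuration or pairing strategy used in the paper.

```python
# Sketch of training a speech denoiser without clean targets.
# Assumption: we only have noisy speech clips and separate noise clips.
# Input  = noisy speech + extra noise; Target = original noisy speech.

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader


class NoisyPairDataset(Dataset):
    """Yields (doubly noisy input, noisy target) pairs built from noisy speech only."""

    def __init__(self, noisy_waveforms, noise_waveforms, snr_db=5.0, segment_len=16000):
        self.noisy = noisy_waveforms      # list of 1-D float tensors (noisy speech)
        self.noise = noise_waveforms      # list of 1-D float tensors (noise clips)
        self.snr_db = snr_db
        self.segment_len = segment_len

    def __len__(self):
        return len(self.noisy)

    def _fix_length(self, x):
        # Crop or zero-pad to a fixed segment length so batching works.
        if x.numel() >= self.segment_len:
            return x[: self.segment_len]
        return torch.cat([x, x.new_zeros(self.segment_len - x.numel())])

    def _add_noise(self, speech):
        noise = self.noise[torch.randint(len(self.noise), (1,)).item()]
        noise = noise.repeat(speech.numel() // noise.numel() + 1)[: speech.numel()]
        p_s = speech.pow(2).mean()
        p_n = noise.pow(2).mean().clamp_min(1e-10)
        gain = torch.sqrt(p_s / (p_n * 10 ** (self.snr_db / 10)))
        return speech + gain * noise

    def __getitem__(self, idx):
        target = self._fix_length(self.noisy[idx])   # noisy speech, used as target
        return self._add_noise(target), target       # extra-noisy input, noisy target


# Placeholder 1-D convolutional denoiser (stand-in for the DCU-Net architecture).
model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(),
    nn.Conv1d(16, 16, kernel_size=9, padding=4), nn.ReLU(),
    nn.Conv1d(16, 1, kernel_size=9, padding=4),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()


def train(dataset, epochs=10, batch_size=8):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for noisy_in, noisy_tgt in loader:
            # The loss is computed against a *noisy* target; clean speech
            # is never seen by the network during training.
            pred = model(noisy_in.unsqueeze(1))
            loss = criterion(pred, noisy_tgt.unsqueeze(1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

The key point of the sketch is the pairing: because the extra noise on the input is statistically independent of the noise already present in the target, regressing one onto the other pushes the network toward the underlying speech, which is the intuition behind noise2noise-style training.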
Data Availability
The data that support the findings of this study are available from the corresponding author on request.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Poluboina, V., Pulikala, A. & Pitchaimuthu, A.N. Deep Speech Denoising with Minimal Dependence on Clean Speech Data. Circuits Syst Signal Process 43, 3909–3926 (2024). https://doi.org/10.1007/s00034-024-02644-y