Robust Sound Event Detection by a Two-Stage Network in the Presence of Background Noise

Jie Ou, Hongqing Liu, Yi Zhou & Lu Gan

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 433)

Included in the following conference series:

International Conference on Communications and Networking in China

Abstract

With the advent of deep learning, research on noise-robust sound event detection (SED) has progressed rapidly. However, the performance of single-channel SED systems in noisy conditions remains unsatisfactory. Recently, several speech enhancement (SE) methods have been used as a front-end for SED to reduce the effect of noise, but these approaches rely on two entirely separate models that handle the two tasks independently. In this work, we introduce a network trained with a two-stage method that performs both signal denoising and SED, with denoising and SED carried out sequentially by neural networks. In addition, we design a new objective function that takes into account the Euclidean distance between the output of the denoising block and the corresponding clean audio amplitude spectrum, which better limits the distortion of the output features. The two-stage model is then jointly trained to optimize the proposed objective function. The results show that the proposed network outperforms a single-stage network without noise suppression. Compared with other recent state-of-the-art networks in the SED field, the proposed model is competitive, especially in noisy environments.
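The joint objective described in the abstract can be illustrated with a minimal sketch. The abstract only states that the objective includes the Euclidean distance between the denoising block's output and the clean amplitude spectrum; everything else here is an assumption made for illustration: the use of binary cross-entropy as the SED term, the weighting factor `alpha`, the averaging over frames, and all function names. It is not the authors' implementation.

```python
import math

def spectral_distance(denoised, clean):
    """Average per-frame Euclidean distance between two amplitude
    spectrograms, each given as a list of frames (lists of floats)."""
    total = 0.0
    for d_frame, c_frame in zip(denoised, clean):
        total += math.sqrt(sum((d - c) ** 2 for d, c in zip(d_frame, c_frame)))
    return total / len(denoised)

def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy between predicted event probabilities
    and 0/1 labels (assumed SED loss; not stated in the abstract)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

def joint_loss(denoised, clean, event_probs, event_labels, alpha=0.5):
    """Hypothetical joint objective: SED loss plus a weighted
    denoising-distortion penalty. alpha is an assumed hyperparameter."""
    sed_term = binary_cross_entropy(event_probs, event_labels)
    denoise_term = spectral_distance(denoised, clean)
    return sed_term + alpha * denoise_term

# Toy example: one frame with two frequency bins, one event prediction.
loss = joint_loss([[1.0, 2.0]], [[1.0, 0.0]], [0.5], [1], alpha=0.5)
```

In this toy case the distortion term penalizes the gap between the denoised and clean spectra, so a denoising block that removes noise at the cost of distorting the features raises the overall loss even when the event prediction is unchanged.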

Author information

Authors and Affiliations

School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China
Jie Ou, Hongqing Liu & Yi Zhou
College of Engineering, Design and Physical Science, Brunel University, London, UB8 3PH, UK
Lu Gan

Corresponding author

Correspondence to Jie Ou.

Editor information

Editors and Affiliations

Shanghai University, Shanghai, China
Honghao Gao
Fudan University, Shanghai, China
Jun Wun
Zhejiang University, Hangzhou, China
Jianwei Yin
Tsinghua University, Beijing, China
Feifei Shen
Xidian University, Xi'an, China
Yulong Shen
Hangzhou Dianzi University, Hangzhou, China
Jun Yu

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 43308 KB)

About this paper

Cite this paper

Ou, J., Liu, H., Zhou, Y., Gan, L. (2022). Robust Sound Event Detection by a Two-Stage Network in the Presence of Background Noise. In: Gao, H., Wun, J., Yin, J., Shen, F., Shen, Y., Yu, J. (eds) Communications and Networking. ChinaCom 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 433. Springer, Cham. https://doi.org/10.1007/978-3-030-99200-2_34

DOI: https://doi.org/10.1007/978-3-030-99200-2_34
Published: 05 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99199-9
Online ISBN: 978-3-030-99200-2
eBook Packages: Computer Science, Computer Science (R0)
