Robust Sound Event Detection by a Two-Stage Network in the Presence of Background Noise

  • Conference paper
Communications and Networking (ChinaCom 2021)

Abstract

With the advent of deep learning, research on noise-robust sound event detection (SED) has progressed rapidly. However, the performance of single-channel SED systems in noisy conditions remains unsatisfactory. Several recent approaches place a speech enhancement (SE) method in front of the SED system to reduce the effect of noise, but they use two entirely separate models that handle the two tasks independently. In this work, we introduce a network trained by a two-stage method to perform signal denoising and SED jointly, with a denoising block followed by an SED block, both implemented as neural networks. In addition, we design a new objective function that takes into account the Euclidean distance between the output of the denoising block and the amplitude spectrum of the corresponding clean audio, which better limits distortion of the output features. The two-stage model is then jointly trained to optimize the proposed objective function. The results show that the proposed network outperforms a single-stage network without noise suppression. Compared with other recent state-of-the-art SED networks, the proposed model is competitive, especially in noisy environments.
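
The abstract does not give the exact form of the objective function, so the following is only a minimal sketch of how a joint objective of this kind might be assembled. It assumes a frame-wise binary cross-entropy as the SED loss and a hypothetical weight `alpha` on the distortion term; neither is taken from the paper. Only the Euclidean distance between the denoised and clean amplitude spectra is stated in the abstract. All function names are illustrative.

```python
import math

def euclidean_distance(denoised, clean):
    """Euclidean (L2) distance between two amplitude spectra of equal shape.

    Each spectrum is a list of frames; each frame is a list of
    frequency-bin magnitudes.
    """
    total = 0.0
    for d_frame, c_frame in zip(denoised, clean):
        for d, c in zip(d_frame, c_frame):
            total += (d - c) ** 2
    return math.sqrt(total)

def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean frame-wise binary cross-entropy.

    Assumption: the abstract does not name the SED loss; BCE is a
    common choice and is used here for illustration only.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

def joint_loss(denoised, clean, probs, labels, alpha=0.5):
    """Hypothetical joint objective: SED loss plus weighted distortion term.

    `alpha` is an assumed hyperparameter, not a value from the paper.
    """
    sed_term = binary_cross_entropy(probs, labels)
    distortion_term = euclidean_distance(denoised, clean)
    return sed_term + alpha * distortion_term

# Toy usage: one frame with two frequency bins, two SED frame predictions.
loss = joint_loss([[3.0, 0.0]], [[0.0, 4.0]], [0.5, 0.5], [1, 0])
```

In a sketch like this, the distortion term penalizes the denoising block when its output drifts from the clean spectrum, while the SED term is back-propagated through both blocks during joint training.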

Author information

Corresponding author

Correspondence to Jie Ou.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 43308 KB)

Copyright information

© 2022 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Ou, J., Liu, H., Zhou, Y., Gan, L. (2022). Robust Sound Event Detection by a Two-Stage Network in the Presence of Background Noise. In: Gao, H., Wun, J., Yin, J., Shen, F., Shen, Y., Yu, J. (eds) Communications and Networking. ChinaCom 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 433. Springer, Cham. https://doi.org/10.1007/978-3-030-99200-2_34

  • DOI: https://doi.org/10.1007/978-3-030-99200-2_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-99199-9

  • Online ISBN: 978-3-030-99200-2

  • eBook Packages: Computer Science, Computer Science (R0)