Data Augmentation for Environmental Sound Classification Using Diffusion Probabilistic Model with Top-K Selection Discriminator

Yunhao Chen¹³ (ORCID: orcid.org/0000-0002-8134-2314),
Zihui Yan¹³,
Yunjie Zhu¹⁴,
Zhen Ren¹³,
Jianlu Shen¹³ &
Yifan Huang¹³

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14087)

Included in the following conference series:

International Conference on Intelligent Computing


Abstract

Despite steady advances in deep learning in recent years, models still need large amounts of training data to avoid overfitting. Synthetic datasets generated with generative adversarial networks (GANs) have recently been used to address this problem. However, GAN-based methods are often hard to train or fail to generate high-quality samples. In this paper, we propose a data augmentation technique for environmental sound classification (ESC) based on the diffusion probabilistic model (DPM), using DPM-Solver++ for fast sampling. In addition, to ensure the quality of the generated spectrograms, we propose a top-k selection technique that filters out low-quality synthetic samples. Experimental results show that the synthetic samples have features similar to those of the original dataset and significantly increase the classification accuracy of several state-of-the-art models compared with traditional data augmentation techniques. The code is publicly available at https://github.com/JNAIC/DPMs-for-Audio-Data-Augmentation.
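The abstract does not spell out how the top-k selection works. One plausible reading, offered here only as a sketch under assumptions, is that a pretrained classifier acts as the discriminator and a synthetic spectrogram is kept only when its intended class is among the classifier's k highest-scoring classes. The function name top_k_filter, the layout of the score lists, and the example values are hypothetical and are not taken from the paper or its repository.

```python
from typing import List, Sequence


def top_k_filter(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 3,
) -> List[int]:
    """Return indices of samples whose intended label is among the
    k highest-scoring classes.

    scores[i][c] is an assumed discriminator score for class c on
    synthetic sample i; labels[i] is the class the sample was
    generated for. Both are hypothetical inputs for illustration.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    kept = []
    for i, (row, label) in enumerate(zip(scores, labels)):
        # Rank class indices by descending score; Python's sort is
        # stable, so ties keep the lower class index first.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            kept.append(i)
    return kept


# Made-up scores for three synthetic samples over three classes.
example_scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
]
example_labels = [0, 0, 1]

print(top_k_filter(example_scores, example_labels, k=1))
print(top_k_filter(example_scores, example_labels, k=2))
```

A smaller k makes the filter stricter: with k=1 only samples the discriminator ranks first for their intended class survive, while a larger k admits samples the discriminator finds merely plausible.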



Author information

Authors and Affiliations

Jiangnan University, Wuxi, 214000, China
Yunhao Chen, Zihui Yan, Zhen Ren, Jianlu Shen & Yifan Huang
University of Leeds, Leeds, LS2 9JT, UK
Yunjie Zhu


Corresponding author

Correspondence to Yunhao Chen.

Editor information

Editors and Affiliations

Department of Computer Science, Eastern Institute of Technology, Zhejiang, China
De-Shuang Huang
University of Wollongong, North Wollongong, NSW, Australia
Prashan Premaratne
Zhengzhou University of Light Industry, Zhengzhou, China
Baohua Jin
Zhong Yuan University of Technology, Zhengzhou, China
Boyang Qu
University of Ulsan, Ulsan, Korea (Republic of)
Kang-Hyun Jo
Department of Computer Science, Liverpool John Moores University, Liverpool, UK
Abir Hussain


About this paper

Cite this paper

Chen, Y., Yan, Z., Zhu, Y., Ren, Z., Shen, J., Huang, Y. (2023). Data Augmentation for Environmental Sound Classification Using Diffusion Probabilistic Model with Top-K Selection Discriminator. In: Huang, DS., Premaratne, P., Jin, B., Qu, B., Jo, KH., Hussain, A. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2023. Lecture Notes in Computer Science, vol 14087. Springer, Singapore. https://doi.org/10.1007/978-981-99-4742-3_23


DOI: https://doi.org/10.1007/978-981-99-4742-3_23
Published: 30 July 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4741-6
Online ISBN: 978-981-99-4742-3
eBook Packages: Computer Science, Computer Science (R0)
