Abstract
Despite consistent advancement in powerful deep learning techniques in recent years, large amounts of training data are still necessary for the models to avoid overfitting. Synthetic datasets using generative adversarial networks (GAN) have recently been generated to overcome this problem. Nevertheless, despite advancements, GAN-based methods are usually hard to train or fail to generate high-quality data samples. In this paper, we propose an environmental sound classification (ESC) augmentation technique based on the diffusion probabilistic model (DPM) with DPM-Solver ++ for fast sampling. In addition, to ensure the quality of the generated spectrograms, we propose a top-k selection technique to filter out the low-quality synthetic data samples. According to the experiment results, the synthetic data samples have similar features to the original dataset and can significantly increase the classification accuracy of different state-of-the-art models compared with traditional data augmentation techniques. The public code is available on https://github.com/JNAIC/DPMs-for-Audio-Data-Augmentation.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Ho, J., et al.: Denoising diffusion probabilistic models. Adv. Neural. Inf. Process. Syst. 33, 6840–6851 (2020)
Salamon, J., Bello, J.P.: Deep convolutional neural networks and data augmentation for environmental sound classification. IEEE Sig. Process. Lett. 24, 279–283 (2016)
Gong, Y., et al.: AST: Audio Spectrogram Transformer. ArXiv abs/2104.01778 (2021)
Bahmei, B., et al.: CNN-RNN and data augmentation using deep convolutional generative adversarial network for environmental sound classification. IEEE Sign. Process. Lett. 29, 682–686 (2022)
Hershey, S., et al.: CNN architectures for large-scale audio classification. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 131–135 (2016)
Zhu, X., et al.: Emotion classification with data augmentation using generative adversarial networks. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 349–360 (2018)
Arjovsky, M., et al.: Wasserstein GAN. ArXiv abs/1701.07875 (2017)
Zhao, H., et al.: Bias and generalization in deep generative models: an empirical study. Neural Inf. Process. Syst. 13 (2018)
Ho, J., et al.: Denoising Diffusion Probabilistic Models. ArXiv abs/2006.11239 (2020)
Dhariwal, P., Nichol, A.: Diffusion Models Beat GANs on Image Synthesis. ArXiv abs/2105.05233 (2021)
Müller-Franzes, G., et al.: Diffusion Probabilistic Models beat GANs on Medical Images. ArXiv abs/2212.07501 (2022)
Maz’e, F., Ahmed, F.: Diffusion Models Beat GANs on Topology Optimization (2022)
Song, J., et al.: Denoising Diffusion Implicit Models. ArXiv abs/2010.02502 (2020)
Cheng, L., et al.: DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models. ArXiv abs/2211.01095 (2022)
Cordts, M., et al.: The cityscapes dataset for semantic urban scene understanding. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3213–3223 (2016)
Dickstein, S., Narain, J., et al.: Deep Unsupervised Learning using Nonequilibrium Thermodynamics. ArXiv abs/1503.03585 (2015)
Saharia, C., et al.: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. ArXiv abs/2205.11487 (2022)
Font, F., et al.: Freesound technical demo. In: Proceedings of the 21st ACM International Conference on Multimedia, pp. 411–412 (2013)
He, K., et al.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2015)
lucidrains.2023.denoising-diffusion-pytorch (2023). https://github.com/lucidrains/denoising-diffusion-pytorch
Ronneberger, O., et al.: U-Net: Convolutional Networks for Biomedical Image Segmentation. ArXiv abs/1505.04597 (2015)
Iwana, B.K., Uchida, S.: An empirical survey of data augmentation for time series classification with neural networks. Plos One 16 (2020)
rw2019timm, Ross Wightman, PyTorch Image Models (2019). https://github.com/rwightman/pytorch-image-models
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: International Conference on Learning Representations (2017)
Kingma, D.P., Welling, M.: Auto-Encoding Variational Bayes. ArXiv. /abs/1312.6114 (2013). Accessed 22 March 2023
Ho, J.: Classifier-Free Diffusion Guidance. ArXiv abs/2207.12598 (2022)
Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251-1258 (2016)
d’Ascoli, S., et al.: ConViT: improving vision transformers with soft convolutional inductive biases. J. Statist. Mech. Theory Experiment 2022 (2021)
Mehta, S., Rastegari, M.: MobileViT: Light-weight, General purpose, and Mobile-friendly Vision Transformer. ArXiv abs/2110.02178 (2021)
Liu, Z., et al.: A ConvNet for the 2020s. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11966–11976 1800–1807 (2022)
Touvron, H., et al.: DeiT III: Revenge of the ViT. ArXiv abs/2204.07118 (2022)
.von-platen-etal-2022-diffusers, Patrick von Platen et al. 2022, Diffusers: State-of-the-art diffusion models. https://github.com/huggingface/diffusers
Chen, Y., et al.: Effective audio classification network based on paired inverse pyramid structure and dense MLP Block. ArXiv abs/2211.02940 (2022)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chen, Y., Yan, Z., Zhu, Y., Ren, Z., Shen, J., Huang, Y. (2023). Data Augmentation for Environmental Sound Classification Using Diffusion Probabilistic Model with Top-K Selection Discriminator. In: Huang, DS., Premaratne, P., Jin, B., Qu, B., Jo, KH., Hussain, A. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2023. Lecture Notes in Computer Science, vol 14087. Springer, Singapore. https://doi.org/10.1007/978-981-99-4742-3_23
Download citation
DOI: https://doi.org/10.1007/978-981-99-4742-3_23
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4741-6
Online ISBN: 978-981-99-4742-3
eBook Packages: Computer ScienceComputer Science (R0)