Robust technique for environmental sound classification using convolutional recurrent neural network

Abstract

Environmental sound classification (ESC) is one of the challenging problems in audio recognition. Classifying sounds such as glass breaking, gunshots, and dog barking can aid a variety of applications, including audio surveillance, smart homes, robotic navigation, and crime investigation systems. Environmental sounds are more complicated than speech and music because of their unstructured nature. They are classified using machine learning algorithms such as Support Vector Machines and K-Nearest Neighbour, and deep learning algorithms such as Convolutional Neural Networks, Recurrent Neural Networks, and Long Short-Term Memory networks. In this paper, a robust approach is proposed for ESC on the UrbanSound8K dataset. Cepstral features, namely Mel Frequency Cepstral Coefficients (MFCCs), are extracted from the audio clips and fed to a Convolutional Recurrent Neural Network (CRNN). The effect of varying hyperparameters such as the number of layers, batch size, and filter count is analyzed to determine the combination of parameters that gives good accuracy for ESC. The proposed approach is compared to baseline models and attains a best accuracy of 93.58%, which is higher than previous research results for ESC. The proposed model gives its best results with one LSTM layer, a momentum value of 0.3, 512 filters, 512 neurons in the LSTM layer, and a batch size of 256. The proposed framework can be improved further by using novel neural networks and combinations.
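To make the feature extraction step concrete, the following is a minimal sketch of how MFCCs can be computed for a single audio frame using only the Python standard library. It is a generic textbook pipeline (Hamming window, power spectrum, triangular mel filterbank, log, DCT-II) and not the authors' implementation: the abstract does not state the frame length, the number of filters or coefficients, or the software used, so the function names and default values below (`mfcc_frame`, 26 filters, 13 coefficients) are illustrative assumptions.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to the mel scale (common 2595*log10 formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of a Hamming-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    n_bins = n_fft // 2 + 1
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, center):       # rising slope
            row[k] = (k - left) / (center - left)
        for k in range(center, right):      # falling slope
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    """MFCCs of one frame; default sizes are assumptions, not the paper's."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small constant avoids log(0) for filters that capture no energy.
    log_energies = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                    for row in bank]
    # Unnormalized DCT-II decorrelates the log filterbank energies.
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_energies))
            for c in range(n_coeffs)]


if __name__ == "__main__":
    # Hypothetical input: 256 samples of a 440 Hz tone sampled at 8 kHz.
    tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
    print(mfcc_frame(tone, 8000))
```

In a CRNN setting, the per-frame coefficient vectors would typically be stacked over time into a two-dimensional array before being passed to the convolutional layers. For real workloads an FFT-based audio library would replace the naive DFT used here.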

Data Availability

The data are available online at https://urbansounddataset.weebly.com/urbansound8k.html

Funding

None.

Author information

Authors and Affiliations

Computer Science and Engineering, GZS Campus College of Engineering and Technology, Maharaja Ranjit Singh Punjab Technical University, Bathinda, 151001, Punjab, India
Anam Bansal & Naresh Kumar Garg

Contributions

Anam Bansal: Conceptualization, Methodology, Software, Validation, Data curation, Visualization, Investigation, Writing - Reviewing and Editing.

Naresh Kumar Garg: Supervision, Quality Check, Reviewing, and Editing.

Corresponding author

Correspondence to Anam Bansal.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest in this work.

Corpus

The publicly available standard dataset UrbanSound8K is used.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bansal, A., Garg, N.K. Robust technique for environmental sound classification using convolutional recurrent neural network. Multimed Tools Appl 83, 54755–54772 (2024). https://doi.org/10.1007/s11042-023-17066-2

Received: 01 March 2023
Revised: 18 August 2023
Accepted: 14 September 2023
Published: 07 December 2023
Issue Date: May 2024
DOI: https://doi.org/10.1007/s11042-023-17066-2
