
A Genetic Algorithm Approach to Automate Architecture Design for Acoustic Scene Classification

Published: 01 April 2023

Abstract

Convolutional neural networks (CNNs) have been widely used with remarkable success in the acoustic scene classification (ASC) task. However, the performance of these CNNs relies heavily on their architectures, and designing a CNN suited to a given problem requires considerable effort and expertise. In this work, we propose an efficient genetic algorithm (GA) that searches for optimized CNN architectures for the ASC task. The proposed algorithm splits the input spectrograms along the frequency dimension, allowing it to explore an architecture search space of multiple sub-CNN models in addition to classical single-path CNNs. Specifically, it searches for both the best number of sub-CNNs and their architectures, so as to better capture the distinct features of the input spectrograms. Because the proposed GA is designed specifically for sound classification, it suits the ASC task better than many other GAs that optimize conventional single-path CNN architectures. Experimental results on three benchmark datasets demonstrate the effectiveness of the proposed method: the algorithm achieves relative accuracy improvements of around 17.8%, 16%, and 17.2% over the baseline systems on the development datasets of DCASE2018-Task1A, DCASE2019-Task1A, and DCASE2020-Task1A, respectively.
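To make the search space described above concrete, the following Python sketch illustrates the two ideas the abstract combines: splitting a log-mel spectrogram along the frequency axis into sub-bands, each processed by its own sub-CNN, and encoding the number of sub-CNNs plus their depths as a genome a GA can evolve. This is an illustration written for this summary, not the authors' implementation; the framework (PyTorch), layer sizes, genome layout, and all identifiers are assumptions.

# Minimal sketch of a frequency-split multi-path CNN for ASC, with a
# GA-style genome. Illustrative only; not the paper's actual method.
import random
import torch
import torch.nn as nn


class SubCNN(nn.Module):
    """One convolutional path operating on a single frequency sub-band."""

    def __init__(self, depth: int, channels: int = 16):
        super().__init__()
        layers, in_ch = [], 1
        for _ in range(depth):
            layers += [nn.Conv2d(in_ch, channels, 3, padding=1),
                       nn.BatchNorm2d(channels), nn.ReLU()]
            in_ch = channels
        self.body = nn.Sequential(*layers, nn.AdaptiveAvgPool2d(1))

    def forward(self, x):
        return self.body(x).flatten(1)          # (batch, channels)


class MultiPathASC(nn.Module):
    """Splits the mel axis into len(depths) equal sub-bands, one per sub-CNN.
    Any remainder mel bins are simply dropped in this sketch."""

    def __init__(self, depths, n_mels=128, n_classes=10):
        super().__init__()
        self.paths = nn.ModuleList(SubCNN(d) for d in depths)
        self.band = n_mels // len(depths)
        self.head = nn.Linear(16 * len(depths), n_classes)

    def forward(self, spec):                    # spec: (batch, 1, mels, time)
        feats = [p(spec[:, :, i * self.band:(i + 1) * self.band, :])
                 for i, p in enumerate(self.paths)]
        return self.head(torch.cat(feats, dim=1))


def random_genome(max_paths=4, max_depth=5):
    """Genome = list of per-path depths; its length is the number of sub-CNNs."""
    return [random.randint(1, max_depth)
            for _ in range(random.randint(1, max_paths))]


def mutate(genome, max_depth=5):
    """Point mutation: re-sample the depth of one randomly chosen path."""
    g = list(genome)
    g[random.randrange(len(g))] = random.randint(1, max_depth)
    return g


if __name__ == "__main__":
    genome = random_genome()                    # e.g. [3, 2] -> two sub-CNNs
    model = MultiPathASC(genome)
    dummy = torch.randn(8, 1, 128, 431)         # batch of log-mel spectrograms
    print(genome, model(dummy).shape)           # -> torch.Size([8, 10])

In the algorithm the abstract describes, each genome would be decoded into such a network, trained, and scored on validation accuracy, with selection, crossover, and mutation driving the population toward better-performing sub-CNN configurations.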


Cited By

  • (2024) "Acoustic scene classification," Expert Systems with Applications, vol. 238, Part B. DOI: 10.1016/j.eswa.2023.121902. Online publication date: 27-Feb-2024.
  • (2023) "A Genetic Causal Explainer for Deep Knowledge Tracing," IEEE Transactions on Evolutionary Computation, vol. 28, no. 4, pp. 861–875. DOI: 10.1109/TEVC.2023.3286666. Online publication date: 15-Jun-2023.


Information

Published In

IEEE Transactions on Evolutionary Computation, Volume 27, Issue 2
April 2023
195 pages

Publisher

IEEE Press

Publication History

Published: 01 April 2023

Qualifiers

  • Research-article

