Abstract
Ensuring passenger safety is a daily concern for railway operators. To this end, the scientific community has proposed various image and sound processing techniques. Since the early 2010s, advances in deep learning have driven progress in these research areas, including in the railway field. This article addresses the task of audio event detection (screams, glass breaks, gunshots, sprays) using deep learning techniques. It describes the methodology for designing a deep learning architecture that is both suitable for audio detection and optimised for embedded railway systems. We describe how we designed two CRNNs (Convolutional Recurrent Neural Networks) from scratch for the detection task. Because building a large and varied training database is one of the challenges of deep learning, this article also presents the innovative methodology used to build a database of audio events in the railway environment. Finally, we show the very promising results obtained when the model was tested under real conditions.
Acknowledgement
We would like to thank Helmi REBAI and Martin OLIVIER for strongly contributing to the advancement of this study.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Marteau, T., Afanou, S., Sodoyer, D., Ambellouis, S., Boukour, F. (2020). Audio Events Detection in Noisy Embedded Railway Environments. In: Bernardi, S., et al. Dependable Computing - EDCC 2020 Workshops. EDCC 2020. Communications in Computer and Information Science, vol 1279. Springer, Cham. https://doi.org/10.1007/978-3-030-58462-7_2
DOI: https://doi.org/10.1007/978-3-030-58462-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58461-0
Online ISBN: 978-3-030-58462-7
eBook Packages: Computer Science, Computer Science (R0)