Deep Prior Approach for Room Impulse Response Reconstruction
Figure 1. Example of a RIR image. The RIRs of the microphones are arranged along the columns of $\mathbf{H}$.
Figure 2. The overall block scheme of the U-Net-like structure of the network, along with the block diagrams of the MultiRes and Residual path blocks.
Figure 3. RIR reconstruction procedure. The network output $\hat{\mathbf{H}}$ is shown at various iterations. The observation $\tilde{\mathbf{H}}$ and the ground truth RIR image $\mathbf{H}$ are reported for reference.
Figure 4. 2D graphical representation of the simulated setups (top view). The microphones of the ULA are depicted as black dots, while the sources are represented as black triangles. The elements are scaled with respect to the room dimensions, hence the distance between adjacent microphones of the ULA is barely noticeable. (a) Variable DOA ($\theta$) setup and (b) 12-source setup.
Figure 5. Example of a RIR image $\mathbf{H}$ (a,d), the observed data $\tilde{\mathbf{H}}$ (b,e) and the reconstruction $\hat{\mathbf{H}}$ obtained through the proposed technique (c,f). (a–c) The time axis is zoomed around the direct path and the first reflections. (d–f) The time axis is zoomed between 70 ms and 100 ms.
Figure 6. (a) NMSE as a function of the DOA ($\theta$) of the source. (b) Average NMSE as a function of the $\mathrm{T}_{60}$ of the room and (c) of the SNR of the sensors.
Figure 7. NMSE of the RIR reconstruction as a function of the SNR for rooms Balder (a), Freja (b) and Munin (c) from [20]. The RIR interpolations are obtained using 20 (triangle marks) or 33 (square marks) microphones. Adapted from Ref. [20], with permission from Elsevier (2022).
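The captions above refer to three objects: the RIR image (RIRs arranged along the columns of H), the observation, and the NMSE used to score a reconstruction. The following is a minimal sketch of how these could be represented, not the authors' implementation. The function names are hypothetical, zeroing unmeasured columns is only an assumed observation model, and the NMSE formula (per-microphone normalized squared error, averaged and expressed in dB) is a common definition that the captions do not spell out.

```python
import math


def stack_rirs_as_columns(rirs):
    """Arrange per-microphone RIRs (equal-length lists) along the columns of H."""
    n_samples = len(rirs[0])
    if any(len(r) != n_samples for r in rirs):
        raise ValueError("all RIRs must have the same length")
    # H[t][m] is sample t of the RIR measured at microphone m.
    return [[rir[t] for rir in rirs] for t in range(n_samples)]


def observe(H, available):
    """Assumed observation model: zero the columns of unmeasured microphones."""
    keep = set(available)
    return [[x if m in keep else 0.0 for m, x in enumerate(row)] for row in H]


def nmse_db(H_hat, H):
    """Average per-microphone normalized squared error, in dB (assumed definition).

    Every column of H is assumed to have nonzero energy.
    """
    n_mics = len(H[0])
    total = 0.0
    for m in range(n_mics):
        err = sum((row_hat[m] - row[m]) ** 2 for row_hat, row in zip(H_hat, H))
        ref = sum(row[m] ** 2 for row in H)
        total += err / ref
    mean = total / n_mics
    # A perfect reconstruction has zero error, which maps to -inf dB.
    return 10.0 * math.log10(mean) if mean > 0.0 else float("-inf")


# Toy example: three microphones, three time samples each.
rirs = [[1.0, 0.5, 0.25], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0]]
H = stack_rirs_as_columns(rirs)
H_tilde = observe(H, available=[0, 2])  # microphone 1 is missing
print(nmse_db(H_tilde, H))
```

In this toy case the missing column contributes a normalized error of 1 and the two measured columns contribute 0, so the score of the raw observation is 10·log10(1/3) dB. A reconstruction method would be expected to improve on that baseline.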
Abstract
1. Introduction
2. Problem Formulation
2.1. RIR Data Model
2.2. RIR Reconstruction Via Deep Prior
3. Network Description
4. Numerical Analysis
5. Experimental Analysis
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Acknowledgments
Conflicts of Interest
References
- Tohyama, M.; Koike, T. (Eds.) Transfer Function and Frequency Response Function. In Fundamentals of Acoustic Signal Processing; Academic Press: London, UK, 1998; pp. 75–110.
- Nelson, P.A.; Elliott, S.J. Active Control of Sound; Academic Press: New York, NY, USA, 1991.
- Cobos, M.; Antonacci, F.; Alexandridis, A.; Mouchtaris, A.; Lee, B. A survey of sound source localization methods in wireless acoustic sensor networks. Wirel. Commun. Mob. Comput. 2017, 2017, 3956282.
- Gannot, S.; Vincent, E.; Markovich-Golan, S.; Ozerov, A. A consolidated perspective on multimicrophone speech enhancement and source separation. IEEE/ACM Trans. Audio Speech Lang. Process. 2017, 25, 692–730.
- Pezzoli, M.; Carabias-Orti, J.J.; Cobos, M.; Antonacci, F.; Sarti, A. Ray-Space-Based Multichannel Nonnegative Matrix Factorization for Audio Source Separation. IEEE Signal Process. Lett. 2021, 28, 369–373.
- Tylka, J.G.; Choueiri, E.Y. Fundamentals of a parametric method for virtual navigation within an array of ambisonics microphones. J. Audio Eng. Soc. 2020, 68, 120–137.
- Rife, D.D.; Vanderkooy, J. Transfer-function measurement with maximum-length sequences. J. Audio Eng. Soc. 1989, 37, 419–444.
- Farina, A. Advancements in Impulse Response Measurements by Sine Sweeps. In Audio Engineering Society Convention 122; Audio Engineering Society: Vienna, Austria, 2007; Available online: http://www.aes.org/e-lib/browse.cfm?elib=14106 (accessed on 28 March 2022).
- Stan, G.B.; Embrechts, J.J.; Archambeau, D. Comparison of different impulse response measurement techniques. J. Audio Eng. Soc. 2002, 50, 249–262.
- Ajdler, T.; Sbaiz, L.; Vetterli, M. Dynamic measurement of room impulse responses using a moving microphone. J. Acoust. Soc. Am. 2007, 122, 1636–1645.
- Thiergart, O.; Del Galdo, G.; Taseska, M.; Habets, E.A.P. Geometry-based spatial sound acquisition using distributed microphone arrays. IEEE Trans. Audio Speech Lang. Process. 2013, 21, 2583–2594.
- Pezzoli, M.; Borra, F.; Antonacci, F.; Sarti, A.; Tubaro, S. Estimation of the Sound Field at Arbitrary Positions in Distributed Microphone Networks Based on Distributed Ray Space Transform. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 186–190.
- Pezzoli, M.; Borra, F.; Antonacci, F.; Sarti, A.; Tubaro, S. Reconstruction of the Virtual Microphone Signal Based on the Distributed Ray Space Transform. In Proceedings of the 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, 3–7 September 2018; pp. 1537–1541.
- Pezzoli, M.; Borra, F.; Antonacci, F.; Tubaro, S.; Sarti, A. A parametric approach to virtual miking for sources of arbitrary directivity. IEEE/ACM Trans. Audio Speech Lang. Process. 2020, 28, 2333–2348.
- Pulkki, V.; Delikaris-Manias, S.; Politis, A. Parametric Time-Frequency Domain Spatial Audio; Wiley Online Library: Hoboken, NJ, USA, 2018.
- Das, O.; Calamia, P.; Gari, S.V.A. Room Impulse Response Interpolation from a Sparse Set of Measurements Using a Modal Architecture. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 960–964.
- Haneda, Y.; Kaneda, Y.; Kitawaki, N. Common-acoustical-pole and residue model and its application to spatial interpolation and extrapolation of a room transfer function. IEEE Trans. Speech Audio Process. 1999, 7, 709–717.
- Koyama, S.; Daudet, L. Sparse Representation of a Spatial Sound Field in a Reverberant Environment. IEEE J. Sel. Top. Signal Process. 2019, 13, 172–184.
- Damiano, S.; Borra, F.; Bernardini, A.; Antonacci, F.; Sarti, A. Soundfield reconstruction in reverberant rooms based on compressive sensing and image-source models of early reflections. In Proceedings of the 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 17–20 October 2021.
- Zea, E. Compressed sensing of impulse responses in rooms of unknown properties and contents. J. Sound Vib. 2019, 459, 114871.
- Antonello, N.; De Sena, E.; Moonen, M.; Naylor, P.A.; Van Waterschoot, T. Room impulse response interpolation using a sparse spatio-temporal representation of the sound field. IEEE/ACM Trans. Audio Speech Lang. Process. 2017, 25, 1929–1941.
- Borra, F.; Gebru, I.D.; Markovic, D. Soundfield reconstruction in reverberant environments using higher-order microphones and impulse response measurements. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 281–285.
- Borra, F.; Krenn, S.; Gebru, I.D.; Marković, D. 1st-order microphone array system for large area sound field recording and reconstruction: Discussion and preliminary results. In Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 20–23 October 2019; pp. 378–382.
- Birnie, L.; Abhayapala, T.; Tourbabin, V.; Samarasinghe, P. Mixed Source Sound Field Translation for Virtual Binaural Application With Perceptual Validation. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 1188–1203.
- Mignot, R.; Chardon, G.; Daudet, L. Low frequency interpolation of room impulse responses using compressed sensing. IEEE/ACM Trans. Audio Speech Lang. Process. 2013, 22, 205–216.
- Jin, W.; Kleijn, W.B. Theory and design of multizone soundfield reproduction using sparse methods. IEEE/ACM Trans. Audio Speech Lang. Process. 2015, 23, 2343–2355.
- Williams, E.G. Fourier Acoustics; Academic Press: London, UK, 1999.
- Fahim, A.; Samarasinghe, P.N.; Abhayapala, T.D. Sound field separation in a mixed acoustic environment using a sparse array of higher order spherical microphones. In Proceedings of the 2017 Hands-Free Speech Communications and Microphone Arrays (HSCMA), San Francisco, CA, USA, 1–3 March 2017; pp. 151–155.
- Pezzoli, M.; Cobos, M.; Antonacci, F.; Sarti, A. Sparsity-Based Sound Field Separation in The Spherical Harmonics Domain. Accepted for the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 22–27 May 2022.
- Lee, S. Review: The Use of Equivalent Source Method in Computational Acoustics. J. Comput. Acoust. 2016, 25, 1630001.
- Donoho, D.L. Compressed sensing. IEEE Trans. Inf. Theory 2006, 52, 1289–1306.
- Herrmann, F.J.; Hennenfent, G. Non-parametric seismic data recovery with curvelet frames. Geophys. J. Int. 2008, 173, 233–248.
- Labate, D.; Lim, W.Q.; Kutyniok, G.; Weiss, G. Sparse Multidimensional Representation Using Shearlets. In Proceedings of the Wavelets XI, International Society for Optics and Photonics, San Diego, CA, USA, 31 July–4 August 2005; Volume 5914, p. 59140U.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; Available online: http://www.deeplearningbook.org (accessed on 28 March 2022).
- Olivieri, M.; Pezzoli, M.; Antonacci, F.; Sarti, A. A Physics-Informed Neural Network Approach for Nearfield Acoustic Holography. Sensors 2021, 21, 7834.
- Bianco, M.J.; Gerstoft, P.; Traer, J.; Ozanich, E.; Roch, M.A.; Gannot, S.; Deledalle, C.A. Machine learning in acoustics: Theory and applications. J. Acoust. Soc. Am. (JASA) 2019, 146, 3590–3628.
- Olivieri, M.; Malvermi, R.; Pezzoli, M.; Zanoni, M.; Gonzalez, S.; Antonacci, F.; Sarti, A. Audio Information Retrieval and Musical Acoustics. IEEE Instrum. Meas. Mag. 2021, 24, 10–20.
- Olivieri, M.; Pezzoli, M.; Malvermi, R.; Antonacci, F.; Sarti, A. Near-field Acoustic Holography analysis with Convolutional Neural Networks. In Proceedings of the INTER-NOISE and NOISE-CON Congress and Conference Proceedings, Seoul, Korea, 23–26 August 2020; Volume 261, pp. 5607–5618.
- Campagnoli, C.; Pezzoli, M.; Antonacci, F.; Sarti, A. Vibrational modal shape interpolation through convolutional auto encoder. In Proceedings of the INTER-NOISE and NOISE-CON Congress and Conference Proceedings, Seoul, Korea, 23–26 August 2020; Volume 261, pp. 5619–5626.
- Lluís, F.; Martínez-Nuevo, P.; Bo Møller, M.; Ewan Shepstone, S. Sound field reconstruction in rooms: Inpainting meets super-resolution. J. Acoust. Soc. Am. 2020, 148, 649–659.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
- Bertalmio, M.; Sapiro, G.; Caselles, V.; Ballester, C. Image inpainting. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, New Orleans, LA, USA, 23–28 July 2000; pp. 417–424.
- Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M. Enhanced Deep Residual Networks for Single Image Super-Resolution. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1132–1140.
- Ulyanov, D.; Vedaldi, A.; Lempitsky, V. Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 9446–9454.
- Dittmer, S.; Kluth, T.; Maass, P.; Baguer, D.O. Regularization by architecture: A deep prior approach for inverse problems. J. Math. Imaging Vis. 2020, 62, 456–470.
- Kong, F.; Lipari, V.; Picetti, F.; Bestagini, P.; Tubaro, S. A deep prior convolutional autoencoder for seismic data interpolation. In Proceedings of the EAGE 2020 Annual Conference & Exhibition Online, European Association of Geoscientists & Engineers, Online, 8–11 December 2020; pp. 1–5.
- Picetti, F.; Lipari, V.; Bestagini, P.; Tubaro, S. Anti-Aliasing Add-On for Deep Prior Seismic Data Interpolation. arXiv 2021, arXiv:2101.11361.
- Kong, F.; Picetti, F.; Lipari, V.; Bestagini, P.; Tang, X.; Tubaro, S. Deep Prior-Based Unsupervised Reconstruction of Irregularly Sampled Seismic Data. IEEE Geosci. Remote Sens. Lett. 2020, 19, 7501305.
- Malvermi, R.; Antonacci, F.; Sarti, A.; Corradi, R. Prediction of Missing Frequency Response Functions through Deep Image Prior. In Proceedings of the 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 17–20 October 2021.
- Michelashvili, M.; Wolf, L. Audio denoising with deep network priors. arXiv 2019, arXiv:1904.07612.
- Ibtehaz, N.; Rahman, M.S. MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Netw. 2020, 121, 74–87.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. Proc. ICML Citeseer 2013, 30, 3.
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32; Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2019; pp. 8024–8035.
- Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization, 3rd International Conference on Learning Representations. arXiv 2014, arXiv:1412.6980.
- Pezzoli, M.; Comanducci, L.; Waltz, J.; Agnello, A.; Bondi, L.; Canclini, A.; Sarti, A. A Dante Powered Modular Microphone Array System. In Proceedings of the Audio Engineering Society Convention 145, Audio Engineering Society, New York, NY, USA, 17–20 October 2018; Available online: http://www.aes.org/e-lib/browse.cfm?elib=19743 (accessed on 28 March 2022).
- Gunda, R.; Vijayakar, S.; Singh, R. Method of images for the harmonic response of beams and rectangular plates. J. Sound Vib. 1995, 185, 791–808.
- Scheibler, R.; Bezzam, E.; Dokmanić, I. Pyroomacoustics: A python package for audio room simulation and array processing algorithms. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 351–355.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pezzoli, M.; Perini, D.; Bernardini, A.; Borra, F.; Antonacci, F.; Sarti, A. Deep Prior Approach for Room Impulse Response Reconstruction. Sensors 2022, 22, 2710. https://doi.org/10.3390/s22072710