
Modelling individual head‐related transfer function (HRTF) based on anthropometric parameters and generic HRTF amplitudes

Published: 22 February 2023

Abstract

The head‐related transfer function (HRTF) plays a vital role in immersive virtual reality and augmented reality technologies, especially in spatial audio synthesis for binaural reproduction. This article proposes a deep learning method that takes generic HRTF amplitudes and anthropometric parameters as input features for individual HRTF generation. Fully convolutional neural networks used the key anthropometric parameters and the generic HRTF amplitudes to predict each individual HRTF amplitude spectrum in all full‐space directions, and a transformer module predicted the interaural time delay (ITD). In the amplitude prediction model, an attention mechanism was adopted to better capture the relationship between HRTF amplitude spectra at two distinct directions separated by a large angle. Finally, with the minimum‐phase model, the predicted amplitude spectra and ITDs were used to obtain a set of individual head‐related impulse responses. Besides separate training of the HRTF amplitude and ITD generation models, their joint training was also considered and evaluated. The root‐mean‐square error and the log‐spectral distortion were selected as objective metrics to evaluate performance. Subjective experiments further showed that the auditory source localisation performance of the proposed method was better than that of other methods in most cases.
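
The last reconstruction step and the log‐spectral distortion metric mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cepstral folding used to obtain the minimum‐phase response is one common technique, and the function names, the whole‐sample rounding of the ITD, and the sign convention (a positive ITD delays the right ear) are all assumptions made for illustration.

```python
import numpy as np


def minimum_phase_hrir(mag_half, n_fft):
    """Minimum-phase impulse response from a one-sided magnitude spectrum.

    mag_half holds n_fft // 2 + 1 linear magnitudes (rfft layout); n_fft is even.
    Uses the standard real-cepstrum folding method (an assumed choice).
    """
    mag_half = np.maximum(np.asarray(mag_half, dtype=float), 1e-8)  # avoid log(0)
    # Mirror the one-sided spectrum into a full, symmetric spectrum.
    mag_full = np.concatenate([mag_half, mag_half[-2:0:-1]])
    cepstrum = np.fft.ifft(np.log(mag_full)).real
    # Fold the cepstrum onto non-negative quefrencies.
    fold = np.zeros(n_fft)
    fold[0] = 1.0
    fold[1:n_fft // 2] = 2.0
    fold[n_fft // 2] = 1.0
    min_phase_spectrum = np.exp(np.fft.fft(fold * cepstrum))
    return np.fft.ifft(min_phase_spectrum).real


def apply_itd(h_left, h_right, itd_seconds, fs):
    """Delay one ear by the ITD, rounded to whole samples (a simplification).

    Assumed convention: itd_seconds >= 0 delays the right ear, otherwise the left.
    Both outputs are zero-padded to the same length.
    """
    delay = int(round(abs(itd_seconds) * fs))
    zeros = np.zeros(delay)
    if itd_seconds >= 0:
        return np.concatenate([h_left, zeros]), np.concatenate([zeros, h_right])
    return np.concatenate([zeros, h_left]), np.concatenate([h_right, zeros])


def log_spectral_distortion(mag_ref, mag_est):
    """Root-mean-square difference of two magnitude spectra in dB."""
    mag_ref = np.maximum(np.asarray(mag_ref, dtype=float), 1e-8)
    mag_est = np.maximum(np.asarray(mag_est, dtype=float), 1e-8)
    diff_db = 20.0 * np.log10(mag_ref / mag_est)
    return float(np.sqrt(np.mean(diff_db ** 2)))


if __name__ == "__main__":
    n_fft, fs = 256, 44100  # illustrative values, not taken from the article
    k = np.arange(n_fft // 2 + 1)
    predicted_mag = 1.0 + 0.5 * np.cos(2 * np.pi * k / n_fft)  # stand-in spectrum
    hrir = minimum_phase_hrir(predicted_mag, n_fft)
    left, right = apply_itd(hrir, hrir, 0.0005, fs)
    rebuilt_mag = np.abs(np.fft.rfft(hrir))
    print("LSD (dB):", log_spectral_distortion(predicted_mag, rebuilt_mag))
```

In this sketch the minimum‐phase response reproduces the input magnitude spectrum on the FFT grid, so the ITD is the only interaural timing cue added on top of the amplitude prediction. A fractional‐delay filter could replace the whole‐sample shift where finer ITD resolution matters.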



Information

Published In

CAAI Transactions on Intelligence Technology, Volume 8, Issue 2
June 2023
261 pages
EISSN: 2468-2322
DOI: 10.1049/cit2.v8.2
This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.

Publisher

John Wiley & Sons, Inc.

United States

Author Tags

  1. audio databases
  2. augmented reality
  3. deep learning
  4. multimedia

Qualifiers

  • Research-article
