Modelling individual head‐related transfer function (HRTF) based on anthropometric parameters and generic HRTF amplitudes
Pages 364–378
Abstract
The head‐related transfer function (HRTF) plays a vital role in immersive virtual reality and augmented reality technologies, especially in spatial audio synthesis for binaural reproduction. This article proposes a deep learning method that takes generic HRTF amplitudes and anthropometric parameters as input features for individual HRTF generation. Fully convolutional neural networks were designed to predict each individual HRTF amplitude spectrum in full‐space directions from the key anthropometric parameters and the generic HRTF amplitudes, while the interaural time delay (ITD) was predicted by a transformer module. In the amplitude prediction model, an attention mechanism was adopted to better capture the relationship between HRTF amplitude spectra at two distinct directions with large angular differences in space. Finally, with the minimum phase model, the predicted amplitude spectra and ITDs were combined to obtain a set of individual head‐related impulse responses. Besides separate training of the HRTF amplitude and ITD generation models, their joint training was also considered and evaluated. The root‐mean‐square error and the log‐spectral distortion were selected as objective metrics to evaluate performance. Subjective experiments further showed that the auditory source localisation performance of the proposed method was better than that of other methods in most cases.
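The abstract's final reconstruction step (combining a predicted amplitude spectrum with a predicted ITD via the minimum phase model to obtain head‐related impulse responses) can be illustrated with a minimal sketch. This is not the authors' code; the sampling rate, FFT length, function names, and the use of a circular shift to apply the ITD are assumptions made for illustration only, and the minimum‐phase construction shown is the standard Hilbert‐transform relation between log‐magnitude and phase.

```python
# Minimal sketch: minimum-phase HRIR reconstruction from a predicted amplitude
# spectrum and ITD. FS, N_FFT, and all names below are illustrative assumptions.
import numpy as np
from scipy.signal import hilbert

FS = 44100      # assumed sampling rate in Hz
N_FFT = 512     # assumed FFT length of the one-sided amplitude spectrum

def minimum_phase_hrir(magnitude):
    """Build a minimum-phase impulse response from a one-sided magnitude spectrum
    of length N_FFT // 2 + 1."""
    # Mirror the one-sided magnitude into a full (two-sided) spectrum.
    full_mag = np.concatenate([magnitude, magnitude[-2:0:-1]])
    log_mag = np.log(np.maximum(full_mag, 1e-8))  # avoid log(0)
    # Minimum phase is minus the Hilbert transform of the log-magnitude.
    min_phase = -np.imag(hilbert(log_mag))
    spectrum = full_mag * np.exp(1j * min_phase)
    return np.real(np.fft.ifft(spectrum))

def hrir_pair(mag_left, mag_right, itd_seconds):
    """Combine per-ear predicted magnitudes with a predicted ITD into an HRIR pair."""
    h_left = minimum_phase_hrir(mag_left)
    h_right = minimum_phase_hrir(mag_right)
    delay = int(round(abs(itd_seconds) * FS))
    # Apply the ITD as a simple time shift of the contralateral (later) ear.
    if itd_seconds > 0:          # sound assumed to reach the left ear first
        h_right = np.roll(h_right, delay)
    else:
        h_left = np.roll(h_left, delay)
    return h_left, h_right
```

In practice the predicted left‐ and right‐ear amplitude spectra for each direction would be fed through such a routine to produce the binaural impulse responses used for rendering; fractional‐sample delays and windowing details are omitted here for brevity.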
Published In
© 2023 The Authors. CAAI Transactions on Intelligence Technology published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology and Chongqing University of Technology.
This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
Publisher
John Wiley & Sons, Inc.
United States
Publication History
Published: 22 February 2023