Predicting the Colouration between Binaural Signals
Figure 1. Block diagram of the predicted binaural colouration (PBC) method. ERB denotes equivalent rectangular bandwidth.
Figure 2. Illustration of the PBC dataset normalisation; marker colours get lighter with successive iterations. The normalisation gain applied to the test dataset is adjusted until the average PBC between the two datasets is at a minimum.
Figure 3. Conversion of two sample input spectra (A and B, shown in black and blue respectively) from a dB scale (a) to a phon (equal loudness) scale (b). Equal loudness contours are shown in red and labelled in phons in panel (a).
Figure 4. Conversion of the sample input signals A and B from Figure 3b from a phon scale to a sone (perceptual loudness) scale.
Figure 5. Depiction of the linear sampling of an FFT shown on a logarithmic frequency scale, for sample input signals A and B from Figure 4. When averaging the samples, the contribution of each data point is weighted by the inverse of its ERB value.
Figure 6. Frequency response of the 3 kHz (blue line) and 10 kHz (red line) +20 dB peak filters with equal ERB filter bandwidths, used to assess the use of equal loudness curves, for which a peak at 3 kHz should be weighted higher than a peak at 10 kHz.
Figure 7. Frequency response of the 1 kHz +20 dB peak filters at 65 dB SPL (blue line) and 45 dB SPL (red line), used to assess the use of the sone scale, for which a peak at a lower amplitude should be weighted lower than a peak at a higher amplitude.
Figure 8. Frequency response of the 1 kHz +20 dB peak (blue line) and −20 dB notch (red line) filters, used to assess the use of the sone scale, which should weight a notch lower than a peak.
Figure 9. Frequency response of the 1 kHz (blue line) and 5.5 kHz (red line) +20 dB peak filters with 100 Hz −3 dB bandwidths, used to assess the use of equivalent rectangular bandwidth (ERB) weighting, for which a 100 Hz bandwidth peak at a lower frequency should be weighted more heavily.
Figure 10. Illustration of the 10-band equalisation used to degrade the binaural stimuli for the listening test.
Figure 11. Comparison of the mean MUSHRA test results on perceived similarity with the four colouration methods between the test stimuli and the references. Colours denote test sound locations, the black line denotes the linear regression and r is the Pearson correlation coefficient.
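The caption on linear FFT sampling above says that each data point is weighted by the inverse of its ERB value when averaging. A minimal sketch of that idea follows. It assumes the Glasberg and Moore (1990) ERB approximation, which may differ from the formula used in the paper, and the bin spacing, the test data and the names `erb_hz` and `erb_weighted_mean` are hypothetical.

```python
def erb_hz(freq_hz: float) -> float:
    """Equivalent rectangular bandwidth in Hz (assumed Glasberg & Moore 1990 form)."""
    return 24.7 * (4.37 * freq_hz / 1000.0 + 1.0)


def erb_weighted_mean(freqs_hz, values):
    """Average per-bin values with weights proportional to 1 / ERB(f)."""
    weights = [1.0 / erb_hz(f) for f in freqs_hz]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


# Hypothetical linearly spaced FFT bins: 0 Hz to 20 kHz in 100 Hz steps.
freqs = [100.0 * k for k in range(201)]
# Hypothetical per-bin difference: 1 unit in the 10 bins below 1 kHz.
low_diff = [1.0 if f < 1000.0 else 0.0 for f in freqs]
# The same difference moved to the 10 bins above 19 kHz.
high_diff = [1.0 if f > 19000.0 else 0.0 for f in freqs]

low_score = erb_weighted_mean(freqs, low_diff)
high_score = erb_weighted_mean(freqs, high_diff)
print(low_score, high_score)
```

Because linear FFT bins are much denser per ERB at high frequencies, the inverse ERB weights shrink the contribution of the high-frequency bins, so the low-frequency difference yields the larger score in this sketch.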
Abstract
1. Introduction
2. Method
2.1. Iterative Dataset Normalisation
2.2. Equal Loudness Weighting
2.3. Phon to Sone Conversion
2.4. Equivalent Rectangular Bandwidth Weighting
3. Validation
3.1. Test Scenarios
3.1.1. Feature 1: Equal Loudness
3.1.2. Feature 2: Binaural Loudness Difference
3.1.3. Feature 3: Non-Linear Frequency Scaling
3.2. Listening Test
3.2.1. Test Paradigm
3.2.2. Results
3.3. Discussion
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Feature | 3 kHz | 10 kHz |
---|---|---|
BSD (dB) | 0.48 | 1.01 |
PEAQ (ODG) | 0.86 | 0.71 |
CLL (Phons) | 0.62 | 0.48 |
PBC (sones) | 1.12 | 0.26 |

Feature | 65 dB | 45 dB |
---|---|---|
BSD (dB) | 0.07 | 0.07 |
PEAQ (ODG) | 0.19 | 0.19 |
CLL (Phons) | 0.39 | 0.13 |
PBC (sones) | 0.46 | 0.11 |

Feature | Peak | Notch |
---|---|---|
BSD (dB) | 0.15 | 0.15 |
PEAQ (ODG) | 1.08 | 0.95 |
CLL (Phons) | 0.50 | 0.43 |
PBC (sones) | 0.91 | 0.63 |

Feature | 1 kHz | 5.5 kHz |
---|---|---|
BSD (dB) | 0.15 | 0.15 |
PEAQ (ODG) | 1.08 | 0.27 |
CLL (Phons) | 0.50 | 0.15 |
PBC (sones) | 0.91 | 0.12 |
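The 65 dB / 45 dB and Peak / Notch comparisons above rely on the sone scale being non-linear in level. The sketch below illustrates why, using the standard relation in which loudness doubles for every 10 phons above 40 phons. This is an assumption for illustration: the paper's exact conversion, particularly below 40 phons, may differ, and the function name `phon_to_sone` is hypothetical. At 1 kHz the phon value equals the level in dB SPL by definition, so a 20 dB change is treated here as a 20 phon change.

```python
def phon_to_sone(phon: float) -> float:
    """Loudness in sones for a loudness level in phons (assumed valid from 40 phons up)."""
    return 2.0 ** ((phon - 40.0) / 10.0)


# Hypothetical baseline loudness level at 1 kHz.
base = 60.0

# Change in sones caused by a +20 phon peak and by a -20 phon notch.
peak_change = phon_to_sone(base + 20.0) - phon_to_sone(base)
notch_change = phon_to_sone(base) - phon_to_sone(base - 20.0)

# Change in sones caused by the same +20 phon peak at two playback levels.
loud_peak = phon_to_sone(65.0 + 20.0) - phon_to_sone(65.0)
quiet_peak = phon_to_sone(45.0 + 20.0) - phon_to_sone(45.0)

print(peak_change, notch_change)
print(loud_peak, quiet_peak)
```

Under these assumptions a peak changes the loudness in sones more than a notch of the same size in dB, and a peak at a higher playback level changes it more than the same peak at a lower level. The numbers printed here are not the PBC values in the tables, which come from the full method.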

Location | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|
Azimuth (°) | 180 | 50 | 118 | 0 | 180 | 62 | 130 | 0 |
Elevation (°) | 64 | 46 | 16 | 0 | 0 | −16 | −46 | −64 |

Correlation | r | p |
---|---|---|
BSD | −0.83 | <0.001 |
PEAQ | −0.52 | <0.001 |
CLL | −0.90 | <0.001 |
PBC | −0.95 | <0.001 |

Location | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|
r (BSD) | −0.96 | −0.98 | −0.90 | −0.97 | −0.93 | −0.96 | −0.98 | −0.71 |
r (PEAQ) | −0.87 | −0.24 | −0.39 | −0.85 | −0.96 | −0.50 | −0.21 | −0.20 |
r (CLL) | −0.97 | −0.96 | −0.96 | −0.95 | −0.99 | −0.94 | −0.98 | −0.92 |
r (PBC) | −0.99 | −0.99 | −0.95 | −0.99 | −0.99 | −0.98 | −0.98 | −0.97 |
p (BSD) | <0.001 | <0.001 | 0.006 | <0.001 | 0.002 | <0.001 | <0.001 | 0.073 |
p (PEAQ) | 0.010 | 0.600 | 0.386 | 0.016 | <0.001 | 0.247 | 0.651 | 0.667 |
p (CLL) | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | 0.002 | <0.001 | 0.004 |
p (PBC) | <0.001 | <0.001 | 0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |
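The r values above are Pearson correlation coefficients between predicted colouration and mean perceived similarity. They are negative because a larger predicted colouration should go with a lower similarity rating. A minimal sketch of the calculation follows. The data are made up for illustration and are not the listening test results, and the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predicted colouration values for five stimuli.
predicted = [0.0, 0.5, 1.0, 1.5, 2.0]
# Hypothetical mean similarity ratings on a 0 to 100 scale,
# falling linearly as the predicted colouration rises.
similarity = [100.0, 80.0, 60.0, 40.0, 20.0]

r = pearson_r(predicted, similarity)
print(round(r, 3))
```

A predictor that tracks perceived similarity well gives r close to −1, which is the sense in which the methods are compared in the tables.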
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
McKenzie, T.; Armstrong, C.; Ward, L.; Murphy, D.T.; Kearney, G. Predicting the Colouration between Binaural Signals. Appl. Sci. 2022, 12, 2441. https://doi.org/10.3390/app12052441