DOI: 10.1109/ICASSP.2016.7471747

Equalization matching of speech recordings in real-world environments

Published: 01 March 2016

Abstract

When different parts of speech content, such as voice-overs and narration, are recorded in real-world environments with different acoustic properties and background noise, the difference in sound quality between the recordings is typically quite audible and therefore undesirable. We propose an algorithm to equalize multiple such speech recordings so that they sound as though they were recorded in the same environment. Because the timbral content of the speech and the background noise typically differ considerably, simple equalization matching results in a noticeable mismatch in the output signals: a single equalization filter affects both timbres equally and thus cannot disambiguate the competing matching equations of the two sources. We propose leveraging speech enhancement methods to separate speech from background noise, independently apply equalization filtering to each source, and recombine the outputs. By equalizing the separated sources independently, our method better disambiguates the matching equations associated with each source, so the resulting matched signals are perceptually very similar. Additionally, by retaining the background noise in the final output signals, most artifacts introduced by the speech enhancement methods are considerably reduced and generally perceptually masked. Subjective listening tests show that our approach significantly outperforms simple equalization matching.
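The core idea lends itself to a short sketch. A single matching filter H(f) would have to satisfy two equations at once, H(f)·S1(f) ≈ S2(f) for the speech spectra and H(f)·N1(f) ≈ N2(f) for the noise spectra, which is generally impossible when the speech and noise timbres differ; separating the sources first removes the conflict. Below is a minimal Python sketch of that pipeline, not the authors' implementation: a crude Wiener-style mask stands in for the speech enhancement front end the paper leverages, and each separated source is matched to the reference via its long-term average magnitude spectrum. All function names (wiener_separate, eq_match, match_recordings) and parameter choices are illustrative assumptions.

# Hypothetical sketch of the paper's pipeline, not the authors' code:
# separate each recording into speech and background noise, equalize
# each source independently toward a reference, then recombine.
import numpy as np
from scipy.signal import stft, istft

def magnitude_profile(x, fs, nfft=1024):
    """Long-term average magnitude spectrum of a signal."""
    _, _, X = stft(x, fs, nperseg=nfft)
    return np.mean(np.abs(X), axis=1)

def wiener_separate(x, fs, noise_frames=20, nfft=1024):
    """Crude speech/noise split: estimate the noise power spectrum from
    the first frames, build a Wiener-style mask, and return the
    (speech, noise) time-domain signals. A stand-in for a real
    speech enhancement method."""
    _, _, X = stft(x, fs, nperseg=nfft)
    noise_psd = np.mean(np.abs(X[:, :noise_frames]) ** 2, axis=1, keepdims=True)
    mask = np.maximum(1.0 - noise_psd / (np.abs(X) ** 2 + 1e-12), 0.0)
    _, speech = istft(mask * X, fs, nperseg=nfft)
    _, noise = istft((1.0 - mask) * X, fs, nperseg=nfft)
    return speech[:len(x)], noise[:len(x)]

def eq_match(src, ref_profile, fs, nfft=1024):
    """Apply per-band magnitude gains in the STFT domain so that src's
    long-term spectrum matches ref_profile."""
    _, _, S = stft(src, fs, nperseg=nfft)
    gains = ref_profile / (magnitude_profile(src, fs, nfft) + 1e-12)
    _, y = istft(gains[:, None] * S, fs, nperseg=nfft)
    return y[:len(src)]

def match_recordings(target, reference, fs):
    """Equalize `target` toward `reference` by separately matching its
    speech and noise components, then summing the results."""
    ref_speech, ref_noise = wiener_separate(reference, fs)
    tgt_speech, tgt_noise = wiener_separate(target, fs)
    out_speech = eq_match(tgt_speech, magnitude_profile(ref_speech, fs), fs)
    out_noise = eq_match(tgt_noise, magnitude_profile(ref_noise, fs), fs)
    return out_speech + out_noise

For two mono recordings at the same sample rate, matched = match_recordings(target, reference, fs) would return a version of target whose speech and background each match the reference; a real system would substitute a stronger enhancement front end for the toy mask above.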

Cited By

  • Listening to sounds of silence for speech denoising. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS), pp. 9633–9648, Dec. 2020. DOI: 10.5555/3495724.3496532
  • Audible Panorama. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI), pp. 1–11, May 2019. DOI: 10.1145/3290605.3300851
  • Scene-aware audio for 360° videos. ACM Transactions on Graphics, vol. 37, no. 4, pp. 1–12, Jul. 2018. DOI: 10.1145/3197517.3201391
  • AutoDub. In Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology (UIST), pp. 533–538, Oct. 2017. DOI: 10.1145/3126594.3126661

Published In

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6592 pages
Publisher: IEEE Press
Published: 01 March 2016
Type: Research article
