Prediction-driven computational auditory scene analysis
Publisher:
  • Massachusetts Institute of Technology
  • 201 Vassar Street, W59-200, Cambridge, MA
  • United States
Order Number: AAI0597425
Pages: 1
Abstract

The sound of a busy environment, such as a city street, gives rise to a perception of numerous distinct events in a human listener: the 'auditory scene analysis' of the acoustic information. Recent advances in the understanding of this process from experimental psychoacoustics have led to several efforts to build a computer model capable of the same function. This work is known as 'computational auditory scene analysis'.

The dominant approach to this problem has been to treat it as a sequence of modules, the output of one forming the input to the next. Sound is converted to its spectrum, cues are picked out, and representations of the cues are grouped into an abstract description of the initial input. This 'data-driven' approach has two specific weaknesses in comparison to the auditory system: it will interpret a given sound in the same way regardless of its context, and it cannot 'infer' the presence of a sound whose direct evidence is hidden by other components.
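
To make the feed-forward character of this chain concrete, here is a minimal Python sketch of a data-driven pipeline, with toy peak-picking and proximity grouping standing in for real cue detectors; the function names, thresholds, and grouping rule are illustrative assumptions, not the algorithms surveyed in the thesis.

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=256):
    """Sound -> spectrum: magnitude short-time Fourier transform."""
    frames = [x[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(x) - n_fft, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))

def pick_cues(spec, rel_thresh=0.5):
    """Spectrum -> cues: local spectral peaks above a relative threshold."""
    cues = []
    for t, frame in enumerate(spec):
        peak = frame.max()
        for f in range(1, len(frame) - 1):
            if (frame[f] >= peak * rel_thresh
                    and frame[f] > frame[f - 1] and frame[f] > frame[f + 1]):
                cues.append((t, f))
    return cues

def group_cues(cues, max_gap=1):
    """Cues -> events: chain cues that are adjacent in time and frequency."""
    tracks = []
    for t, f in sorted(cues):
        for track in tracks:
            lt, lf = track[-1]
            if t - lt <= max_gap and abs(f - lf) <= max_gap:
                track.append((t, f))
                break
        else:
            tracks.append([(t, f)])
    return tracks

# Strictly feed-forward: each stage sees only the previous stage's output,
# so the same sound is always interpreted the same way, whatever surrounds it.
sound = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
events = group_cues(pick_cues(stft_mag(sound)))
print(len(events), "event(s) found")
```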

The 'prediction-driven' approach is presented as an alternative, in which analysis is a process of reconciliation between the observed acoustic features and the predictions of an internal model of the sound-producing entities in the environment. In this way, predicted sound events will form part of the scene interpretation as long as they are consistent with the input sound, regardless of whether direct evidence is found. A blackboard-based implementation of this approach is described which analyzes dense, ambient sound examples into a vocabulary of noise clouds, transient clicks, and a correlogram-based representation of wide-band periodic energy called the weft.
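
The reconciliation idea can be sketched in miniature: a prediction remains tenable wherever the observed energy could contain it, and only energy left unexplained by the current hypotheses demands new elements. The per-band comparison below is a deliberately simplified stand-in for the blackboard system's reasoning over noise clouds, clicks, and wefts; `reconcile` and its tolerance are invented for illustration.

```python
import numpy as np

def reconcile(observed, predicted, tol=0.1):
    """
    Toy prediction-driven step. A prediction survives wherever the
    observation could plausibly contain it, even if other energy masks
    it; only unexplained excess energy calls for new hypotheses.
    """
    # Masking means evidence can only be hidden, not erased: a prediction
    # is consistent as long as observed energy is at least the predicted
    # energy, within the tolerance.
    consistent = observed >= predicted - tol
    # Energy the current hypotheses cannot account for at all.
    unexplained = np.maximum(observed - predicted, 0.0)
    return consistent, unexplained

observed = np.array([1.0, 0.8, 0.3, 0.9])   # spectral energy per band
predicted = np.array([0.4, 0.8, 0.3, 0.2])  # sum of hypothesized elements
ok, excess = reconcile(observed, predicted)
# ok is all True: every prediction is compatible with the input, so the
# predicted events stay in the scene even where they are not directly
# visible; excess ([0.6, 0, 0, 0.7]) would trigger new elements.
```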

The system is assessed through experiments that firstly investigate subjects' perception of distinct events in ambient sound examples, and secondly collect quality judgments for sound events resynthesized by the system. Although the resyntheses were rated as far from perfect, there was good agreement between the events detected by the model and those reported by the listeners. In addition, the experimental procedure does not depend on special aspects of the algorithm (other than the generation of resyntheses), and is applicable to the assessment and comparison of other models of human auditory organization. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
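
As a rough illustration of the kind of model-listener comparison described (not the rating procedure actually used in the thesis, which collected quality judgments of resyntheses), one might score agreement between event onset times as below; the function and tolerance are hypothetical.

```python
def event_agreement(model_events, listener_events, tol=0.5):
    """
    Fraction of listener-marked event onsets (in seconds) matched by a
    model-detected onset within +/- tol. Illustrative only.
    """
    matched = sum(
        any(abs(m - l) <= tol for m in model_events)
        for l in listener_events
    )
    return matched / len(listener_events) if listener_events else 1.0

print(event_agreement([0.1, 2.3, 5.0], [0.2, 2.2, 4.1]))  # -> 0.666...
```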

Cited By

  1. Li C, Zhu L, Xu S, Gao P and Xu B CBLDNN-Based Speaker-Independent Speech Separation Via Generative Adversarial Training 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (711-715)
  2. Yu D, Kolbæk M, Tan Z and Jensen J Permutation invariant training of deep models for speaker-independent multi-talker speech separation 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (241-245)
  3. Kolbæk M, Yu D, Tan Z and Jensen J (2017). Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks, IEEE/ACM Transactions on Audio, Speech and Language Processing, 25:10, (1901-1913), Online publication date: 1-Oct-2017.
  4. Le Roux J, Kameoka H, Ono N, de Cheveigné A and Sagayama S (2011). Computational auditory induction as a missing-data model-fitting problem with Bregman divergence, Speech Communication, 53:5, (658-676), Online publication date: 1-May-2011.
  5. Vincent E and Plumbley M (2008). Efficient Bayesian inference for harmonic models via adaptive posterior factorization, Neurocomputing, 72:1-3, (79-87), Online publication date: 1-Dec-2008.
  6. Chu S Unstructured audio classification for environment recognition Proceedings of the 23rd national conference on Artificial intelligence - Volume 3, (1845-1846)
  7. Wang D and Hu G Cocktail party processing Proceedings of the 2008 IEEE world conference on Computational intelligence: research frontiers, (333-348)
  8. Woodruff J and Pardo B (2007). Using pitch, amplitude modulation, and spatial cues for separation of harmonic instruments from stereo music recordings, EURASIP Journal on Advances in Signal Processing, 2007:1, (162-162), Online publication date: 1-Jan-2007.
  9. Velikic G and Bocko M Employing the relative phase of overtones in single mixture musical source separation Proceedings of the Ninth IASTED International Conference on Signal and Image Processing, (136-141)
  10. Vincent E and Plumbley M Single-Channel mixture decomposition using bayesian harmonic models Proceedings of the 6th international conference on Independent Component Analysis and Blind Signal Separation, (722-730)
  11. Oermann A, Lang A and Dittmann J Verifier-tuple for audio-forensic to determine speaker environment Proceedings of the 7th workshop on Multimedia and security, (57-62)
  12. Büchler M, Allegro S, Launer S and Dillier N (2005). Sound classification in hearing aids inspired by auditory scene analysis, EURASIP Journal on Advances in Signal Processing, 2005, (2991-3002), Online publication date: 1-Jan-2005.
  13. Haykin S and Chen Z (2005). The Cocktail Party Problem, Neural Computation, 17:9, (1875-1902), Online publication date: 1-Sep-2005.
  14. Paiva R, Mendes T and Cardoso A An auditory model based approach for melody detection in polyphonic musical recordings Proceedings of the Second international conference on Computer Music Modeling and Retrieval, (21-40)
  15. Sundaram H and Chang S Determining computable scenes in films and their structures using audio-visual memory models Proceedings of the eighth ACM international conference on Multimedia, (95-104)
  16. Jalics L, Hemami H, Clymer B and Groff A (1997). Rocking, Tapping and Stepping, Autonomous Robots, 4:3, (227-242), Online publication date: 1-Jul-1997.
Contributors
  • Google LLC
  • Massachusetts Institute of Technology