Prediction-driven computational auditory scene analysis
Publisher:
  • Massachusetts Institute of Technology
  • 201 Vassar Street, W59-200, Cambridge, MA
  • United States
Order Number: AAI0597425
Pages: 1
Abstract

The sound of a busy environment, such as a city street, gives rise to a perception of numerous distinct events in a human listener: the 'auditory scene analysis' of the acoustic information. Recent advances in the understanding of this process from experimental psychoacoustics have led to several efforts to build a computer model capable of the same function. This work is known as 'computational auditory scene analysis'.

The dominant approach to this problem has been to treat it as a sequence of modules, the output of one forming the input to the next. Sound is converted to its spectrum, cues are picked out, and representations of the cues are grouped into an abstract description of the initial input. This 'data-driven' approach has two specific weaknesses in comparison to the auditory system: it will interpret a given sound in the same way regardless of its context, and it cannot 'infer' the presence of a sound whose direct evidence is hidden by other components.
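
To make the feed-forward character of this chain concrete, here is a minimal Python sketch of a data-driven pipeline, with toy peak-picking and proximity grouping standing in for real cue detectors; the function names, thresholds, and grouping rule are illustrative assumptions, not the algorithms surveyed in the thesis.

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=256):
    """Sound -> spectrum: magnitude short-time Fourier transform."""
    frames = [x[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(x) - n_fft, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))

def pick_cues(spec, rel_thresh=0.5):
    """Spectrum -> cues: local spectral peaks above a relative threshold."""
    cues = []
    for t, frame in enumerate(spec):
        peak = frame.max()
        for f in range(1, len(frame) - 1):
            if (frame[f] >= peak * rel_thresh
                    and frame[f] > frame[f - 1] and frame[f] > frame[f + 1]):
                cues.append((t, f))
    return cues

def group_cues(cues, max_gap=1):
    """Cues -> events: chain cues that are adjacent in time and frequency."""
    tracks = []
    for t, f in sorted(cues):
        for track in tracks:
            lt, lf = track[-1]
            if t - lt <= max_gap and abs(f - lf) <= max_gap:
                track.append((t, f))
                break
        else:
            tracks.append([(t, f)])
    return tracks

# Strictly feed-forward: each stage sees only the previous stage's output,
# so the same sound is always interpreted the same way, whatever surrounds it.
sound = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
events = group_cues(pick_cues(stft_mag(sound)))
print(len(events), "event(s) found")
```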

The 'prediction-driven' approach is presented as an alternative, in which analysis is a process of reconciliation between the observed acoustic features and the predictions of an internal model of the sound-producing entities in the environment. In this way, predicted sound events will form part of the scene interpretation as long as they are consistent with the input sound, regardless of whether direct evidence is found. A blackboard-based implementation of this approach is described which analyzes dense, ambient sound examples into a vocabulary of noise clouds, transient clicks, and a correlogram-based representation of wide-band periodic energy called the weft.
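
The reconciliation idea can be sketched in miniature: a prediction remains tenable wherever the observed energy could contain it, and only energy left unexplained by the current hypotheses demands new elements. The per-band comparison below is a deliberately simplified stand-in for the blackboard system's reasoning over noise clouds, clicks, and wefts; `reconcile` and its tolerance are invented for illustration.

```python
import numpy as np

def reconcile(observed, predicted, tol=0.1):
    """
    Toy prediction-driven step. A prediction survives wherever the
    observation could plausibly contain it, even if other energy masks
    it; only unexplained excess energy calls for new hypotheses.
    """
    # Masking means evidence can only be hidden, not erased: a prediction
    # is consistent as long as observed energy is at least the predicted
    # energy, within the tolerance.
    consistent = observed >= predicted - tol
    # Energy the current hypotheses cannot account for at all.
    unexplained = np.maximum(observed - predicted, 0.0)
    return consistent, unexplained

observed = np.array([1.0, 0.8, 0.3, 0.9])   # spectral energy per band
predicted = np.array([0.4, 0.8, 0.3, 0.2])  # sum of hypothesized elements
ok, excess = reconcile(observed, predicted)
# ok is all True: every prediction is compatible with the input, so the
# predicted events stay in the scene even where they are not directly
# visible; excess ([0.6, 0, 0, 0.7]) would trigger new elements.
```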

The system is assessed through experiments that firstly investigate subjects' perception of distinct events in ambient sound examples, and secondly collect quality judgments for sound events resynthesized by the system. Although the resyntheses were rated as far from perfect, there was good agreement between the events detected by the model and those reported by the listeners. In addition, the experimental procedure does not depend on special aspects of the algorithm (other than the generation of resyntheses), and is applicable to the assessment and comparison of other models of human auditory organization. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
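
As a rough illustration of the kind of model-listener comparison described (not the rating procedure actually used in the thesis, which collected quality judgments of resyntheses), one might score agreement between event onset times as below; the function and tolerance are hypothetical.

```python
def event_agreement(model_events, listener_events, tol=0.5):
    """
    Fraction of listener-marked event onsets (in seconds) matched by a
    model-detected onset within +/- tol. Illustrative only.
    """
    matched = sum(
        any(abs(m - l) <= tol for m in model_events)
        for l in listener_events
    )
    return matched / len(listener_events) if listener_events else 1.0

print(event_agreement([0.1, 2.3, 5.0], [0.2, 2.2, 4.1]))  # -> 0.666...
```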

Cited By

  1. Li C, Zhu L, Xu S, Gao P and Xu B CBLDNN-Based Speaker-Independent Speech Separation Via Generative Adversarial Training 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (711-715)
  2. Yu D, Kolbæk M, Tan Z and Jensen J Permutation invariant training of deep models for speaker-independent multi-talker speech separation 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (241-245)
  3. Kolbæk M, Yu D, Tan Z and Jensen J (2017). Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks, IEEE/ACM Transactions on Audio, Speech and Language Processing, 25:10, (1901-1913), Online publication date: 1-Oct-2017.
  4. Le Roux J, Kameoka H, Ono N, de Cheveigné A and Sagayama S (2011). Computational auditory induction as a missing-data model-fitting problem with Bregman divergence, Speech Communication, 53:5, (658-676), Online publication date: 1-May-2011.
  5. Vincent E and Plumbley M (2008). Efficient Bayesian inference for harmonic models via adaptive posterior factorization, Neurocomputing, 72:1-3, (79-87), Online publication date: 1-Dec-2008.
  6. Chu S Unstructured audio classification for environment recognition Proceedings of the 23rd national conference on Artificial intelligence - Volume 3, (1845-1846)
  7. Wang D and Hu G Cocktail party processing Proceedings of the 2008 IEEE world conference on Computational intelligence: research frontiers, (333-348)
  8. Woodruff J and Pardo B (2007). Using pitch, amplitude modulation, and spatial cues for separation of harmonic instruments from stereo music recordings, EURASIP Journal on Advances in Signal Processing, 2007:1, (162-162), Online publication date: 1-Jan-2007.
  9. Velikic G and Bocko M Employing the relative phase of overtones in single mixture musical source separation Proceedings of the Ninth IASTED International Conference on Signal and Image Processing, (136-141)
  10. Vincent E and Plumbley M Single-Channel mixture decomposition using bayesian harmonic models Proceedings of the 6th international conference on Independent Component Analysis and Blind Signal Separation, (722-730)
  11. Oermann A, Lang A and Dittmann J Verifier-tuple for audio-forensic to determine speaker environment Proceedings of the 7th workshop on Multimedia and security, (57-62)
  12. Büchler M, Allegro S, Launer S and Dillier N (2005). Sound classification in hearing aids inspired by auditory scene analysis, EURASIP Journal on Advances in Signal Processing, 2005, (2991-3002), Online publication date: 1-Jan-2005.
  13. Haykin S and Chen Z (2005). The Cocktail Party Problem, Neural Computation, 17:9, (1875-1902), Online publication date: 1-Sep-2005.
  14. Paiva R, Mendes T and Cardoso A An auditory model based approach for melody detection in polyphonic musical recordings Proceedings of the Second international conference on Computer Music Modeling and Retrieval, (21-40)
  15. Sundaram H and Chang S Determining computable scenes in films and their structures using audio-visual memory models Proceedings of the eighth ACM international conference on Multimedia, (95-104)
  16. Jalics L, Hemami H, Clymer B and Groff A (1997). Rocking, Tapping and Stepping, Autonomous Robots, 4:3, (227-242), Online publication date: 1-Jul-1997.
Contributors
  • Google LLC
  • Massachusetts Institute of Technology