Using duration models to reduce fragmentation in audio segmentation

Samer Abdallah¹,
Mark Sandler¹,
Christophe Rhodes² &
…
Michael Casey²

Abstract

We investigate explicit segment duration models in addressing the problem of fragmentation in musical audio segmentation. The resulting probabilistic models are optimised using Markov Chain Monte Carlo methods; in particular, we introduce a modification to Wolff’s algorithm to make it applicable to a segment classification model with an arbitrary duration prior. We apply this to a collection of pop songs, and show experimentally that the generated segmentations suffer much less from fragmentation than those produced by segmentation algorithms based on clustering, and are closer to an expert listener’s annotations, as evaluated by two different performance measures.
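To make the idea of fragmentation and of a duration prior concrete, here is a minimal sketch. It assumes a segmentation is given as one label per audio frame, and it scores the resulting segment durations under a gamma-shaped density. The function names, the gamma form, and the parameters `mean_frames` and `shape` are illustrative assumptions, not the model or the prior used in the paper, and the continuous density is used only as an unnormalised score for integer durations.

```python
import math
from itertools import groupby


def segments(labels):
    """Run-length encode a frame-label sequence into (label, duration) pairs."""
    return [(lab, sum(1 for _ in run)) for lab, run in groupby(labels)]


def log_duration_prior(labels, mean_frames=20.0, shape=4.0):
    """Sum of log gamma densities evaluated at each segment duration.

    The gamma form and its parameters are illustrative assumptions,
    not the duration prior used in the paper.
    """
    scale = mean_frames / shape
    total = 0.0
    for _, d in segments(labels):
        # log of the gamma pdf with the given shape and scale, evaluated at d
        total += ((shape - 1) * math.log(d) - d / scale
                  - math.lgamma(shape) - shape * math.log(scale))
    return total


# Two labelings of the same 40 frames: one clean, one with a spurious
# single-frame "B" segment that splits the first section.
smooth = ["A"] * 20 + ["B"] * 20
fragmented = ["A"] * 9 + ["B"] + ["A"] * 10 + ["B"] * 20

print(segments(smooth))
print(len(segments(fragmented)))
print(log_duration_prior(smooth) > log_duration_prior(fragmented))
```

Under these assumed parameters the very short segment is heavily penalised, so the fragmented labeling scores lower than the smooth one. A prior of this kind would be combined with a data likelihood before any optimisation, which this sketch does not attempt.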

References

Abdallah, S., Noland, K., Sandler, M., Casey, M., & Rhodes, C. (2005). Theory and evaluation of a Bayesian music structure extractor. In J. D. Reiss & G. A. Wiggins (Eds), Proceedings of the sixth international conference on music information retrieval (pp. 420–425).
Allen, J. (1984). Towards a general theory of action and time. Artificial Intelligence, 23, 123–154.
Aucouturier, J.-J., Pachet, F., & Sandler, M. (2005). The way it sounds: Timbre models for analysis and retrieval of polyphonic music signals. IEEE Transactions on Multimedia.
Barbu, A. & Zhu, S.-C. (2004). Cluster sampling and its applications in image processing. Technical Report 409, Department of Statistics, UCLA.
Bello, J. P., Daudet, L., Abdallah, S., Duxbury, C., Davies, M., & Sandler, M. (2004). A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing, 13(5), 1035–1047.
Brown, J. C. (1991). Calculation of a constant Q spectral transform. Journal of the Acoustical Society of America, 89(1), 425–434.
Dannenberg, R., & Hu, N. (2002). Discovering musical structure in audio recordings. In Music and artificial intelligence: second international conference. Edinburgh.
Downie, S., & Nelson, M. (2000). Evaluation of a simple and effective music information retrieval method. In Proceedings of the ACM SIGIR (pp. 73–80).
Eckmann, J.-P., Kamphorst, S. O., & Ruelle, D. (1987). Recurrence plots of dynamical systems. Europhysics Letters, 5, 973–977.
Foote, J. (1999). Visualizing music and audio using self-similarity. In ACM Multimedia, vol. 1, pp. 77–80.
Galton, A. (Ed) (1987). Temporal logics and their applications. Academic Press, London.
Goto, M. (2003). A chorus-section detecting method for musical audio signals. In Proc. ICASSP, vol. V, pp. 437–440.
Hainsworth, S., & Macleod, M. (2003). Onset detection in musical audio signals. In Proc. ICMC.
Hofmann, T., & Buhmann, J. M. (1997). Pairwise data clustering by deterministic annealing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(1).
Huang, Q., & Dom, B. (1995). Quantitative methods of evaluating image segmentation. In Proc. IEEE Intl. Conf. on Image Processing (ICIP’95).
Logan, B. (2000). Mel frequency cepstral coefficients for music modeling. In International Symposium on Music Information Retrieval.
Logan, B., & Chu, S. (2000). Music summarization using key phrases. In International Conference on Acoustics, Speech and Signal Processing.
Lu, L., Wang, M., & Zhang, H. (2004). Repeating pattern discovery and structure analysis from acoustic music data. In 6th ACM SIGMM International Workshop on Multimedia Information Retrieval.
Maddage, N., Changsheng, X., Kankanhalli, M., & Shao, X. (2004). Content-based music structure analysis with applications to music semantics understanding. In 6th ACM SIGMM International Workshop on Multimedia Information Retrieval.
Merhav, N., & Lee, C.-H. (1993). On the asymptotic statistical behaviour of empirical cepstral coefficients. IEEE Transactions on Signal Processing, 41(5), 1990–1993.
Orio, N., & Neve, G. (2005). Experiments on segmentation techniques for music documents indexing. In J. D. Reiss & G. A. Wiggins (Eds), Proceedings of the sixth international conference on music information retrieval (pp. 624–627).
Peeters, G., Burthe, A. L., & Rodet, X. (2002). Toward automatic music audio summary generation from signal analysis. In International Symposium on Music Information Retrieval.
Puzicha, J., Hofmann, T., & Buhmann, J. M. (1999). Histogram clustering for unsupervised image segmentation. Proceedings of CVPR ’99.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–286.
Robert, C. P., & Casella, G. (1999). Monte Carlo statistical methods. Springer, New York.
Shoham, Y. (1988). Reasoning about change: time and causation from the standpoint of artificial intelligence. MIT Press, Cambridge, MA.
Swendsen, R. H., & Wang, J.-S. (1987). Non-universal critical dynamics in Monte-Carlo simulations. Physical Review Letters, 58(2), 86–88.
Wakefield, G. H. (1999). Mathematical representation of joint time-chroma distributions. In Advanced Signal Processing Algorithms, Architectures, and Implementations, vol. 3807, IX, pp. 637–645. SPIE.
Wolff, U. (1989). Collective Monte Carlo updating for spin systems. Physical Review Letters, 62(4), 361–364.

Author information

Authors and Affiliations

Queen Mary, University of London, Mile End Road, London, E1 4NS
Samer Abdallah & Mark Sandler
Goldsmiths College, University of London, New Cross, London, SE14 6NW
Christophe Rhodes & Michael Casey

Corresponding author

Correspondence to Samer Abdallah.

Additional information

Editor: Gerhard Widmer

About this article

Cite this article

Abdallah, S., Sandler, M., Rhodes, C. et al. Using duration models to reduce fragmentation in audio segmentation. Mach Learn 65, 485–515 (2006). https://doi.org/10.1007/s10994-006-0586-4

Received: 27 July 2006
Revised: 04 October 2006
Accepted: 05 October 2006
Published: 14 November 2006
Issue Date: December 2006
DOI: https://doi.org/10.1007/s10994-006-0586-4
