Abstract
We investigate explicit segment duration models as a way to address fragmentation in musical audio segmentation. The resulting probabilistic models are optimised using Markov chain Monte Carlo methods; in particular, we introduce a modification to Wolff’s algorithm that makes it applicable to a segment classification model with an arbitrary duration prior. We apply this to a collection of pop songs and show experimentally that the generated segmentations suffer much less from fragmentation than those produced by clustering-based segmentation algorithms, and are closer to an expert listener’s annotations, as evaluated by two different performance measures.
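To illustrate why an explicit duration prior discourages fragmentation, the following minimal sketch scores a frame-level labelling by the lengths of its segments. It is not the model from the article: the gamma-shaped duration density, the parameter names `mean_len` and `shape`, and the helper names are all assumptions chosen for illustration, and the continuous density is simply evaluated at integer frame counts as an approximation.

```python
import math
from itertools import groupby


def run_lengths(labels):
    """Lengths of maximal runs of identical consecutive labels."""
    return [len(list(group)) for _, group in groupby(labels)]


def log_duration_prior(labels, mean_len=8.0, shape=4.0):
    """Sum of log duration densities over all segments of a labelling.

    Hypothetical choice: a gamma density with the given mean and shape,
    evaluated at each integer segment length (an approximation).
    """
    rate = shape / mean_len
    log_norm = shape * math.log(rate) - math.lgamma(shape)
    total = 0.0
    for d in run_lengths(labels):
        total += log_norm + (shape - 1.0) * math.log(d) - rate * d
    return total


# Two labellings of the same 16 frames: one smooth, one fragmented.
smooth = ["A"] * 8 + ["B"] * 8
fragmented = ["A"] * 3 + ["B"] + ["A"] * 4 + ["B"] * 2 + ["A"] + ["B"] * 5

print(run_lengths(smooth), log_duration_prior(smooth))
print(run_lengths(fragmented), log_duration_prior(fragmented))
```

Under these assumed parameters the smooth labelling receives the higher log prior, because every very short segment incurs a penalty. A sampler that combines such a prior with a data likelihood would therefore tend to merge short fragments unless the audio evidence strongly supports them.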
References
Abdallah, S., Noland, K., Sandler, M., Casey, M., & Rhodes, C. (2005). Theory and evaluation of a Bayesian music structure extractor. In J. D. Reiss & G. A. Wiggins (Eds), Proceedings of the sixth international conference on music information retrieval (pp. 420–425).
Allen, J. (1984). Towards a general theory of action and time. Artificial Intelligence, 23, 123–154.
Aucouturier, J.-J., Pachet, F., & Sandler, M. (2005). The way it sounds: Timbre models for analysis and retrieval of polyphonic music signals. IEEE Transactions on Multimedia.
Barbu, A. & Zhu, S.-C. (2004). Cluster sampling and its applications in image processing. Technical Report 409, Department of Statistics, UCLA.
Bello, J. P., Daudet, L., Abdallah, S., Duxbury, C., Davies, M., & Sandler, M. (2004). A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing, 13(5), 1035–1047.
Brown, J. C. (1991). Calculation of a constant Q spectral transform. Journal of the Acoustical Society of America, 89(1), 425–434.
Dannenberg, R., & Hu, N. (2002). Discovering musical structure in audio recordings. In Music and artificial intelligence: second international conference. Edinburgh.
Downie, S., & Nelson, M. (2000). Evaluation of a simple and effective music information retrieval method. In Proceedings of the ACM SIGIR (pp. 73–80).
Eckmann, J.-P., Kamphorst, S. O., & Ruelle, D. (1987). Recurrence plots of dynamical systems. Europhysics Letters, 5, 973–977.
Foote, J. (1999). Visualizing music and audio using self-similarity. In ACM Multimedia, vol. 1, pp. 77–80.
Galton, A. (Ed) (1987). Temporal logics and their applications. Academic Press, London.
Goto, M. (2003). A chorus-section detecting method for musical audio signals. In Proc. ICASSP, vol. V, pp. 437–440.
Hainsworth, S., & Macleod, M. (2003). Onset detection in musical audio signals. In Proc. ICMC.
Hofmann, T., & Buhmann, J. M. (1997). Pairwise data clustering by deterministic annealing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(1).
Huang, Q., & Dom, B. (1995). Quantitative methods of evaluating image segmentation. In Proc. IEEE Intl. Conf. on Image Processing (ICIP’95).
Logan, B. (2000). Mel frequency cepstral coefficients for music modeling. In International Symposium on Music Information Retrieval.
Logan, B., & Chu, S. (2000). Music summarization using key phrases. In International Conference on Acoustics, Speech and Signal Processing.
Lu, L., Wang, M., & Zhang, H. (2004). Repeating pattern discovery and structure analysis from acoustic music data. In 6th ACM SIGMM International Workshop on Multimedia Information Retrieval.
Maddage, N., Changsheng, X., Kankanhalli, M., & Shao, X. (2004). Content-based music structure analysis with applications to music semantics understanding. In 6th ACM SIGMM International Workshop on Multimedia Information Retrieval.
Merhav, N., & Lee, C.-H. (1993). On the asymptotic statistical behaviour of empirical cepstral coefficients. IEEE Transactions on Signal Processing, 41(5), 1990–1993.
Orio, N., & Neve, G. (2005). Experiments on segmentation techniques for music documents indexing. In J. D. Reiss & G. A. Wiggins (Eds), Proceedings of the sixth international conference on music information retrieval (pp. 624–627).
Peeters, G., Burthe, A. L., & Rodet, X. (2002). Toward automatic music audio summary generation from signal analysis. In International Symposium on Music Information Retrieval.
Puzicha, J., Hofmann, T., & Buhmann, J. M. (1999). Histogram clustering for unsupervised image segmentation. Proceedings of CVPR ’99.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–286.
Robert, C. P., & Casella, G. (1999). Monte Carlo statistical methods. Springer, New York.
Shoham, Y. (1988). Reasoning about change: time and causation from the standpoint of artificial intelligence. MIT Press, Cambridge, MA.
Swendsen, R. H., & Wang, J.-S. (1987). Non-universal critical dynamics in Monte-Carlo simulations. Physical Review Letters, 58(2), 86–88.
Wakefield, G. H. (1999). Mathematical representation of joint time-chroma distributions. In Advanced Signal Processing Algorithms, Architectures, and Implementations, vol. 3807, IX, pp. 637–645. SPIE.
Wolff, U. (1989). Collective Monte Carlo updating for spin systems. Physical Review Letters, 62(4), 361–364.
Additional information
Editor: Gerhard Widmer
Cite this article
Abdallah, S., Sandler, M., Rhodes, C. et al. Using duration models to reduce fragmentation in audio segmentation. Mach Learn 65, 485–515 (2006). https://doi.org/10.1007/s10994-006-0586-4
DOI: https://doi.org/10.1007/s10994-006-0586-4