Abstract
This paper presents and evaluates a method of audio compression designed specifically to exploit the natural repetition that occurs within musical audio. The system is called Audio Compression Exploiting Repetition (ACER). ACER is a perceptual technique, but rather than exploiting masking it applies the principles of Lempel-Ziv and run-length encoding, with sequences of audio taking the place of numeric or character strings. The ACER procedure uses a pseudo-exhaustive search process and spectral difference grading. Because ACER exploits musical structure, the amount of data reduction achieved varies from piece to piece. The system is described first, and results on a corpus of material are then presented. The analysis shows that moderate data reduction is achieved when the system operates within parameters designed to maintain a high level of perceptual audio quality, and that greater data reduction is obtained when lower perceptual quality is accepted. Objective quality evaluations reveal a degradation in fidelity that depends on the compression parameters.
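The general idea of substituting repeated audio for back-references can be illustrated with a minimal sketch. This is not the ACER implementation: the fixed block length, the mean absolute difference between magnitude spectra, the threshold, and the function names `find_repeats` and `reconstruct` are all assumptions made for illustration only.

```python
import numpy as np

def find_repeats(signal, block_len, threshold):
    """Label each fixed-length block as stored audio or a back-reference.

    Hypothetical sketch: a block becomes ('ref', j) when its magnitude
    spectrum differs from that of an earlier stored block j by no more
    than `threshold` (mean absolute difference); otherwise ('raw', i).
    """
    n_blocks = len(signal) // block_len
    blocks = signal[:n_blocks * block_len].reshape(n_blocks, block_len)
    spectra = np.abs(np.fft.rfft(blocks, axis=1))

    stored = []    # indices of blocks kept as audio
    encoded = []   # one entry per block
    for i in range(n_blocks):
        best, match = threshold, None
        # Exhaustive comparison against every previously stored block.
        for j in stored:
            diff = np.mean(np.abs(spectra[i] - spectra[j]))
            if diff <= best:
                best, match = diff, j
        if match is None:
            stored.append(i)
            encoded.append(("raw", i))
        else:
            encoded.append(("ref", match))
    return blocks, encoded

def reconstruct(blocks, encoded):
    """Rebuild the signal, copying the referenced block for each 'ref'."""
    return np.concatenate([blocks[idx] for _, idx in encoded])
```

In this sketch a larger threshold accepts less exact matches, so more blocks become references and fewer are stored, at the cost of fidelity. This mirrors the trade-off between data reduction and perceptual quality described in the abstract.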
Cite this article
Cunningham, S., Grout, V. Data reduction of audio by exploiting musical repetition. Multimed Tools Appl 72, 2299–2320 (2014). https://doi.org/10.1007/s11042-013-1504-y
DOI: https://doi.org/10.1007/s11042-013-1504-y