DOI: 10.1145/2009916.2010012
research-article

Picasso - to sing, you must close your eyes and draw

Published: 24 July 2011

Abstract

We study the problem of automatically assigning appropriate music pieces to a picture or, more generally, to a series of pictures. This task, commonly referred to as soundtrack suggestion, is non-trivial: done by hand, it requires a lot of human attention and a good deal of experience, with masterpieces distinguished, e.g., with the Academy Award for Best Original Score. We put forward PICASSO to solve this task in a fully automated way. PICASSO makes use of genuine samples obtained from first-class contemporary movies. Hence, the training set can be arbitrarily large and is inexpensive to obtain, yet still provides an excellent source of information. At query time, PICASSO employs a three-level algorithm. First, it selects for a given query image a ranking of the most similar movie screenshots. Second, it selects for each screenshot the songs most similar to the music played in the movie when the screenshot was taken. Last, it runs a top-K aggregation algorithm to find the overall best-suited songs available. We have created a large training set consisting of over 40,000 image/soundtrack samples obtained from 28 movies and evaluated the suitability of PICASSO by means of a user study.
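The three-level query procedure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the feature vectors, the similarity function (1 / (1 + Euclidean distance)), the product-and-sum aggregation, and all names (`top_similar`, `recommend`, `soundtrack_of`, the toy data) are assumptions chosen only to make the control flow concrete. The abstract does not say which image or audio features, or which top-K aggregation algorithm, PICASSO actually uses.

```python
from collections import defaultdict
from math import dist


def top_similar(query, items, k):
    """Return the k (name, similarity) pairs most similar to `query`.

    Assumption: similarity is 1 / (1 + Euclidean distance) between feature vectors.
    """
    scored = [(name, 1.0 / (1.0 + dist(query, vec))) for name, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def recommend(query_image, screenshots, soundtrack_of, songs,
              k_shots=2, k_songs=2, top_k=1):
    """Hypothetical three-level lookup: image -> screenshots -> songs -> top-K."""
    totals = defaultdict(float)
    # Level 1: rank the screenshots most similar to the query image.
    for shot, shot_sim in top_similar(query_image, screenshots, k_shots):
        # Level 2: rank the songs most similar to the music played at that screenshot.
        for song, song_sim in top_similar(soundtrack_of[shot], songs, k_songs):
            # Level 3 (naive aggregation, an assumption): sum the products of both scores.
            totals[song] += shot_sim * song_sim
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy data: 2-dimensional made-up feature vectors.
screenshots = {"chase": (0.9, 0.1), "sunset": (0.1, 0.9)}      # image features
soundtrack_of = {"chase": (1.0, 0.0), "sunset": (0.0, 1.0)}    # audio features of the movie music
songs = {"fast_rock": (0.9, 0.1), "slow_piano": (0.1, 0.8)}    # audio features of candidate songs

best = recommend((0.8, 0.2), screenshots, soundtrack_of, songs)
print(best[0][0])  # fast_rock
```

In this toy run the query image is closest to the "chase" screenshot, whose movie music is closest to "fast_rock", so that song accumulates the highest aggregated score. A real system would replace the exhaustive scans with indexed similarity search and a proper top-K aggregation algorithm.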


Information & Contributors

Information

Published In

SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
July 2011
1374 pages
ISBN: 9781450307574
DOI: 10.1145/2009916
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. automatic music selection
  2. background music
  3. slide show
  4. soundtrack recommendation

Qualifiers

  • Research-article

Conference

SIGIR '11

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 0
Reflects downloads up to 01 Jan 2025.

Citations

Cited By

  • (2016) Suggesting Sounds for Images from Video Collections. Computer Vision – ECCV 2016 Workshops, pp. 900-917. DOI: 10.1007/978-3-319-48881-3_59. Online publication date: 3-Nov-2016
  • (2016) From Water Music to ‘Underwater Music’: Multimedia Soundtrack Retrieval with Social Mass Media Resources. Research and Advanced Technology for Digital Libraries, pp. 227-238. DOI: 10.1007/978-3-319-43997-6_18. Online publication date: 10-Aug-2016
  • (2016) Emotion-Based Matching of Music to Places. Emotions and Personality in Personalized Services, pp. 287-310. DOI: 10.1007/978-3-319-31413-6_14. Online publication date: 14-Jul-2016
  • (2015) Music Recommender Systems. Recommender Systems Handbook, pp. 453-492. DOI: 10.1007/978-1-4899-7637-6_13. Online publication date: 2015
  • (2014) Knowledge-based identification of music suited for places of interest. Information Technology & Tourism 14:1, pp. 73-95. DOI: 10.1007/s40558-014-0004-x. Online publication date: 2-Mar-2014
  • (2013) SRbench--a benchmark for soundtrack recommendation systems. Proceedings of the 22nd ACM international conference on Information & Knowledge Management, pp. 2285-2290. DOI: 10.1145/2505515.2505658. Online publication date: 27-Oct-2013
  • (2013) When music makes a scene. International Journal of Multimedia Information Retrieval 2:1, pp. 15-30. DOI: 10.1007/s13735-012-0031-3. Online publication date: 24-Jan-2013
  • (2013) Ontology-Based Identification of Music for Places. Information and Communication Technologies in Tourism 2013, pp. 436-447. DOI: 10.1007/978-3-642-36309-2_37. Online publication date: 27-Apr-2013
  • (2012) Being picky. Proceedings of the 21st ACM international conference on Information and knowledge management, pp. 912-921. DOI: 10.1145/2396761.2396877. Online publication date: 29-Oct-2012
  • (2012) MuseSync. Proceedings of the 20th ACM international conference on Multimedia, pp. 1383-1384. DOI: 10.1145/2393347.2396496. Online publication date: 29-Oct-2012
