research-article

Picasso - to sing, you must close your eyes and draw

Authors:

Aleksandar Stupar,

Sebastian Michel

SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval

Pages 715 - 724

https://doi.org/10.1145/2009916.2010012

Published: 24 July 2011

Abstract

We study the problem of automatically assigning appropriate music pieces to a picture or, more generally, to a series of pictures. This task, commonly referred to as soundtrack suggestion, is non-trivial: it requires a lot of human attention and a good deal of experience, with masterpieces distinguished, e.g., with the Academy Award for Best Original Score. We put forward PICASSO to solve this task in a fully automated way. PICASSO makes use of genuine samples obtained from first-class contemporary movies. Hence, the training set can be arbitrarily large and is inexpensive to obtain, yet still provides an excellent source of information. At query time, PICASSO employs a three-level algorithm. First, for a given query image, it retrieves a ranking of the most similar movie screenshots. Second, for each of these screenshots, it selects the songs most similar to the music played in the movie when the screenshot was taken. Last, it runs a top-K aggregation algorithm to find the overall best-suited songs available. We have created a large training set consisting of over 40,000 image/soundtrack samples obtained from 28 movies and evaluated the suitability of PICASSO by means of a user study.
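
The three query-time levels described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the image or audio features, the similarity measures, or the top-K aggregation algorithm. The toy feature vectors, cosine similarity, the product-and-sum scoring rule, and all names (`suggest_songs`, `samples`, `songs`, `n_shots`, `n_songs`) are assumptions made for illustration only.

```python
import heapq
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n(query, candidates, n):
    """Return the n (id, similarity) pairs most similar to the query vector."""
    scored = ((cid, cosine(query, vec)) for cid, vec in candidates.items())
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


def suggest_songs(query_image, samples, songs, n_shots=2, n_songs=2, k=1):
    """Hypothetical three-level lookup: image -> screenshots -> songs -> top-k."""
    # Level 1: rank the movie screenshots most similar to the query image.
    shot_features = {sid: s["image"] for sid, s in samples.items()}
    shots = top_n(query_image, shot_features, n_shots)

    totals = defaultdict(float)
    for sid, shot_sim in shots:
        # Level 2: songs most similar to the music played at that screenshot.
        for song_id, song_sim in top_n(samples[sid]["audio"], songs, n_songs):
            # Level 3 (assumed rule): sum image similarity * audio similarity.
            totals[song_id] += shot_sim * song_sim

    # Keep the k songs with the highest aggregated score.
    return heapq.nlargest(k, totals.items(), key=lambda pair: pair[1])


# Toy data: two image/soundtrack samples and a two-song collection.
samples = {
    "shot_a": {"image": [1.0, 0.0], "audio": [1.0, 0.0]},
    "shot_b": {"image": [0.0, 1.0], "audio": [0.0, 1.0]},
}
songs = {"calm": [1.0, 0.1], "tense": [0.1, 1.0]}

result = suggest_songs([0.9, 0.1], samples, songs, n_shots=1, n_songs=2, k=1)
print(result[0][0])  # id of the best-scoring song for this toy query
```

In this sketch the aggregation step scores every candidate exhaustively; a threshold-style top-K algorithm would instead stop reading the per-screenshot rankings early once the best K songs are certain, which matters at the scale of tens of thousands of samples.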

Published In

SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval

July 2011

1374 pages

ISBN: 9781450307574

DOI: 10.1145/2009916

General Chairs: Wei-Ying Ma (Microsoft Research Asia, China), Jian-Yun Nie (University of Montreal, Canada)

Program Chairs: Ricardo Baeza-Yates (Yahoo! Research, Spain), Tat-Seng Chua (National University of Singapore), W. Bruce Croft (University of Massachusetts, Amherst, USA)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGIR '11: The 34th International ACM SIGIR conference on research and development in Information Retrieval

July 24 - 28, 2011

Beijing, China

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics

Total Citations: 13
Total Downloads: 402
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 0

Reflects downloads up to 01 Jan 2025

Cited By

Solèr M, Bazin J, Wang O, Krause A, Sorkine-Hornung A (2016). Suggesting Sounds for Images from Video Collections. Computer Vision – ECCV 2016 Workshops, pp. 900-917. Online publication date: 3-Nov-2016.
https://doi.org/10.1007/978-3-319-48881-3_59
Liem C (2016). From Water Music to ‘Underwater Music’: Multimedia Soundtrack Retrieval with Social Mass Media Resources. Research and Advanced Technology for Digital Libraries, pp. 227-238. Online publication date: 10-Aug-2016.
https://doi.org/10.1007/978-3-319-43997-6_18
Kaminskas M, Ricci F (2016). Emotion-Based Matching of Music to Places. Emotions and Personality in Personalized Services, pp. 287-310. Online publication date: 14-Jul-2016.
https://doi.org/10.1007/978-3-319-31413-6_14
Schedl M, Knees P, McFee B, Bogdanov D, Kaminskas M (2015). Music Recommender Systems. Recommender Systems Handbook, pp. 453-492. Online publication date: 2015.
https://doi.org/10.1007/978-1-4899-7637-6_13
Kaminskas M, Fernández-Tobías I, Ricci F, Cantador I (2014). Knowledge-based identification of music suited for places of interest. Information Technology & Tourism 14:1, pp. 73-95. Online publication date: 2-Mar-2014.
https://doi.org/10.1007/s40558-014-0004-x
Stupar A, Michel S, He Q, Iyengar A, Nejdl W, Pei J, Rastogi R (2013). SRbench--a benchmark for soundtrack recommendation systems. Proceedings of the 22nd ACM international conference on Information & Knowledge Management, pp. 2285-2290. Online publication date: 27-Oct-2013.
https://dl.acm.org/doi/10.1145/2505515.2505658
Liem C, Larson M, Hanjalic A (2013). When music makes a scene. International Journal of Multimedia Information Retrieval 2:1, pp. 15-30. Online publication date: 24-Jan-2013.
https://doi.org/10.1007/s13735-012-0031-3
Kaminskas M, Fernández-Tobías I, Cantador I, Ricci F (2013). Ontology-Based Identification of Music for Places. Information and Communication Technologies in Tourism 2013, pp. 436-447. Online publication date: 27-Apr-2013.
https://doi.org/10.1007/978-3-642-36309-2_37
Stupar A, Michel S, Chen X, Lebanon G, Wang H, Zaki M (2012). Being picky. Proceedings of the 21st ACM international conference on Information and knowledge management, pp. 912-921. Online publication date: 29-Oct-2012.
https://dl.acm.org/doi/10.1145/2396761.2396877
Liem C, Bazzica A, Hanjalic A, Babaguchi N, Aizawa K, Smith J, Satoh S, Plagemann T, Hua X, Yan R (2012). MuseSync. Proceedings of the 20th ACM international conference on Multimedia, pp. 1383-1384. Online publication date: 29-Oct-2012.
https://dl.acm.org/doi/10.1145/2393347.2396496
