DOI: 10.1145/3586183.3606757

Automated Conversion of Music Videos into Lyric Videos

Published: 29 October 2023

Abstract

Musicians and fans often produce lyric videos, a form of music video that showcases the song’s lyrics, for their favorite songs. However, making such videos can be challenging and time-consuming, as the lyrics need to be added in synchrony and visual harmony with the video. Informed by prior work and close examination of existing lyric videos, we propose a set of design guidelines to help creators make such videos. Our guidelines ensure the readability of the lyric text while maintaining a unified focus of attention. We instantiate these guidelines in a fully automated pipeline that converts an input music video into a lyric video. We demonstrate the robustness of our pipeline by generating lyric videos from a diverse range of input sources. A user study shows that lyric videos generated by our pipeline are effective in maintaining text readability and unifying the focus of attention.

Supplemental Material

CSV file: List of the top 100 most viewed lyric videos on YouTube
ZIP file: Supplemental file



Published In

UIST '23: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology
October 2023
1825 pages
ISBN: 9798400701320
DOI: 10.1145/3586183

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Design guidelines
2. Lyrics
3. Video generation

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

UIST '23

Acceptance Rates

Overall acceptance rate: 561 of 2,567 submissions (22%)
