Article

A voice-to-MIDI system for singing melodies with lyrics

Authors:

Naoki Itou,

Kazushi Nishimoto

ACE '07: Proceedings of the international conference on Advances in computer entertainment technology

Pages 183 - 189

https://doi.org/10.1145/1255047.1255085

Published: 13 June 2007

Abstract

In this paper, we propose a robust Voice-to-MIDI (V to M) system with which a user can input MIDI sequence data by naturally singing melodies with lyrics. A Voice-to-MIDI system translates a singing voice into digital musical data, i.e., MIDI sequence data. With such a system, users can input melodies intuitively, which frees them from manually translating memorized melodies into chromatic pitches. However, the translation quality of ordinary Voice-to-MIDI systems is insufficient. One of the most significant problems is the poor accuracy of note segmentation. We solve this problem by employing "rhythmic tapping" concurrently with singing. We evaluated the proposed method in terms of the accuracy of the number of segmented notes and of their pitches, and confirmed that our system outperformed ordinary Voice-to-MIDI systems. Thus, the system achieves both easy, intuitive composition of MIDI sequence data and highly accurate translation of sung data into MIDI sequence data.
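
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each tap marks the onset of one sung note and that a fundamental-frequency (F0) track has already been estimated elsewhere. The function names `hz_to_midi` and `segment_notes`, the data layout, and the use of the median F0 per segment are all assumptions made for illustration.

```python
import math
from statistics import median


def hz_to_midi(freq_hz):
    """Round a frequency in Hz to the nearest MIDI note number (A4 = 440 Hz = 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def segment_notes(f0_track, tap_times, end_time):
    """Split a pitch track into notes, using tap times as note boundaries.

    f0_track:  list of (time_sec, f0_hz) pairs; f0_hz <= 0 marks an unvoiced frame.
    tap_times: sorted tap onsets in seconds (assumed: one tap per sung note).
    end_time:  time in seconds at which the last note ends.
    Returns a list of (onset_sec, duration_sec, midi_note) tuples.
    """
    notes = []
    boundaries = list(tap_times) + [end_time]
    for onset, offset in zip(boundaries, boundaries[1:]):
        # Collect the voiced frames that fall between this tap and the next.
        voiced = [f for t, f in f0_track if onset <= t < offset and f > 0]
        if not voiced:
            continue  # a tap with no voiced frames yields no note
        # The median is less sensitive to vibrato and pitch glides than the mean.
        notes.append((onset, offset - onset, hz_to_midi(median(voiced))))
    return notes
```

Because the note boundaries come from the taps rather than from the audio signal, the number of output notes in this sketch is bounded by the number of taps, which is the property the abstract relies on to improve segmentation accuracy.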

Index Terms

A voice-to-MIDI system for singing melodies with lyrics
1. Hardware
  1. Communication hardware, interfaces and storage
    1. Signal processing systems
  2. Robustness
    1. Hardware reliability
      1. Signal integrity and noise analysis

Recommendations

Music/lyrics composition system considering user's image and music genre
SMC'09: Proceedings of the 2009 IEEE international conference on Systems, Man and Cybernetics

This paper proposes a music/lyrics composition system consisting of two sections, a lyric composing section and a music composing section, which considers user's image of a song and music genre. First of all, a user has an image of music/lyrics to ...

Rhythm Speech Lyrics Input for MIDI-Based Singing Voice Synthesis
PCM '09: Proceedings of the 10th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing

This paper presents useful techniques and considerations in implementing underlying mandarin singing voice synthesis system using the RSLI unit. The system can receive the continuous speech of the lyrics of a song, and can synthesize the intended song ...

LAMP, A Lyrics and Audio MandoPop Dataset for Music Mood Estimation: Dataset Compilation, System Construction, and Testing
TAAI '10: Proceedings of the 2010 International Conference on Technologies and Applications of Artificial Intelligence

Music mood estimation (MME) is an emerging subfield in music information retrieval research. Whereas most MME research focuses on audio analysis, exploring the significance of lyrics in predicting song emotion has been receiving more attention in recent ...

Information & Contributors

Information

Published In

ACE '07: Proceedings of the international conference on Advances in computer entertainment technology

June 2007

324 pages

ISBN:9781595936400

DOI:10.1145/1255047

General Chairs: Masa Inakage (Keio University), Newton Lee (NUS Hollywood Lab), Manfred Tscheligi (University of Salzburg)

Program Chairs: Regina Bernhaupt (University of Salzburg), Stephane Natkin (Conservatoire National des Arts et Métiers)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ACE2007

Sponsor:

ACE2007: International Conference on Advances in Computer Entertainment Technology

June 13 - 15, 2007

Salzburg, Austria

Acceptance Rates

Overall Acceptance Rate 36 of 90 submissions, 40%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 11
Total Downloads: 563
Downloads (Last 12 months): 12
Downloads (Last 6 weeks): 2

Reflects downloads up to 13 Dec 2024

Citations

Cited By

Donati E, Chousidis C (2022). Electroglottography based voice-to-MIDI real time converter with AI voice act classification. 2022 IEEE International Symposium on Medical Measurements and Applications (MeMeA), 1-6. doi:10.1109/MeMeA54994.2022.9856413. Online publication date: 22-Jun-2022.
https://doi.org/10.1109/MeMeA54994.2022.9856413
Donati E, Chousidis C (2022). Electroglottography based real-time voice-to-MIDI controller. Neuroscience Informatics, 100041. doi:10.1016/j.neuri.2022.100041. Online publication date: Jan-2022.
https://doi.org/10.1016/j.neuri.2022.100041
Baratè A, Ludovico L, Santucci E (2013). A Semantics-Driven Approach to Lyrics Segmentation. Proceedings of the 2013 8th International Workshop on Semantic and Social Media Adaptation and Personalization, 73-79. doi:10.1109/SMAP.2013.15. Online publication date: 12-Dec-2013.
https://dl.acm.org/doi/10.1109/SMAP.2013.15
Kitahara T, Kimura S, Suzuki Y, Suzuki T, Babaguchi N, Aizawa K, Smith J, Satoh S, Plagemann T, Hua X, Yan R (2012). Hummi-com. Proceedings of the 20th ACM international conference on Multimedia, 1321-1322. doi:10.1145/2393347.2396463. Online publication date: 29-Oct-2012.
https://dl.acm.org/doi/10.1145/2393347.2396463
Oshima C, Itou N, Nishimoto K, Hosoi N, Yasuda K, Nakayama K (2011). An accompaniment system for healing emotions of patients with dementia who repeat stereotypical utterances. Proceedings of the 9th international conference on Toward useful services for elderly and people with disabilities: smart homes and health telematics, 65-71. doi:10.5555/2026187.2026198. Online publication date: 20-Jun-2011.
https://dl.acm.org/doi/10.5555/2026187.2026198
Nie Z, Yang S (2011). An Implementation Method for Converting the Erhu Music from Wav to Mid. Proceedings of the 2011 Seventh International Conference on Computational Intelligence and Security, 1425-1429. doi:10.1109/CIS.2011.318. Online publication date: 3-Dec-2011.
https://dl.acm.org/doi/10.1109/CIS.2011.318
Kanamaru M, Hanaue K, Watanabe T (2011). Notation-Support Method in Music Composition Based on Interval-Pitch Conversion. Intelligent Decision Technologies, 547-556. doi:10.1007/978-3-642-22194-1_54. Online publication date: 2011.
https://doi.org/10.1007/978-3-642-22194-1_54
Oshima C, Itou N, Nishimoto K, Hosoi N, Yasuda K, Nakayama K (2011). An Accompaniment System for Healing Emotions of Patients with Dementia Who Repeat Stereotypical Utterances. Toward Useful Services for Elderly and People with Disabilities, 65-71. doi:10.1007/978-3-642-21535-3_9. Online publication date: 2011.
https://doi.org/10.1007/978-3-642-21535-3_9
Arvin F, Doraisamy S (2009). A Real-Time Note Transcription Technique Using Static and Dynamic Window Sizes. Proceedings of the 2009 International Conference on Signal Acquisition and Processing, 30-33. doi:10.1109/ICSAP.2009.20. Online publication date: 3-Apr-2009.
https://dl.acm.org/doi/10.1109/ICSAP.2009.20
Khorasani E, Doraisamy S, Arvin F (2009). An Approach for Heartbeat Sound Transcription. Proceedings of the 2009 International Conference on Computer Technology and Development - Volume 01, 38-41. doi:10.1109/ICCTD.2009.94. Online publication date: 13-Nov-2009.
https://dl.acm.org/doi/10.1109/ICCTD.2009.94
