Abstract
Handwritten music recognition is a challenging task that could be of great use if mastered, e.g., to improve the accessibility of archival manuscripts or to ease music composition. Many modern machine learning techniques, however, cannot be easily applied to this task because of the limited availability of high-quality training data. Annotating such data manually is expensive and thus not feasible at the necessary scale. This problem has already been tackled in other fields by training on automatically generated synthetic data. We bring this approach to handwritten music recognition: we present a method to generate synthetic handwritten music images (limited to monophonic scores) and show that training on such data leads to state-of-the-art results.
Notes
Writers(pages): 13(2, 3, 16); 17(1); 20(2, 3, 16); 34(2, 3, 16); 41(2, 3, 16); 49(3, 5, 9, 11).
Acknowledgment
The work described in this paper was supported by the Czech Science Foundation (grant no. 19-26934X) and CELSA (project no. 19/018), and used data provided by the LINDAT/CLARIAH-CZ Research Infrastructure (https://lindat.cz), which is supported by the Ministry of Education, Youth and Sports of the Czech Republic (project no. LM2018101). The authors would like to thank Jan Hajič jr. for his valuable comments.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Mayer, J., Pecina, P. (2021). Synthesizing Training Data for Handwritten Music Recognition. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12823. Springer, Cham. https://doi.org/10.1007/978-3-030-86334-0_41
DOI: https://doi.org/10.1007/978-3-030-86334-0_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86333-3
Online ISBN: 978-3-030-86334-0
eBook Packages: Computer Science (R0)