Abstract
The transcription of music sources requires new ways of interacting with musical documents. Assuming that automatic technologies will never guarantee a perfect transcription, we aim to develop an interactive system in which user and software collaborate to complete the task. Since traditional score-editing software can be tedious to use, our work studies interaction by means of an electronic pen (e-pen). In our framework, users trace symbols with an e-pen over a digital surface, which provides both the underlying image (offline data) and the drawing made (online data). Using both sources, the system reaches an error rate below 4% when recognizing the symbols with a Convolutional Neural Network.
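To make the two data sources concrete, the following minimal sketch shows one way a pen trace (online data) could be rendered into a small grid and paired, cell by cell, with an image patch (offline data) of the same size, so that both could be presented to a classifier as two channels. This is an illustration under stated assumptions, not the architecture used in the chapter: the function names `rasterize_stroke` and `stack_modalities`, the grid size, and the channel-stacking strategy are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def rasterize_stroke(points: Sequence[Point], size: int = 8) -> List[List[int]]:
    """Render a pen trace (online data) into a size x size binary grid.

    Hypothetical helper: the chapter does not specify this preprocessing.
    """
    grid = [[0] * size for _ in range(size)]
    if not points:
        return grid
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    # Scale by the larger side of the bounding box to preserve aspect ratio.
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0

    def to_cell(p: Point) -> Tuple[int, int]:
        col = min(size - 1, int((p[0] - min_x) / span * (size - 1) + 0.5))
        row = min(size - 1, int((p[1] - min_y) / span * (size - 1) + 0.5))
        return row, col

    prev = to_cell(points[0])
    grid[prev[0]][prev[1]] = 1
    for p in points[1:]:
        cur = to_cell(p)
        # Interpolate between consecutive samples so the trace has no gaps.
        steps = max(abs(cur[0] - prev[0]), abs(cur[1] - prev[1]), 1)
        for s in range(1, steps + 1):
            r = prev[0] + round((cur[0] - prev[0]) * s / steps)
            c = prev[1] + round((cur[1] - prev[1]) * s / steps)
            grid[r][c] = 1
        prev = cur
    return grid


def stack_modalities(
    offline: List[List[int]], online: List[List[int]]
) -> List[List[Tuple[int, int]]]:
    """Pair the two grids as channels: result[row][col] = (offline, online).

    Assumed fusion strategy, chosen only for illustration.
    """
    same_rows = len(offline) == len(online)
    if not same_rows or any(len(a) != len(b) for a, b in zip(offline, online)):
        raise ValueError("both modalities must share the same grid size")
    return [list(zip(o_row, s_row)) for o_row, s_row in zip(offline, online)]
```

For instance, `stack_modalities(patch, rasterize_stroke(trace, size=len(patch)))` would pair a square image patch `patch` with the trace drawn over it. Stacking at the input is only one option; the two modalities could equally be processed by separate networks whose outputs are combined afterwards.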
Notes
1. The dataset is freely available at http://grfia.dlsi.ua.es/ (Bimodal music symbols from Early notation).
Acknowledgment
This work was supported by the Social Sciences and Humanities Research Council of Canada, and by the Spanish Ministerio de Ciencia, Innovación y Universidades through Project HISPAMUS (No. TIN2017-86576-R supported by EU FEDER funds).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Sober-Mira, J., Calvo-Zaragoza, J., Rizo, D., Iñesta, J.M. (2018). Pen-Based Music Document Transcription with Convolutional Neural Networks. In: Fornés, A., Lamiroy, B. (eds) Graphics Recognition. Current Trends and Evolutions. GREC 2017. Lecture Notes in Computer Science, vol 11009. Springer, Cham. https://doi.org/10.1007/978-3-030-02284-6_6
DOI: https://doi.org/10.1007/978-3-030-02284-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02283-9
Online ISBN: 978-3-030-02284-6
eBook Packages: Computer Science, Computer Science (R0)