Abstract
Though it is commonly agreed that increasing the training set size leads to improved recognition rates, the scarcity of publicly available Japanese character pattern databases prevents us from verifying this assumption empirically for large data sets. Whereas the typical number of training samples has until now been between 100 and 200 patterns per category, newly collected databases and increased computing power allow us to experiment with a much higher number of samples per category. In this paper, we experiment with off-line classifiers trained with up to 1550 patterns for each of 3036 categories. We show that this larger training set size indeed leads to improved recognition rates compared to the smaller training sets normally used.
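The kind of experiment described in the abstract, measuring the recognition rate as a function of the number of training patterns per category, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic Gaussian feature vectors rather than Kanji patterns, the classifier is a simple nearest-mean classifier rather than the off-line classifiers used in the paper, and the names `learning_curve`, `draw_patterns`, and all parameter values are hypothetical.

```python
import random


def draw_patterns(means, n_per_category, noise, rng):
    """Draw n_per_category noisy feature vectors around each category mean."""
    return [
        [[m + rng.gauss(0.0, noise) for m in mean] for _ in range(n_per_category)]
        for mean in means
    ]


def train_nearest_mean(patterns_by_category, n_train):
    """Build one prototype per category from its first n_train patterns."""
    prototypes = []
    for patterns in patterns_by_category:
        subset = patterns[:n_train]
        dim = len(subset[0])
        prototypes.append(
            [sum(vec[i] for vec in subset) / len(subset) for i in range(dim)]
        )
    return prototypes


def classify(prototypes, vec):
    """Return the index of the prototype closest to vec (squared Euclidean)."""
    def sqdist(proto):
        return sum((a - b) ** 2 for a, b in zip(proto, vec))
    return min(range(len(prototypes)), key=lambda c: sqdist(prototypes[c]))


def recognition_rate(prototypes, test_by_category):
    """Fraction of test patterns assigned to their true category."""
    correct = total = 0
    for label, patterns in enumerate(test_by_category):
        for vec in patterns:
            correct += classify(prototypes, vec) == label
            total += 1
    return correct / total


def learning_curve(sizes, n_categories=20, dim=8, noise=1.5, n_test=50, seed=0):
    """Map each training set size (patterns per category, >= 1) to a recognition rate.

    Synthetic stand-in only: category means and noise level are arbitrary.
    """
    rng = random.Random(seed)
    means = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_categories)]
    test = draw_patterns(means, n_test, noise, rng)
    pool = draw_patterns(means, max(sizes), noise, rng)
    return {n: recognition_rate(train_nearest_mean(pool, n), test) for n in sizes}


if __name__ == "__main__":
    for n, rate in learning_curve([1, 10, 100]).items():
        print(f"{n:4d} patterns per category: recognition rate {rate:.3f}")
```

Every training set size is evaluated against the same held-out test set, so differences in the reported rates come from the training set size alone. The smaller training sets are prefixes of the largest one, which mirrors the usual practice of subsampling a single large database.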
Keywords
- Training Sample
- Recognition Rate
- Character Recognition
- Minimum Classification Error
- Increase Computing Power
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Velek, O., Nakagawa, M. (2002). The Impact of Large Training Sets on the Recognition Rate of Off-line Japanese Kanji Character Classifiers. In: Lopresti, D., Hu, J., Kashi, R. (eds) Document Analysis Systems V. DAS 2002. Lecture Notes in Computer Science, vol 2423. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45869-7_13
DOI: https://doi.org/10.1007/3-540-45869-7_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44068-0
Online ISBN: 978-3-540-45869-2
eBook Packages: Springer Book Archive