DOI: 10.1145/3366030.3366050
research-article

Building Classifier Models for on-off Javanese Character Recognition

Published: 22 February 2020

Abstract

In this paper, we demonstrate how four classifier models were built as part of an on-off character recognition system for Javanese characters. The Javanese script is no longer used in everyday writing or in books, so the dataset was collected by scanning historical manuscripts and a reading-lesson book. The raw dataset comprises 15,414 annotated characters in 633 classes. However, only 162 classes have enough samples to be used for training and testing. Using this dataset, we measured the performance of four classifiers, namely k-NN, LDA, SVM, and Gaussian NB. The metrics were accuracy, micro-averaged precision, micro-averaged sensitivity, and weighted-averaged precision and sensitivity. The experiments show that k-NN outperforms the other classifiers on most metrics, while SVM performs worst. A notable byproduct of this research is the identification of 633 classes of distinct Javanese characters. These include the common and compound characters found in modern Javanese writing as well as the archaic characters found only in literary works.
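
The abstract reports precision and sensitivity under two averaging schemes. The following is a minimal sketch, in pure Python using only the standard library, of how those metrics can be computed from per-class counts. The function name `averaged_scores` and the toy labels are hypothetical and not taken from the paper, whose evaluation code is not shown here. The sketch assumes single-label multi-class data, meaning exactly one true and one predicted class per character image. In that setting, micro-averaged precision and micro-averaged sensitivity both reduce to accuracy, and so does support-weighted sensitivity. Weighted precision is the one that can differ.

```python
from collections import Counter


def averaged_scores(y_true, y_pred):
    """Accuracy plus micro- and weighted-averaged precision and sensitivity.

    Hypothetical helper; assumes exactly one true and one predicted class
    label per sample (single-label multi-class classification).
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    n = len(y_true)
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # TP per class
    predicted = Counter(y_pred)  # TP + FP per class
    support = Counter(y_true)    # TP + FN per class (number of true samples)

    # Micro-averaging pools the counts of all classes before dividing.
    total_tp = sum(true_pos.values())
    micro_precision = total_tp / sum(predicted.values())
    micro_sensitivity = total_tp / sum(support.values())

    # Weighted averaging scores each class separately, then weights each
    # score by the class support. Classes absent from y_true get weight 0;
    # classes never predicted contribute a precision of 0.
    weighted_precision = 0.0
    weighted_sensitivity = 0.0
    for label in set(y_true) | set(y_pred):
        weight = support[label] / n
        if predicted[label]:
            weighted_precision += weight * true_pos[label] / predicted[label]
        if support[label]:
            weighted_sensitivity += weight * true_pos[label] / support[label]

    return {
        "accuracy": total_tp / n,
        "micro_precision": micro_precision,
        "micro_sensitivity": micro_sensitivity,
        "weighted_precision": weighted_precision,
        "weighted_sensitivity": weighted_sensitivity,
    }


if __name__ == "__main__":
    # Toy labels standing in for Javanese character classes (hypothetical data).
    y_true = ["ha", "ha", "na", "ca", "ca", "ca"]
    y_pred = ["ha", "na", "na", "ca", "ca", "ha"]
    for name, value in averaged_scores(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

On these toy labels, accuracy, both micro-averaged scores, and weighted sensitivity all equal 4/6, while weighted precision is 0.75. This is why weighted precision is the more informative of the averaged scores when classes are imbalanced.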

    Published In

    ACM Other conferences
    iiWAS2019: Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services
    December 2019
    709 pages
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • JKU: Johannes Kepler Universität Linz
    • @WAS: International Organization of Information Integration and Web-based Applications and Services

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Javanese Characters
    2. Linear Discriminant Analysis
    3. Optical Character Recognition
    4. Support Vector Machine
    5. k-NN

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Wikimedia Indonesia

    Conference

    iiWAS2019

    Bibliometrics

    • 0 Total Citations
    • 52 Total Downloads
    • Downloads (Last 12 months): 1
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 09 Jan 2025
