Dataset Generation for Gujarati Language Using Handwritten Character Images

Wireless Personal Communications

Abstract

Handwritten character recognition (HCR) is a classical challenge in pattern recognition. For the Gujarati language in particular, benchmark datasets for HCR are limited, so a proper dataset is required for experimentation. This work therefore introduces dataset generation for the Gujarati language using pre-processing and classification techniques. Handwritten data are first collected from various native Gujarati writers, and the dataset is then generated in three stages. First, the images are pre-processed: image selection, noise removal, normalization, conversion of integer values to double, conversion of the grayscale image into a binary image, dimensionality reduction, and vector conversion. Second, the pre-processed image is segmented using line segmentation, character segmentation, and word segmentation. Finally, the data are classified using a convolutional neural network (CNN). The CNN achieves a kappa value of 0.981 and a false positive rate (FPR) of 0.189.
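For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how a kappa value and an FPR can be computed from a multi-class confusion matrix. It is not the authors' evaluation code. The abstract does not state how the FPR is aggregated across character classes, so the one-vs-rest macro average used here is an assumption, and the function names `confusion_matrix`, `cohen_kappa`, and `macro_fpr` are illustrative.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Count matrix: rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def cohen_kappa(cm: List[List[int]]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (accuracy); p_e is the agreement
    expected by chance from the row and column totals.
    Undefined (division by zero) when p_e equals 1.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_o = sum(cm[i][i] for i in range(k)) / n
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def macro_fpr(cm: List[List[int]]) -> float:
    """One-vs-rest FPR = FP / (FP + TN), averaged over classes.

    Assumption: macro averaging; the source does not specify the scheme.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    rates = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[i][c] for i in range(k)) - tp  # predicted c, true class differs
        fn = sum(cm[c]) - tp                       # true c, predicted otherwise
        tn = n - tp - fp - fn
        rates.append(fp / (fp + tn) if (fp + tn) else 0.0)
    return sum(rates) / k


# Toy labels for three hypothetical character classes (not data from the paper).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cohen_kappa(cm), macro_fpr(cm))
```

Kappa corrects raw accuracy for chance agreement, which matters when the character classes are unevenly represented; the FPR measures how often samples of other classes are wrongly assigned to a given class.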

Data Availability

Data sharing is not applicable to this article.

Funding

No funding was provided for the preparation of this manuscript.

Author information

Contributions

All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Sanket B. Suthar or Amit R. Thakkar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to Participate

All the authors involved have agreed to participate in this submitted article.

Consent to Publish

All the authors involved in this manuscript give full consent for publication of this submitted article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Suthar, S.B., Thakkar, A.R. Dataset Generation for Gujarati Language Using Handwritten Character Images. Wireless Pers Commun 136, 2163–2184 (2024). https://doi.org/10.1007/s11277-024-11369-9

  • DOI: https://doi.org/10.1007/s11277-024-11369-9