Dataset Generation for Gujarati Language Using Handwritten Character Images

Sanket B. Suthar¹ & Amit R. Thakkar²


Abstract

In pattern recognition, handwritten character recognition (HCR) is considered a classical challenge, and benchmark datasets for HCR in the Gujarati language are particularly limited. Experimentation requires a proper dataset, so this work generates a dataset for the Gujarati language using pre-processing, segmentation, and classification techniques. Handwritten data are first collected from various native Gujarati writers, and the dataset is then generated in three stages. First, the images are pre-processed through image selection, noise removal, normalization, conversion of integer pixel values to double precision, conversion of the grayscale image into a binary image, dimensionality reduction, and vector conversion. Next, the pre-processed image is segmented using line segmentation, character segmentation, and word segmentation. Finally, the data are classified using a convolutional neural network (CNN), which achieves a kappa value of 0.981 and a false positive rate (FPR) of 0.189.
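The abstract lists the pipeline stages but gives no algorithms or parameters for them. The following is a minimal Python/NumPy sketch of how such a pipeline could look, not the authors' implementation. Every concrete choice is an assumption: a 3x3 median filter for noise removal, min-max normalization, a fixed threshold of 0.5 for binarization, nearest-neighbour resampling to a 16x16 grid standing in for dimensionality reduction, projection profiles for line, word, and character segmentation, and reading "kappa" as Cohen's kappa and FPR as the macro-averaged one-vs-rest false positive rate. The function names `preprocess`, `runs`, `segment`, `to_vector`, and `kappa_and_fpr` are hypothetical, and the image-selection and CNN stages are omitted.

```python
# Hypothetical sketch: every parameter and technique below is an assumption, not from the paper.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def preprocess(gray_u8, threshold=0.5):
    """Assumed pre-processing: uint8 grayscale page -> boolean ink mask."""
    img = gray_u8.astype(np.float64) / 255.0  # integer pixel values -> double in [0, 1]
    # Noise removal (assumed 3x3 median filter; borders padded by edge replication).
    padded = np.pad(img, 1, mode="edge")
    img = np.median(sliding_window_view(padded, (3, 3)), axis=(-2, -1))
    # Min-max normalization so the fixed threshold does not depend on scan contrast.
    lo, hi = img.min(), img.max()
    if hi > lo:
        img = (img - lo) / (hi - lo)
    # Grayscale -> binary: dark pixels are treated as ink (True).
    return img < threshold


def runs(profile, min_gap=1):
    """(start, end) pairs of non-zero runs in a projection profile, end exclusive.

    Runs separated by fewer than `min_gap` empty positions are merged.
    """
    on = np.concatenate(([0], (np.asarray(profile) > 0).astype(int), [0]))
    edges = np.diff(on)
    merged = []
    for s, e in zip(np.flatnonzero(edges == 1), np.flatnonzero(edges == -1)):
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], int(e))  # gap too narrow: extend previous run
        else:
            merged.append((int(s), int(e)))
    return merged


def segment(ink, word_gap=3):
    """Assumed projection-profile segmentation into lines, words, and characters."""
    lines = []
    for top, bottom in runs(ink.sum(axis=1)):      # rows containing ink -> text lines
        cols = ink[top:bottom].sum(axis=0)          # vertical profile of this line
        lines.append({
            "rows": (top, bottom),
            "chars": runs(cols),                    # any empty column splits characters
            "words": runs(cols, min_gap=word_gap),  # only wide gaps split words
        })
    return lines


def to_vector(glyph, size=16):
    """Resample a glyph to a fixed size x size grid (nearest neighbour), then flatten."""
    rows = np.linspace(0, glyph.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, glyph.shape[1] - 1, size).round().astype(int)
    return glyph[np.ix_(rows, cols)].astype(np.float64).ravel()


def kappa_and_fpr(y_true, y_pred, n_classes):
    """Cohen's kappa and macro-averaged one-vs-rest false positive rate."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                               # rows: true class, columns: predicted
    n = cm.sum()
    row, col, diag = cm.sum(axis=1), cm.sum(axis=0), np.diag(cm)
    p_o = diag.sum() / n                            # observed agreement (accuracy)
    p_e = (row @ col) / n**2                        # agreement expected by chance
    kappa = (p_o - p_e) / (1 - p_e)
    fp = col - diag                                 # predicted k but truly another class
    tn = n - row - col + diag                       # neither true nor predicted k
    return kappa, float(np.mean(fp / (fp + tn)))
```

For a scanned page held as a 2-D `uint8` array `page` (hypothetical), `segment(preprocess(page))` yields the row span of each text line plus the column spans of its words and characters; cropping each character and passing it to `to_vector` gives fixed-length vectors that a classifier could consume. Projection profiles are used here only because they are simple; they assume roughly horizontal lines and non-touching characters, which handwritten text will not always satisfy.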

Data Availability

Data sharing is not applicable to this article.

Funding

No funding was received for the preparation of this manuscript.

Author information

Authors and Affiliations

Department of Information Technology, Chandubhai S. Patel Institute of Technology (CSPIT), Faculty of Technology & Engineering (FTE), Charotar University of Science and Technology (CHARUSAT), Changa, Anand, India
Sanket B. Suthar
Department of Computer Science and Engineering, Chandubhai S. Patel Institute of Technology (CSPIT), Faculty of Technology & Engineering (FTE), Charotar University of Science and Technology (CHARUSAT), Changa, Anand, India
Amit R. Thakkar

Contributions

All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Sanket B. Suthar or Amit R. Thakkar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to Participate

All authors agreed to participate in this article.

Consent to Publish

All authors consent to the publication of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Suthar, S.B., Thakkar, A.R. Dataset Generation for Gujarati Language Using Handwritten Character Images. Wireless Pers Commun 136, 2163–2184 (2024). https://doi.org/10.1007/s11277-024-11369-9

Accepted: 11 June 2024
Published: 01 July 2024
Issue Date: June 2024
DOI: https://doi.org/10.1007/s11277-024-11369-9
