short-paper

Optimum Deep Learning Method for Document Layout Analysis in Low Resource Languages

Authors:

Md. Mutasim Billah Abu Noman Akanda,

AKM Shahariar Azad Rabby,

Fuad Rahman

ACMSE '24: Proceedings of the 2024 ACM Southeast Conference

Pages 199 - 204

https://doi.org/10.1145/3603287.3651184

Published: 27 April 2024

Abstract

Document Layout Analysis (DLA) has become a crucial step in digitizing documents. Understanding a digital document properly, so as to gain insight into its structure and contents, has become increasingly important. DLA combines techniques from image processing, computer vision, and natural language processing to support tasks such as character recognition, document classification, information retrieval, content summarization, and document restructuring. Accurate insight into a document's layout is needed to identify each element and its relationships to other elements. Many deep learning-based DLA algorithms have been proposed recently, with impressive results on publicly available datasets in high-resource languages such as English. However, little information is available on how effective deep learning-based DLA approaches are for low-resource languages. This paper investigates three state-of-the-art deep learning-based DLA approaches, DiT, LayoutLMv3, and YOLOv8 [9], to find the optimal approach for low-resource, grapheme-based languages such as Bengali. We found that YOLOv8 [9] performs best, with an 8.95% better IoU score than DiT and a 38.48% better IoU score than LayoutLMv3 on the DLA task in a low-resource, grapheme-based language.
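The abstract compares models by IoU (intersection over union) but does not say whether the score is computed on bounding boxes or on segmentation masks. As a minimal sketch, assuming axis-aligned boxes in (x_min, y_min, x_max, y_max) form, IoU could be computed as follows. The function name `box_iou` and the box format are illustrative assumptions, not taken from the paper.

```python
from typing import Tuple

# Assumed box format (not from the paper): (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Return the intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero so disjoint boxes yield no intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs are scored as 0 rather than dividing by zero.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(box_iou(predicted, ground_truth))
```

A mask-based variant would count overlapping pixels instead of multiplying box sides; the ratio of intersection to union is the same idea in both cases.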

References

[1]

Galal M Binmakhashen and Sabri A Mahmoud. 2019. Document Layout Analysis: a Comprehensive Survey. ACM Computing Surveys (CSUR) 52, 6 (2019), 1--36.

[2]

Samuele Capobianco, Leonardo Scommegna, and Simone Marinai. 2018. Historical Handwritten Document Segmentation by Using a Weighted Loss. In IAPR Workshop on Artificial Neural Networks in Pattern Recognition. Springer, 395--406.

[3]

Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. 2010. The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision 88 (2010), 303--338.

[4]

Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. 2015. The KITTI Vision Benchmark Suite. URL http://www.cvlibs.net/datasets/kitti 2.5 (2015).

[5]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

[6]

Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv preprint arXiv:1704.04861 (2017).

[7]

Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q. Weinberger. 2017. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

[8]

Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, and Furu Wei. 2022. LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking. In Proceedings of the 30th ACM International Conference on Multimedia. 4083--4091.

[9]

Glenn Jocher, Ayush Chaurasia, and Jing Qiu. 2023. Ultralytics YOLO. https://github.com/ultralytics/ultralytics

[10]

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems 25 (2012).

[11]

Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Alexander Kolesnikov, et al. 2020. The Open Images Dataset v4: Unified Image Classification, Object Detection, and Visual Relationship Detection at Scale. International Journal of Computer Vision 128, 7 (2020), 1956--1981.

[12]

Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, and Furu Wei. 2022. DiT: Self-supervised Pre-training for Document Image Transformer. In Proceedings of the 30th ACM International Conference on Multimedia. 3530--3539.

[13]

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. In Computer Vision-ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V. Springer International Publishing.

[14]

Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. 2016. SSD: Single Shot MultiBox Detector. In Computer Vision-ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part I, Vol. 14. Springer International Publishing.

[15]

Simone Marinai, Marco Gori, and Giovanni Soda. 2005. Artificial Neural Networks for Document Analysis and Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 1 (2005), 23--35.

[16]

Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You Only Look Once: Unified, Real-time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

[17]

Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards Real-time Object Detection with Region Proposal Networks. In Advances in Neural Information Processing Systems. Vol. 28.

[18]

Frank Y. Shih and Shy-Shyan Chen. 1996. Adaptive Document Block Segmentation and Classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 26, 5 (1996), 797--802.

[19]

Md Istiak Hossain Shihab, Md Rakibul Hasan, Mahfuzur Rahman Emon, Syed Mobassir Hossen, Md Nazmuddoha Ansary, Intesur Ahmed, Fazle Rabbi Rakib, Shahriar Elahi Dhruvo, Souhardya Saha Dip, Akib Hasan Pavel, et al. 2023. BaDLAD: A Large Multi-domain Bengali Document Layout Analysis Dataset. In International Conference on Document Analysis and Recognition. Springer, 326--341.

[20]

Juan Terven and Diana Cordova-Esparza. 2023. A Comprehensive Review of YOLO: From YOLOv1 to YOLOv8 and Beyond. arXiv preprint arXiv:2304.00501 (2023).

[21]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All You Need. Advances in Neural Information Processing Systems 30 (2017).

[22]

Hao Wei, Micheal Baechler, Fouad Slimane, and Rolf Ingold. 2013. Evaluation of SVM, MLP and GMM Classifiers for Layout Analysis of Historical Documents. In The 12th International Conference on Document Analysis and Recognition. IEEE, 1220--1224.

[23]

Zhiheng Xu, Jie Tang, and Antonio J. Yepes. 2019. PubLayNet: Largest Dataset ever for Document Layout Analysis. arXiv preprint arXiv:1908.07836 (2019). https://doi.org/10.48550/arxiv.1908.07836

[24]

Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. 2017. Unpaired Image-to-image Translation Using Cycle-consistent Adversarial Networks. In Proceedings of the IEEE International Conference on Computer Vision. 2223--2232.

[25]

Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. 2018. Learning Transferable Architectures for Scalable Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Index Terms

Optimum Deep Learning Method for Document Layout Analysis in Low Resource Languages
1. Applied computing
  1. Document management and text processing
    1. Document capture
      1. Document analysis

Published In

ACMSE '24: Proceedings of the 2024 ACM Southeast Conference

April 2024

337 pages

ISBN:9798400702372

DOI:10.1145/3603287

Organizing Chair: Dan Lo
Program Chair: Eric Gamess

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

ACM: Association for Computing Machinery

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper
Research
Refereed limited

Conference

ACM SE '24: 2024 ACM Southeast Conference

April 18 - 20, 2024

Marietta, GA, USA

Acceptance Rates

ACMSE '24 Paper Acceptance Rate 44 of 137 submissions, 32%;

Overall Acceptance Rate 502 of 1,023 submissions, 49%
