
Line Graphics Digitization: A Step Towards Full Automation

  • Conference paper
Document Analysis and Recognition - ICDAR 2023 (ICDAR 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14191)

Abstract

The digitization of documents allows for wider accessibility and reproducibility. While automatic digitization of document layout and text content has been a long-standing focus of research, the digitization of graphical elements, such as statistical plots, remains under-explored. In this paper, we introduce the task of fine-grained visual understanding of mathematical graphics and present the Line Graphics (LG) dataset, which includes pixel-wise annotations of 5 coarse and 10 fine-grained categories. The dataset covers 520 images of mathematical graphics collected from 450 documents across different disciplines and supports two computer vision tasks: semantic segmentation and object detection. To benchmark the LG dataset, we evaluate 7 state-of-the-art models. To foster further research on the digitization of statistical graphs, we will make the dataset, code, and models publicly available to the community.
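
One way to see how a single set of pixel-wise annotations can serve both semantic segmentation and object detection is to derive bounding boxes from a label mask. The sketch below is illustrative only and is not the authors' pipeline: the function name `mask_to_boxes`, the class ids, the toy mask, and the use of 4-connected regions are all assumptions made for the example.

```python
from collections import deque


def mask_to_boxes(mask, background=0):
    """Return (class_id, x_min, y_min, x_max, y_max) for each 4-connected region.

    `mask` is a 2D list of integer class ids (hypothetical ids, not the LG
    dataset's real label scheme); pixels equal to `background` are ignored.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            cls = mask[y][x]
            if cls == background or seen[y][x]:
                continue
            # Flood-fill this region and track its extent as we go.
            x_min = x_max = x
            y_min = y_max = y
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                x_min, x_max = min(x_min, cx), max(x_max, cx)
                y_min, y_max = min(y_min, cy), max(y_max, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and not seen[ny][nx] and mask[ny][nx] == cls:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((cls, x_min, y_min, x_max, y_max))
    return boxes


# Toy 4x5 mask with three labeled regions (ids 1, 2, 3 are made up).
toy_mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 2],
    [3, 3, 0, 0, 2],
]
boxes = mask_to_boxes(toy_mask)
```

Each returned tuple gives a class id followed by inclusive pixel coordinates. Thin structures such as plot lines can break into several regions under this simple connectivity rule, so a real conversion would likely need a more careful grouping step.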

Notes

  1. https://www.statista.com/statistics/871513/worldwide-data-created/

  2. https://github.com/open-mmlab/mmsegmentation

Acknowledgments

This work was supported in part by the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie Grant No. 861166, in part by the Ministry of Science, Research and the Arts of Baden-Württemberg (MWK) through the Cooperative Graduate School Accessibility through AI-based Assistive Technology (KATE) under Grant BW6-03, and in part by the Federal Ministry of Education and Research (BMBF) through a fellowship within the IFI programme of the German Academic Exchange Service (DAAD). This work was partially performed on the HoreKa supercomputer funded by the MWK and by the Federal Ministry of Education and Research.

Corresponding author

Correspondence to Omar Moured.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Moured, O., Zhang, J., Roitberg, A., Schwarz, T., Stiefelhagen, R. (2023). Line Graphics Digitization: A Step Towards Full Automation. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14191. Springer, Cham. https://doi.org/10.1007/978-3-031-41734-4_27

  • DOI: https://doi.org/10.1007/978-3-031-41734-4_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41733-7

  • Online ISBN: 978-3-031-41734-4

  • eBook Packages: Computer Science (R0)
