DOI: 10.1145/2595188.2595219
research-article

Using ancestral layout models for document digitization

Published: 19 May 2014

Abstract

In this article, we show how some concepts from traditional, often ancient, practices for laying out text (ruling, grids) can improve document digitization. We first present these basic layout methods, some of which have been used since Antiquity, and explain how some of their key concepts can be 'translated' and applied to today's document digitization. In particular, we show that the traditional concept of type area is a key notion for modeling document layout. An algorithm to compute the type area is detailed. We then illustrate this work with several practical uses and evaluations, ranging from OCR improvement to high-level logical segmentation.
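
The abstract says the paper details an algorithm for computing the type area but does not describe it. The sketch below is not that algorithm. It is a minimal illustration, under our own assumptions, of one common layout-analysis technique (projection profiles) for approximating a type area. Text coverage is summed per column and per row over all pages of a book, and the type area is taken as the span where coverage reaches a fraction (`ratio`, assumed to be 0.5) of its peak. Isolated page furniture such as page numbers and marginal notes contributes little coverage and therefore falls outside. The input format (integer text-block boxes from an earlier segmentation step, with all pages the same size) and every function name are hypothetical.

```python
from typing import List, Tuple

# Hypothetical input format: (x0, y0, x1, y1) in integer page coordinates,
# with x1 and y1 exclusive. All pages are assumed to share one page size.
Box = Tuple[int, int, int, int]


def projection_profiles(pages: List[List[Box]], width: int, height: int) -> Tuple[List[int], List[int]]:
    """Sum text coverage over all pages, per x column and per y row."""
    cols = [0] * width   # total text height covering each column
    rows = [0] * height  # total text width covering each row
    for blocks in pages:
        for x0, y0, x1, y1 in blocks:
            for x in range(x0, x1):
                cols[x] += y1 - y0
            for y in range(y0, y1):
                rows[y] += x1 - x0
    return cols, rows


def dense_span(profile: List[int], ratio: float) -> Tuple[int, int]:
    """Return (first, last + 1) indices where the profile reaches ratio * its maximum."""
    threshold = ratio * max(profile)
    hits = [i for i, v in enumerate(profile) if v >= threshold]
    return hits[0], hits[-1] + 1


def estimate_type_area(pages: List[List[Box]], width: int, height: int, ratio: float = 0.5) -> Box:
    """Hypothetical helper: the region where text is dense across the whole book."""
    cols, rows = projection_profiles(pages, width, height)
    x0, x1 = dense_span(cols, ratio)
    y0, y1 = dense_span(rows, ratio)
    return (x0, y0, x1, y1)


def outside_type_area(block: Box, area: Box, tol: int = 2) -> bool:
    """True if a block extends beyond the type area by more than tol units."""
    bx0, by0, bx1, by1 = block
    ax0, ay0, ax1, ay1 = area
    return bx0 < ax0 - tol or by0 < ay0 - tol or bx1 > ax1 + tol or by1 > ay1 + tol


if __name__ == "__main__":
    # Toy book: a body block on every page, a page number (folio) below it,
    # and one marginal note on the last page.
    body, folio, note = (10, 20, 90, 130), (48, 140, 52, 145), (92, 50, 99, 55)
    book = [[body, folio], [body, folio], [body, folio, note]]
    area = estimate_type_area(book, width=100, height=150)
    print(area)  # (10, 20, 90, 130)
    print([outside_type_area(b, area) for b in (body, folio, note)])  # [False, True, True]
```

Flagging blocks that fall outside the estimated area is one way such a model could feed later steps such as page-number detection or logical segmentation. The single global `ratio` is a simplification: real books, for example with mirrored recto and verso margins, would likely need a separate estimate for each page side.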

    Published In

    ACM Other conferences
    DATeCH '14: Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage
    May 2014
    200 pages
    ISBN:9781450325882
    DOI:10.1145/2595188
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Sponsors

    • Succeed: The Support Action Centre of Competence in Digitisation

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. document layout analysis
    2. layout model
    3. margins
    4. palaeography
    5. spaces
    6. type area

    Qualifiers

    • Research-article

    Conference

    DATeCH 2014
    Sponsor:
    • Succeed

    Acceptance Rates

    DATeCH '14 Paper Acceptance Rate 31 of 49 submissions, 63%;
    Overall Acceptance Rate 60 of 86 submissions, 70%

    Bibliometrics

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 85
    • Downloads (Last 12 months): 0
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 08 Mar 2025
