Abstract
We report on the XRCE participation in the Structure Extraction task of the INEX/ICDAR Book Structure Extraction 2011. We wanted to assess a simple method for structuring a book: using leading and trailing page whitespace. Detecting the large whitespace that occurs at the top of leading pages and at the bottom of trailing pages relies on first detecting the type area zone. As expected, the evaluation shows very good precision. Because this approach targets high-level book structures (parts, chapters), structures that are not marked by a page break are not detected, which lowers recall.
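The idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `detect_page_breaks`, the `(top, bottom)` input format, the use of the median to estimate the type area, and the `ratio` threshold are all assumptions made for illustration. The abstract does not say how the type area zone is actually detected.

```python
from statistics import median


def detect_page_breaks(pages, ratio=0.15):
    """Flag pages with unusually large top or bottom whitespace.

    pages: list of (top, bottom) y-coordinates of the text extent on each
           page, with y growing downward; None marks a page without text.
    ratio: hypothetical threshold, as a fraction of the type area height.

    Returns (leading, trailing): indices of pages whose text starts well
    below the top of the type area, and indices of pages whose text ends
    well above its bottom.
    """
    extents = [p for p in pages if p is not None]
    if not extents:
        return [], []

    # Assumption: the type area is approximated by the median text extent,
    # since most pages of a book fill the type area completely.
    area_top = median(top for top, _ in extents)
    area_bottom = median(bottom for _, bottom in extents)
    threshold = ratio * (area_bottom - area_top)

    leading = [i for i, p in enumerate(pages)
               if p is not None and p[0] - area_top > threshold]
    trailing = [i for i, p in enumerate(pages)
                if p is not None and area_bottom - p[1] > threshold]
    return leading, trailing
```

Under these assumptions, a leading page is a candidate chapter or part start, and the trailing page just before it is supporting evidence. A heading that appears in the middle of a full page produces neither signal, which is consistent with the lower recall the abstract reports.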
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Déjean, H. (2012). Using Page Breaks for Book Structuring. In: Geva, S., Kamps, J., Schenkel, R. (eds) Focused Retrieval of Content and Structure. INEX 2011. Lecture Notes in Computer Science, vol 7424. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35734-3_4
DOI: https://doi.org/10.1007/978-3-642-35734-3_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35733-6
Online ISBN: 978-3-642-35734-3
eBook Packages: Computer Science, Computer Science (R0)