Abstract
The purpose of document structure analysis is to recover the document structure of a source text. This paper defines document structure as three layers and proposes a new model of document structure analysis, DLM. The model is composed of three layers that correspond to this definition: a physical structure layer, a logical structure layer, and a semantic structure layer. The input, output, and operation of each layer are described in detail in the paper. The model is flexible, systematic, and extensible. DLM is implemented in the Automatic Summarization System; the implementation shows that the model is feasible and can achieve good results.
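As a rough illustration of the three-layer idea, the sketch below chains a physical, a logical, and a semantic stage so that each layer consumes the previous layer's output. The abstract does not specify how DLM implements each layer, so every name here (Block, Section, Topic, physical_layer, logical_layer, semantic_layer, analyze) and every heuristic (blank-line splitting, short-line headings, word-frequency keywords) is a hypothetical stand-in, not the authors' method.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """Physical unit (hypothetical): one run of text separated by blank lines."""
    text: str


@dataclass
class Section:
    """Logical unit (hypothetical): a heading plus the paragraphs under it."""
    heading: str
    paragraphs: List[str] = field(default_factory=list)


@dataclass
class Topic:
    """Semantic unit (hypothetical): a section labelled with frequent words."""
    heading: str
    keywords: List[str]


def physical_layer(raw: str) -> List[Block]:
    # Assumption: blank lines separate physical blocks.
    parts = re.split(r"\n\s*\n", raw.strip())
    return [Block(" ".join(p.split())) for p in parts if p.strip()]


def logical_layer(blocks: List[Block]) -> List[Section]:
    # Assumption: a short block with no sentence-ending punctuation is a heading.
    sections: List[Section] = []
    for block in blocks:
        short = len(block.text.split()) <= 6
        is_heading = short and not block.text.endswith((".", "!", "?"))
        if is_heading:
            sections.append(Section(block.text))
        else:
            if not sections:
                sections.append(Section(""))  # text before any heading
            sections[-1].paragraphs.append(block.text)
    return sections


def semantic_layer(sections: List[Section], top_n: int = 3) -> List[Topic]:
    # Assumption: the most frequent words longer than three letters
    # stand in for the semantic content of a section.
    topics: List[Topic] = []
    for section in sections:
        words = re.findall(r"[a-z]+", " ".join(section.paragraphs).lower())
        counts = Counter(w for w in words if len(w) > 3)
        topics.append(Topic(section.heading, [w for w, _ in counts.most_common(top_n)]))
    return topics


def analyze(raw: str) -> List[Topic]:
    # Each layer's output is the next layer's input.
    return semantic_layer(logical_layer(physical_layer(raw)))


sample = (
    "Introduction\n\n"
    "Structure analysis recovers structure. Structure matters.\n\n"
    "Method\n\n"
    "Layers process layers of text."
)
topics = analyze(sample)
```

Keeping each layer as a separate function with its own input and output type means one stage can be replaced without touching the others, which is one plausible reading of the flexibility the abstract claims.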
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, Z., Wang, Y., Gao, K. (2005). A New Model of Document Structure Analysis. In: Wang, L., Jin, Y. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2005. Lecture Notes in Computer Science, vol 3614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11540007_81
DOI: https://doi.org/10.1007/11540007_81
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28331-7
Online ISBN: 978-3-540-31828-6
eBook Packages: Computer Science, Computer Science (R0)