Modeling, encoding and querying multi-structured documents

Published: 01 September 2012

Abstract

The issue of multi-structured documents became prominent with the emergence of the digital humanities as a field of practice. Many distinct structures may be defined simultaneously on the same original content to serve different documentary tasks. For example, a document may have both a structure for the logical organization of its content (logical structure) and a structure expressing a set of content formatting rules (physical structure). In this paper, we present MSDM, a generic model for multi-structured documents, in which several important features are established. We also address the problem of efficiently encoding multi-structured documents by introducing MultiX, a new XML formalism based on the MSDM model. Finally, we propose a library of XQuery functions for querying MultiX documents. We illustrate all of these contributions with a use case based on a fragment of an old manuscript.
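
The idea of several structures sharing one content can be illustrated with a minimal sketch. It is not MSDM or MultiX, whose definitions are not given on this page; it only assumes that each structure is a list of labelled character ranges over the same text. The sample text and the names `Span`, `overlaps`, `logical`, and `physical` are hypothetical. The sketch shows a sentence (logical structure) crossing a manuscript line break (physical structure), which is the kind of overlap a single XML element tree cannot represent directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A labelled fragment of the shared text, as half-open offsets [start, end)."""
    label: str
    start: int
    end: int


def overlaps(a: Span, b: Span) -> bool:
    """True when two spans cross without one containing the other."""
    intersect = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return intersect and not (a_in_b or b_in_a)


# Shared content (hypothetical sample, not taken from the paper).
TEXT = "In the beginning was the word. And the word was written."

# Logical structure: two sentences.
first_end = TEXT.index(".") + 1
logical = [
    Span("sentence", 0, first_end),
    Span("sentence", first_end + 1, len(TEXT)),
]

# Physical structure: two manuscript lines, broken in the middle of a sentence.
break_at = TEXT.index("And the") + len("And the")
physical = [
    Span("line", 0, break_at),
    Span("line", break_at + 1, len(TEXT)),
]


def text_of(span: Span) -> str:
    """Return the fragment of the shared text covered by a span."""
    return TEXT[span.start:span.end]


# Pairs (sentence, line) that cross each other.
crossing = [(s, l) for s in logical for l in physical if overlaps(s, l)]

for sentence, line in crossing:
    print(repr(text_of(sentence)), "crosses", repr(text_of(line)))
```

Keeping the text once and pointing into it from each structure is only one possible design; it is used here because it makes the overlap easy to detect, not because the paper prescribes it.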

Cited By

  • Linear Extended Annotation Graphs (2017). Proceedings of the 2017 ACM Symposium on Document Engineering, pp. 9-18. doi:10.1145/3103010.3103011. Online publication date: 31-Aug-2017.
  • Schema-aware Extended Annotation Graphs (2016). Proceedings of the 2016 ACM Symposium on Document Engineering, pp. 45-54. doi:10.1145/2960811.2960816. Online publication date: 13-Sep-2016.
  • Exploring manuscripts (2012). Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics, pp. 1-12. doi:10.1145/2254129.2254184. Online publication date: 13-Jun-2012.

    Published In

    Information Processing and Management: an International Journal, Volume 48, Issue 5
    September 2012
    214 pages

    Publisher

    Pergamon Press, Inc.

    United States

    Author Tags

    1. Multi-structured document
    2. Multi-structured document querying
    3. MultiX
    4. XML
    5. XQuery

    Qualifiers

    • Article
