XML indexing and storage: fulfilling the wish list

Christian Mathis¹,
Theo Härder¹,
Karsten Schmidt¹ &
…
Sebastian Bächle¹

292 Accesses
3 Altmetric
Explore all metrics

Abstract

XML Indexing and Storage (XMIS) techniques are crucial for the functionality and the overall performance of an XML database management system (XDBMS). Because of the complexity of XQuery and performance demands of XML query processing, efficient path processing operators—including those for tree-pattern queries (so-called twigs)—are urgently needed for which tailor-made indexes and their flexible use are indispensable. Although XML indexing and storage are standard problems and, of course, manifold approaches have been proposed in the last decade, adaptive and broad-enough solutions for satisfactory query evaluation support of all path processing operators are missing in the XDBMS context. Therefore, we think that it is worthwhile to take a step back and look at the complete picture to derive a salient and holistic solution. To do so, we first compile an XMIS wish list containing what—in our opinion—are essential functional storage and indexing requirements in a modern XDBMS. With these desiderata in mind, we then develop a new XMIS scheme, which—by reconsidering previous work—can be seen as a practical and general approach to XML storage and indexing. Interestingly, by working on both problems at the same time, we can make the storage and index managers live in a kind of symbiotic partnership, because the document store re-uses ideas originally proposed by the indexing community and vice versa. The XMIS scheme is implemented in XTC, an XDBMS used for empirical tests.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Institutional subscriptions

Notes

To avoid an application-specific focus, these properties are not ranked. For a universal approach, all properties should have the same importance.
All variations of document stores and index types are implemented using B*-trees. With code reuse for the base structures, the tree entries only differ in the representation of keys and values.
In the following, we are only interested in the path up to the root for a given PCR. Therefore, the relative order among siblings is not relevant, e.g., all permutations of elements issn, name, publisher, and issue as children of journal may appear in the document.
For uniprot, the structure-related saving of the pc and po formats consists of 465 and 731 Mbytes w.r.t. the naive format. Content compression would reduce the content part by 23–35%, in addition.
Although we simplified the path specification for presentation purposes, their XPath equivalent is used to express twig queries as well by simply combining several path specifications (join) and, thereby, allowing their application for twig operators.
Note, subscript D and type T are omitted where non-ambiguous.

References

Arion A, Bonifati A, Manolescu I, Pugliese A (2008) Path summaries and path partitioning in modern XML databases. World Wide Web 11(1):117–151
Article Google Scholar
Balmin A, Özcan F, Beyer KS, Chochrane RJ, Pirahesh H (2004) A framework for using materialized XPath views in XML query processing. In: Proc VLDB, pp 60–71
Google Scholar
Beyer K, Cochrane R, Josifovski V, Kleewein J, Lapis G, Lohman GM, Lyle R, Özcan F, Pirahesh H, Seemann N, Truong TC, Van der Linden B, Vickery B, Zhang C, System RX (2005) One part relational, one part XML. In: Proc SIGMOD, pp 358–374
Google Scholar
Boncz P, Grust T, van Keulen M, Manegold S, Rittinger J, Teubner J (2006) MonetDB/XQuery: a fast XQuery processor powered by a relational engine. In: Proc SIGMOD, pp 479–490
Google Scholar
Bruno N, Koudas N, Srivastava D (2002) Holistic twig joins: optimal XML pattern matching. In: Proc SIGMOD, pp 310–321
Google Scholar
Chen Q, Lim A, Ong KW (2003) D(k)-index: an adaptive structural summary for graph-structured data. In: Proc SIGMOD, pp 134–144
Google Scholar
Draper D, Frankhauser P, Fernandéz M, Malhotra A, Rose K, Rys M, Siméon J, Wadler P (2004) XQuery 1.0 and XPath 2.0 formal semantics
Fomichev A, Grinev M, Kuznetsov S (2006) Sedna: a native XML DBMS. In: Proc SOFSEM, pp 272–281
Google Scholar
Goldman R, Widom J (1997) DataGuides: enabling query formulation and optimization in semistructured databases. In: Proc VLDB, pp 436–445
Google Scholar
Graefe G, Larson P-A (2001) B-tree indexes and CPU caches. In: Proc ICDE, pp 349–358
Google Scholar
Grust T, van Keulen M, Teubner J (2003) Staircase join: teach a relational DBMS to watch its (axis) steps. In: Proc VLDB, pp 524–525
Google Scholar
Härder T, Haustein MP, Mathis C, Wagner M (2007) Node labeling schemes for dynamic XML documents reconsidered. Data Knowl Eng 60(1):126–149
Article Google Scholar
Härder T, Mathis C, Schmidt K (2007) Comparison of complete and elementless native storage of XML documents. In: Proc IDEAS, pp 102–113
Google Scholar
Haustein MP, Härder T, Mathis C, Wagner M (2005) DeweyIDs—the key to fine-grained management of XML documents. In: Proc 20th Brazilian symposium on databases, pp 85–99
Google Scholar
Haustein MP, Härder T (2007) An efficient infrastructure for native transactional XML processing. Data Knowl Eng 61(3):500–523
Article Google Scholar
Jiang H, Wang W, Lu H, Yu Xu J (2003) Holistic twig joins on indexed XML documents. In: Proc VLDB, pp 273–284
Google Scholar
Kaushik R, Bohannon P, Naughton JF, Korth HF (2002) Covering indexes for branching path queries. In: Proc SIGMOD, pp 133–144
Google Scholar
Kaushik R, Shenoy P, Bohannon P, Gudes E (2002) Exploiting local similarity for indexing paths in graph-structured data. In: Proc ICDE, pp 129–140
Google Scholar
Kaushik R, Krishnamurthy R, Naughton JF, Ramakrishnan R (2004) On the integration of structure indexes and inverted lists. In: Proc SIGMOD, pp 779–790
Chapter Google Scholar
Li H-G, Aghili SA, Agrawal D, El Abbadi A (2006) FLUX: content-and-structure matching of XPath queries with range predicates. In: Proc XSym. LNCS, vol 4156, pp 61–76
Google Scholar
Mathis C (2009) Storing, indexing, and querying XML documents in native XML database management systems. PhD thesis, Verlag Dr Hut
Mathis C, Härder T, Schmidt K (2009) Storing and indexing XML documents upside down. Comput Sci Res Dev 24(1–2):51–68
Article Google Scholar
McHugh J, Abiteboul S (1997) Lore: a database management system for semistructured data. SIGMOD Rec 26:54–66
Article Google Scholar
Meier W (2002) eXist: an open source native xml database. Proc Web, Web-services, and database systems. Lect Notes Comput Sci 2593:169–183
Article Google Scholar
Miklau G. XML data repository. www.cs.washington.edu/research/xmldatasets
Milo T, Suciu D (1999) Index structures for path expressions. In: Proc ICDT, pp 277–295
Google Scholar
O’Neil PE, Pal S, Cseri I, Schaller G, Westbury N (2004) ORDPATHs: insert-friendly XML node labels. In: Proc SIGMOD, pp 903–908
Chapter Google Scholar
Prakash S, Bhowmick SS, Madria S (2006) Efficient recursive XML query processing using relational database systems. Data Knowl Eng 58(3):207–242
Article Google Scholar
Prasad KH, Kumar PS (2005) Efficient indexing and querying of XML data using modified prüfer sequences. In: Proc CIKM, pp 397–404
Google Scholar
Sample N, Cooper BF, Franklin MJ, Hjaltason GR, Shadmon M, Cohen L (2002) Managing complex and varied data with the IndexFabric(tm). In: Proc ICDE, pp 492–493
Google Scholar
Schmidt AR, Waas F, Kersten ML, Carey MJ, Manolescu I, Busse R (2002) XMark: a benchmark for XML data management. In: Proc VLDB, pp 974–985
Google Scholar
Document Object Model (DOM) Level 3 core specification, W3C recommendations (Jan 2004)
Brownell D (2002) SAX2. O’Reilly Media
Wang H, Park S, Fan W, PS Yu (2003) ViST: a dynamic index method for querying XML data by tree structures. In: Proc SIGMOD, pp 110–121
Google Scholar
Wang W, Jiang H, Wang H, Lin X, Lu H, Li J (2005) Efficient processing of XML path queries using the disk-based F&B-index. In: Proc VLDB, pp 145–165
Google Scholar
XQuery 1.0 (2007) An XML query language. W3C recommendation (Jan 2007)
XQuery Update Facility 1.0 (2011) W3C recommendation (17 March 2011)
Yoshikawa M, Amagasa T, Shimura T, Uemura S (2001) XRel: a path-based approach to storage and retrieval of XML documents using relational databases. ACM TOIT 1(1):110–141
Article Google Scholar
Zhang N, Kacholia V, Özsu T (2004) A succinct physical storage scheme for efficient evaluation of path queries in XML. In: Proc ICDE, pp 54–63
Google Scholar

Download references

Author information

Authors and Affiliations

Dept. of Computer Science, University of Kaiserslautern, P.O. Box 3049, 67663, Kaiserslautern, Germany
Christian Mathis, Theo Härder, Karsten Schmidt & Sebastian Bächle

Authors

Christian Mathis
View author publications
You can also search for this author in PubMed Google Scholar
Theo Härder
View author publications
You can also search for this author in PubMed Google Scholar
Karsten Schmidt
View author publications
You can also search for this author in PubMed Google Scholar
Sebastian Bächle
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Theo Härder.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Mathis, C., Härder, T., Schmidt, K. et al. XML indexing and storage: fulfilling the wish list. Comput Sci Res Dev 30, 51–68 (2015). https://doi.org/10.1007/s00450-012-0204-6

Download citation

Received: 09 September 2010
Accepted: 07 December 2011
Published: 01 February 2012
Issue Date: February 2015
DOI: https://doi.org/10.1007/s00450-012-0204-6

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Efficient XML Keyword Search Based on DAG-Compression

D2-Index: A Dynamic Index Method for Querying XML and Semi-Structured Data

A Dual-Index Based Representation for Processing XPath Queries on Very Large XML Documents

Notes

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

XML indexing and storage: fulfilling the wish list

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Efficient XML Keyword Search Based on DAG-Compression

D2-Index: A Dynamic Index Method for Querying XML and Semi-Structured Data

A Dual-Index Based Representation for Processing XPath Queries on Very Large XML Documents

Notes

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation