column

A Survey on XML Fragmentation

Authors:

Vanessa Braganholo,

Marta Mattoso

ACM SIGMOD Record, Volume 43, Issue 3

Pages 24 - 35

https://doi.org/10.1145/2694428.2694434

Published: 04 December 2014

Abstract

Efficient document processing is a must when large volumes of XML data are involved. In such critical scenarios, a well-known solution is to distribute (map) the data among several processing nodes, and then distribute the processing accordingly, taking advantage of parallelism. This is the approach taken by distributed databases and MapReduce environments. Fragmentation techniques play an important role in these scenarios. They provide a way to "cut" the database into pieces and distribute the pieces over a network. Queries can then also be "cut" into sub-queries that run in parallel, achieving better performance than in a centralized environment. However, there is no consensus in the database community as to what an XML fragment is. In fact, several approaches in the literature present their own definitions of an XML fragment. In addition to query processing, XML fragmentation techniques may also be helpful when managing XML documents distributed across the web or in clouds. This paper surveys the existing XML fragmentation approaches in the literature, comparing their features and highlighting their drawbacks. Our contribution resides in establishing a map of the area.
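The idea of "cutting" both the data and the query can be illustrated with a minimal Python sketch. It is not taken from the paper: it assumes one simple notion of a fragment (a round-robin split of the root's children, often called horizontal fragmentation), a hypothetical `<store>` document, and a thread pool standing in for separate processing nodes. The names `fragment_horizontally`, `sub_query`, and `run_fragmented_query` are invented for illustration.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample document; element names are illustrative only.
SAMPLE = """<store>
  <item><name>pen</name><price>2</price></item>
  <item><name>book</name><price>15</price></item>
  <item><name>lamp</name><price>30</price></item>
  <item><name>desk</name><price>120</price></item>
</store>"""


def fragment_horizontally(root, n_fragments):
    """Split the children of `root` round-robin into n_fragments new trees.

    Each fragment repeats the root tag, so it is a well-formed document
    on its own. This is only one possible definition of a fragment.
    """
    fragments = [ET.Element(root.tag, root.attrib) for _ in range(n_fragments)]
    for i, child in enumerate(root):
        fragments[i % n_fragments].append(child)
    return fragments


def sub_query(fragment, min_price):
    """Sub-query for one fragment: names of items costing at least min_price."""
    return [
        item.findtext("name")
        for item in fragment.findall("item")
        if float(item.findtext("price")) >= min_price
    ]


def run_fragmented_query(xml_text, n_fragments, min_price):
    """Fragment the document, run the sub-query per fragment, merge results."""
    root = ET.fromstring(xml_text)
    fragments = fragment_horizontally(root, n_fragments)
    # Threads stand in for processing nodes; a real system would ship
    # each fragment to a different machine.
    with ThreadPoolExecutor(max_workers=n_fragments) as pool:
        partials = list(pool.map(lambda f: sub_query(f, min_price), fragments))
    # Merge step: the union of the partial answers is the full answer.
    return sorted(name for part in partials for name in part)


print(run_fragmented_query(SAMPLE, n_fragments=2, min_price=10))
```

Because the selection predicate looks at one `<item>` at a time, the merged answer is the same for any number of fragments. Queries that relate elements across fragments would need a more careful decomposition than this sketch provides.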

Published In

ACM SIGMOD Record, Volume 43, Issue 3

September 2014

70 pages

ISSN: 0163-5808

DOI: 10.1145/2694428

Editors:
Yanlei Diao (University of Massachusetts Amherst),
Pablo Barceló (Universidad de Chile),
Vanessa Braganholo (Universidade Federal Fluminense),
Marco Brambilla (Politecnico di Milano),
Chee Yong Chan (National University of Singapore),
Rada Chirkova (North Carolina State University),
Anish Das Sarma (Google Research),
Alkis Simitsis (HP Labs),
Nesime Tatbul (ETH Zurich),
Marianne Winslett (University of Illinois)

Copyright © 2014 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Column

Bibliometrics

Total Citations: 5
Total Downloads: 141
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 16 Jan 2025

Cited By

R. Chen, G. Cai, J. Chen, and Y. Hong. Integrated method for distributed processing of large XML data. Cluster Computing, 27(2):1375--1399, 2023. Online publication date: 13-May-2023. https://doi.org/10.1007/s10586-023-04010-0

L. Bao, J. Yang, C. Wu, H. Qi, X. Zhang, and S. Cai. XML2HBase. Journal of Parallel and Distributed Computing, 161(C):83--99, 2022. Online publication date: 1-Mar-2022. https://dl.acm.org/doi/10.1016/j.jpdc.2021.11.003

D. Terletskyi and S. Yershov. Decomposition of Fuzzy Homogeneous Classes of Objects. In Information and Software Technologies, pages 43--63, 2022. Online publication date: 6-Oct-2022. https://doi.org/10.1007/978-3-031-16302-9_4

F. Gomes dos Santos, L. Machado, R. Pinheiro, A. Paes, and V. Braganholo. Querying XML documents using Prolog engines. Information Processing and Management: an International Journal, 56(5):1753--1770, 2019. Online publication date: 1-Sep-2019. https://dl.acm.org/doi/10.1016/j.ipm.2019.05.011

Handoko and J. Getta. Online Integration of Fragmented XML Documents. In Intelligent Information and Database Systems, pages 13--23, 2017. Online publication date: 26-Feb-2017. https://doi.org/10.1007/978-3-319-54472-4_2
