column

A Survey on XML Fragmentation

Published: 04 December 2014

Abstract

Efficient document processing is a must when large volumes of XML data are involved. In such scenarios, a well-known solution is to distribute (map) the data among several processing nodes and then distribute the processing accordingly, taking advantage of parallelism. This is the approach taken by distributed databases and MapReduce environments. Fragmentation techniques play an important role in these scenarios. They provide a way to "cut" the database into pieces and distribute the pieces over a network. Queries can then also be "cut" into sub-queries that run in parallel, achieving better performance than in a centralized environment. However, there is no consensus in the database community as to what an XML fragment is. In fact, several approaches in the literature present definitions of XML fragments. In addition to query processing, XML fragmentation techniques may also be helpful when managing XML documents distributed across the web or in clouds. This paper surveys the existing XML fragmentation approaches in the literature, comparing their features and highlighting their drawbacks. Our contribution resides in establishing a map of the area.
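
To make the idea of "cutting" data and queries concrete, here is a minimal Python sketch. It is an illustration only and does not follow any of the surveyed approaches: the sample document, the round-robin split of the root's children, and the names `fragment`, `sub_query`, and `run_query` are all assumptions, and threads stand in for the processing nodes of a real distributed setting.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample document (not taken from the paper).
DOC = """<store>
  <item><name>pen</name><price>2</price></item>
  <item><name>book</name><price>15</price></item>
  <item><name>lamp</name><price>30</price></item>
  <item><name>desk</name><price>120</price></item>
</store>"""


def fragment(root, n):
    """Deal the root's children round-robin into n fragments.

    Each fragment is a new element with the same tag as the root,
    so the same sub-query can run unchanged on every fragment.
    """
    frags = [ET.Element(root.tag) for _ in range(n)]
    for i, child in enumerate(root):
        frags[i % n].append(child)
    return frags


def sub_query(frag, min_price):
    """Sub-query on one fragment: names of items priced at least min_price."""
    return [
        item.findtext("name")
        for item in frag.findall("item")
        if float(item.findtext("price")) >= min_price
    ]


def run_query(doc, n, min_price):
    """Fragment the document, run the sub-queries in parallel, merge results."""
    root = ET.fromstring(doc)
    frags = fragment(root, n)
    # Threads are a stand-in for separate processing nodes.
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = pool.map(lambda f: sub_query(f, min_price), frags)
    return sorted(name for part in partials for name in part)


if __name__ == "__main__":
    print(run_query(DOC, n=2, min_price=10))
```

In this toy case the final answer is simply the union of the partial answers. That works because each item lies entirely within one fragment; a query that relates nodes stored in different fragments would need a more elaborate merge step.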

Information & Contributors

Information

Published In

ACM SIGMOD Record  Volume 43, Issue 3
September 2014
70 pages
ISSN:0163-5808
DOI:10.1145/2694428

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Column
