[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1516360.1516400acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article
Free access

A sampling approach for XML query selectivity estimation

Published: 24 March 2009 Publication History

Abstract

As the Extensible Markup Language (XML) rapidly establishes itself as the de facto standard for presenting, storing, and exchanging data on the Internet, large volume of XML data and their supporting facilities start to surface. A fast and accurate selectivity estimation mechanism is of practical importance because selectivity estimation plays a fundamental role in XML query optimization. Recently proposed techniques are all based on some forms of structure synopses that could be time-consuming to build and not effective for summarizing complex structure relationships. In this research, we propose an innovative sampling method that can capture the tree structures and intricate relationships among nodes in a simple and effective way. The derived sample tree is stored as a synopsis for selectivity estimation. Extensive experimental results show that, in comparison with the state-of-the-art structure synopses, specifically the TreeSketch and Xseed synopses, our sample tree synopsis applies to a broader range of query types, requires several orders of magnitude less construction time, and generates estimates with considerably better precision for complex datasets.

References

[1]
Dblp data set. http://www.informatik.unitrier.de/ley/db/index.html.
[2]
N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: optimal xml pattern matching. Proceedings of the 2002 ACM SIGMOD international conf. on Management of data, pages 310--321, 2002.
[3]
S. Chaudhuri, R. Motwani, and V. Narasayya. On random sampling over joins. Proc. ACM SIGMOD Conf., June, 1999, pages 263--274, 1999.
[4]
T. Chen, J. Lu, and T. W. Ling. On boosting holism in xml twig pattern matching using structural indexing techniques. Proceedings of the 2005 ACM SIGMOD international conf. on Management of data, 2005.
[5]
Z. Chen, H. Jagadish, F. Korn, N. Koudas, S. Muthukrishnan, R. T. Ng, and D. Srivastava. Couting twig matches in a tree. Proceedings of the 17th International Conference on Data Engineering, pages 595--604, 2001.
[6]
W. G. Cochran. Sampling Techniques. Wiley, 1977.
[7]
J. Freire, J. R. Haritsa, M. Ramanath, P. Roy, and J. Siméon. Statix: making xml count. Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pages 181--191, 2002.
[8]
S. Ganguly, P. Gibbons, Y. Matias, and A. Silberschatz. Bifocal sampling for skew-resistant join size estimation. Proceedings of the 1996 ACM SIGMOD international conf. on Management of data, pages 271--281, 1996.
[9]
P. Hass, J. Naughton, S. Seshadri, and L. Stokes. Sampling-based estimation of the number of distinct values of an attribute. Proc. 21st Intl. Conf. on Very Large Data Bases, pages 311--322, September 1995.
[10]
P. Hass, J. Naughton, S. Seshadri, and A. Swami. Fixed-precision estimation of join selectivity. Proc. 12th ACM Symp. on Principles of Database Systems, pages 190--201, May 1993.
[11]
W.-C. Hou, G. Ozsoyoglu, and B. Taneja. Statistical estimators for relational algebra expression. Proc. 7th ACM Symp. on Principles of Database Systems, pages 276--287, 1988.
[12]
W.-C. Hou, G. Ozsoyoglu, and B. Taneja. Processing aggregate relational queries with hard time constraints. Proc. ACM SIGMOD International Conf. on Management of Data, pages 68--77, June 1989.
[13]
H. Jiang, H. Lu, and W. Wang. Efficient processing of xml twig queries with or-predicates. Proceedings of the 2004 ACM SIGMOD international conf. on Management of data, 2004.
[14]
R. J. Lipton, J. F. Naughton, and D. A. Schneider. Practical selectivity estimation through adaptive sampling. Proceedings 1990 ACM SIGMOD Intl. Conf. Managment of Data, pages 1--11, 1990.
[15]
J. Lu, T. W. Ling, C.-Y. Chan, and T. Chen. From region encoding to extended dewey: on efficient processing of xml twig pattern matching. Proceedings of the 31st international conf. on very large data bases, 2005.
[16]
C. Luo, Z. Jiang, W.-C. Hou, F. Yan, and C.-F. Wang. Estimating xml structural join size quickly and economically. Proc. 22nd Intl. Conf. on Data Engineering, 2006.
[17]
M. M. Moro, Z. Vagena, and V. J. Tsotras. Tree-pattern queries on a lightweight xml processor. Proceedings of the 31st international conf. on very large data bases, 2005.
[18]
N. Polyzotis and M. Garofalakis. Structure and value synopses for xml data graphs. Proceedings of the 28th Intl. Conf. on Very Large Data Bases, 2002.
[19]
N. Polyzotis and M. N. Garofalakis. Statistical synopses for graph-structured xml databases. Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pages 358--369, 2002.
[20]
N. Polyzotis, M. N. Garofalakis, and Y. Ioannidis. Approximate xml query answers. Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, pages 263--274, 2004.
[21]
N. Polyzotis, M. N. Garofalakis, and Y. Ioannidis. Selectivity estimation for xml twigs. Proceedings of the 20th International Conference on Data Engineering, 2004.
[22]
S. Ross. Introduction to Probability Models, 2nd Ed. Academic Press, 1980.
[23]
A. Schmidt, F. Waas, M. Kersten, D. Florescu, L. Manolescu, M. J. Carey, and R. Busse. The XML benchmark project. Technical report, CWI, 2001.
[24]
W. Wang, H. Jiang, H. Lu, and J. X. Yu. Containment join size estimation: models and methods. Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pages 358--369, 2003.
[25]
N. Zhang, M. T. Ozsu, A. Aboulnaga, and I. F. Ilyas. Xseed: Accurate and fast cardinality estimation for xpath queries. Proc. 22nd Intl. Conf. on Data Engineering, 2006.

Cited By

View all
  • (2021)Demythization of Structural XML Query Processing: Comparison of Holistic and Binary ApproachesIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2019.294615733:4(1439-1452)Online publication date: 1-Apr-2021
  • (2018)Querying GraphsSynthesis Lectures on Data Management10.2200/S00873ED1V01Y201808DTM05110:3(1-184)Online publication date: Oct-2018
  • (2017)Structural XML Query ProcessingACM Computing Surveys10.1145/309579850:5(1-41)Online publication date: 26-Sep-2017
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
March 2009
1180 pages
ISBN:9781605584225
DOI:10.1145/1516360
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 March 2009

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Research-article

Conference

EDBT/ICDT '09
EDBT/ICDT '09: EDBT/ICDT '09 joint conference
March 24 - 26, 2009
Saint Petersburg, Russia

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)42
  • Downloads (Last 6 weeks)13
Reflects downloads up to 14 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2021)Demythization of Structural XML Query Processing: Comparison of Holistic and Binary ApproachesIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2019.294615733:4(1439-1452)Online publication date: 1-Apr-2021
  • (2018)Querying GraphsSynthesis Lectures on Data Management10.2200/S00873ED1V01Y201808DTM05110:3(1-184)Online publication date: Oct-2018
  • (2017)Structural XML Query ProcessingACM Computing Surveys10.1145/309579850:5(1-41)Online publication date: 26-Sep-2017
  • (2016)A distributed selectivity-driven search strategy for semi-structured data over DHT-based networksJournal of Parallel and Distributed Computing10.1016/j.jpdc.2016.03.01593:C(10-29)Online publication date: 1-Jul-2016
  • (2016)XHQEInformation Systems Frontiers10.1007/s10796-015-9561-618:6(1233-1249)Online publication date: 1-Dec-2016
  • (2016)A Framework for Sampling-Based XML Data PricingSpecial Issue on Database- and Expert-Systems Applications on Transactions on Large-Scale Data- and Knowledge-Centered Systems XXIV - Volume 951010.1007/978-3-662-49214-7_4(116-138)Online publication date: 1-Jan-2016
  • (2015)Cost-based holistic twig joinsInformation Systems10.1016/j.is.2015.03.00452:C(21-33)Online publication date: 1-Aug-2015
  • (2015)Improved selectivity estimator for XML queries based on structural synopsisWorld Wide Web10.1007/s11280-014-0311-318:4(1123-1144)Online publication date: 1-Jul-2015
  • (2014)A tool for Internet-scale cardinality estimation of XPath queries over distributed semistructured data2014 IEEE 30th International Conference on Data Engineering10.1109/ICDE.2014.6816758(1270-1273)Online publication date: Mar-2014
  • (2014)A gossip-based approach for Internet-scale cardinality estimation of XPath queries over distributed semistructured dataThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-013-0314-123:1(51-76)Online publication date: 1-Feb-2014
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media