[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1247480.1247537acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

Distributed query evaluation with performance guarantees

Published: 11 June 2007 Publication History

Abstract

Partial evaluation has recently proven an effective technique for evaluating Boolean XPath queries over a fragmented tree that is distributed over a number of sites. What left open is whether or not the technique is applicable to generic data-selecting XPath queries. In contrast to Boolean queries that return a single truth value, a generic XPath query returns a set of elements, and its evaluation introduces difficulties to avoiding excessive data shipping. This paper settles this question in positive by providing evaluation algorithms and optimizations for generic XPath queries in the same distributed and fragmented setting. These algorithms explore parallelism and retain the performance guarantees of their counterpart for Boolean queries, regardless of how the tree is fragmented and distributed. First, each site is visited at most three times, and down to at most twice when optimizations are in place. Second, the network traffic is determined by the final answer of the query, rather than the size of the tree, without incurring unnecessary data shipping. Third, the total computation is comparable to that of centralized algorithms on the tree stored in a single site. We show both analytically and experimentally that our algorithms and optimizations are scalable and efficient on large trees and complex XPath queries.

References

[1]
S. Abiteboul, A. Bonifati, G. Cobena, I. Manolescu, and T. Milo. Dynamic XML documents with distribution and replication. In SIGMOD, 2003.
[2]
S. Amer-Yahia, D. Srivastava, and D. Suciu. Distributed evaluation of network directory queries. TKDE, 16(4):474--486, 2004.
[3]
J. M. Bremer and M. Gertz. On distributing XML repositories. In WebDB, 2003.
[4]
N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: optimal XML pattern matching. In SIGMOD, 2002.
[5]
P. Buneman, G. Cong, W. Fan, and A. Kementsietsidis. Using partial evaluation in distributed query evaluation. In VLDB, 2006.
[6]
E. Colen, H. Kaplan, and T. Milo. Labeling dynamic XML tree. In PODS, 2002.
[7]
A. Crainiceanu, P. Linga, J. Gehrke, and J. Shanmugasundaram. Querying peer-to-peer networks using P-Trees. In WebDB, 2004.
[8]
D. J. DeWitt and J. Gray. Parallel database systems: The future of high performance database systems. Commun. ACM, 35(6), 1992.
[9]
P. Ganesan, M. Bawa, and H. Garcia-Molina. Online balancing of range-partitioned data with applications to Peer-to-Peer systems. In VLDB, 2004.
[10]
P. Godfrey and J. Gryz. A strategy for partial evaluation of views. In Intelligent Information Systems, 2000.
[11]
G. Gottlob, C. Koch, and R. Pichler. Efficient algorithms for processing XPath queries. In VLDB, 2002.
[12]
A. Gupta, Y. Sagiv, J. D. Ullman, and J. Widom. Constraint checking with partial information. In PODS, 1994.
[13]
A. Y. Halevy, Z. G. Ives, J. Madhavan, P. Mork, D. Suciu, and I. Tatarinov. The Piazza peer data management system. TKDE, 16(7):787--798, 2004.
[14]
H. Hsiao and D. J. DeWitt. Chained Declustering: A New Availability Strategy for Multiprocessor Database Machines. In ICDE, 1990.
[15]
Z. G. Ives, A. Y. Halevy, and D. S. Weld. An XML query engine for network-bound data. The VLDB Journal, 11(4):380--402, 2002.
[16]
H. V. Jagadish, L. V. S. Lakshmanan, T. Milo, D. Srivastava, and D. Vista. Querying network directories. In SIGMOD, 1999.
[17]
H. V. Jagadish, B. C. Ooi, and Q. H. Vu. Baton: A balanced tree structure for peer-to-peer networks. In SIGMOD, 2005.
[18]
N. D. Jones. An introduction to partial evaluation. ACM Computing Surveys, 28(3), 1996.
[19]
C. C. Kanne, M. Brantner, and G. Moerkotte. Cost-sensitive reordering of navigational primitives. In SIGMOD, 2005.
[20]
C. Koch. Efficient processing of expressive node-selecting queries on XML data in secondary storage: A tree automata-based approach. In VLDB, 2003.
[21]
D. Kossman. The State of the Art in Distributed Query Processing. ACM Computing Surveys, 32(4):422--469, 2000.
[22]
V. Papadimos and D. Maier. Distributed queries without distributed state. In WebDB, 2002.
[23]
P. Ramanan. Efficient algorithms for minimizing tree pattern queries. In SIGMOD, 2002.
[24]
Saxon. http://saxon.sourceforge.net/.
[25]
A. Schmidt, F. Waas, M. Kersten, M. Carey, I. Manolescu, and R. Busse. XMark: A benchmark for XML data management. In VLDB, 2002.
[26]
M. Smith and T. A. Howes. LDAP : Programming Directory-Enabled Apps. Sams, 1997.
[27]
D. Suciu. Distributed query evaluation on semistructured data. TODS, 27(1):1--62, 2002.
[28]
A. Tomasic, L. Raschid, and P. Valduriez. Scaling heterogeneous databases and the design of Disco. In ICDCS, pages 449--457, 1996.
[29]
Xerces and Xalan. http://xalan.apache.org.

Cited By

View all
  • (2021)Distributed Company Control in Company Shareholding Graphs2021 IEEE 37th International Conference on Data Engineering (ICDE)10.1109/ICDE51399.2021.00294(2637-2648)Online publication date: Apr-2021
  • (2021)Lambda calculus with algebraic simplification for reduction parallelisation: Extended studyJournal of Functional Programming10.1017/S095679682100005831Online publication date: 5-Apr-2021
  • (2021)Efficient distributed path computation on RDF knowledge graphs using partial evaluationWorld Wide Web10.1007/s11280-021-00965-5Online publication date: 4-Nov-2021
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data
June 2007
1210 pages
ISBN:9781595936868
DOI:10.1145/1247480
  • General Chairs:
  • Lizhu Zhou,
  • Tok Wang Ling,
  • Program Chair:
  • Beng Chin Ooi
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2007

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Xpath queries
  2. distributed XML documents
  3. parallel query processing

Qualifiers

  • Article

Conference

SIGMOD/PODS07
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)5
  • Downloads (Last 6 weeks)0
Reflects downloads up to 30 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2021)Distributed Company Control in Company Shareholding Graphs2021 IEEE 37th International Conference on Data Engineering (ICDE)10.1109/ICDE51399.2021.00294(2637-2648)Online publication date: Apr-2021
  • (2021)Lambda calculus with algebraic simplification for reduction parallelisation: Extended studyJournal of Functional Programming10.1017/S095679682100005831Online publication date: 5-Apr-2021
  • (2021)Efficient distributed path computation on RDF knowledge graphs using partial evaluationWorld Wide Web10.1007/s11280-021-00965-5Online publication date: 4-Nov-2021
  • (2020)A data distribution model for RDFDistributed and Parallel Databases10.1007/s10619-020-07296-wOnline publication date: 16-May-2020
  • (2019)Lambda calculus with algebraic simplification for reduction parallelization by equational reasoningProceedings of the ACM on Programming Languages10.1145/33416443:ICFP(1-25)Online publication date: 26-Jul-2019
  • (2019)Graph pattern matching with counting quantifiers and label-repetition constraintsCluster Computing10.1007/s10586-019-02977-3Online publication date: 30-Aug-2019
  • (2018)Using partial evaluation in holistic subgraph searchFrontiers of Computer Science: Selected Publications from Chinese Universities10.1007/s11704-016-5522-612:5(966-983)Online publication date: 1-Oct-2018
  • (2018)Estimating searching cost of regular path queries on large graphs by exploiting unit-subqueriesJournal of Heuristics10.1007/s10732-018-9402-028:2(149-169)Online publication date: 30-Nov-2018
  • (2018)Graph Pattern Matching Preserving Label-Repetition ConstraintsModel and Data Engineering10.1007/978-3-030-00856-7_17(268-281)Online publication date: 13-Sep-2018
  • (2016)Processing SPARQL queries over distributed RDF graphsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-015-0415-025:2(243-268)Online publication date: 1-Apr-2016
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media