[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1065167.1065195acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

Buffering in query evaluation over XML streams

Published: 13 June 2005 Publication History

Abstract

All known algorithms for evaluating advanced XPath queries (e.g., ones with predicates or with closure axes) on XML streams employ buffers to temporarily store fragments of the document stream. In many cases, these buffers grow very large and constitute a major memory bottleneck. In this paper, we identify two broad classes of evaluation problems that independently necessitate the use of large memory buffers in evaluation of queries over XML streams: (1) full-fledged evaluation (as opposed to just filtering) of queries with predicates; (2) evaluation (whether full-fledged or filtering) of queries with "multi-variate" predicates.We prove quantitative lower bounds on the amount of memory required in each of these scenarios. The bounds are stated in terms of novel document properties that we define. We show that these scenarios, in combination with query evaluation over recursive documents, cover the cases in which large buffers are required. Finally, we present algorithms that match the lower bounds for an important fragment of XPath.

References

[1]
N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58(1):137--147, 1999.
[2]
M. Altinel and M. J. Franklin. Efficient filtering of XML documents for selective dissemination of information. In Proc. 26th VLDB, pages 53--64, 2000.
[3]
A. Arasu, B. Babcock, S. Babu, J. McAlister, and J. Widom. Characterizing memory requirements for queries over continuous data streams. In Proc. 21st PODS, pages 221--232, 2002.
[4]
I. Avila-Campillo, T. J. Green, A. Gupta, M. Onizuka, D. Raven, and D. Suciu. XMLTK: An XML toolkit for scalable XML stream processing. In Proc. 1st Workshop on Programming Languages for XML (PLAN-X), 2002.
[5]
B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In Proc. 21st PODS, pages 1--16, 2002.
[6]
Z. Bar-Yossef, M. Fontoura, and V. Josifovski. On the memory requirements of XPath evaluation over XML streams. In Proc. 23rd PODS, pages 177--188, 2004.
[7]
Z. Bar-Yossef, T. S. Jayram, R. Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. In Proc. 43rd FOCS, pages 209--218, 2002.
[8]
A. Berglund, S. Boag, D. Chamberlin, M. F. Fernández, M. Kay, J. Robie, and J. Siméon. XML Path Language (XPath) 2.0. W3C, http://www.w3.org/TR/xpath20, 2004.
[9]
C. Y. Chan, P. Felber, M. N. Garofalakis, and R. Rastogi. Efficient filtering of XML documents with XPath expressions. In Proc. 18th ICDE, pages 235--244, 2002.
[10]
Y. Diao, P. M. Fischer, M. J. Franklin, and R. To. YFilter: Efficient and scalable filtering of XML documents. In Proc. 18th ICDE, pages 341--342, 2002.
[11]
R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. Journal of Computer and System Sciences, 66(4):614--656, 2003.
[12]
M. Fernández, A. Malhotra, J. Marsh, M. Nagy, and N. Walsh. XQuery 1.0 and XPath 2.0 Data Model. W3C, http://www.w3.org/TR/xpath-datamodel, 2004.
[13]
G. Gottlob, C. Koch, and R. Pichler. The complexity of XPath query evaluation. In Proc. 22nd PODS, pages 179--190, 2003.
[14]
T. Green, G. Miklau, M. Onizuka, and D. Suciu. Processing XML streams with deterministic automata. In Proc. 9th ICDT, pages 173--189, 2003.
[15]
A. K. Gupta and D. Suciu. Stream processing of XPath queries with predicates. In Proc. 22nd SIGMOD, pages 419--430, 2003.
[16]
M. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. In DIMACS series in Discrete Mathematics and Theoretical Computer Science, volume 50, pages 107--118, 1999.
[17]
P. Indyk and D. Woodruff. Tight lower bounds for the distinct elements problem. In Proc. 44th FOCS, pages. 283--289, 2003.
[18]
Z. Ives, A. Levy, and D. Weld. Efficient evaluation of regular path expressions on streaming XML data. Technical report, University of Washington, 2000.
[19]
V. Josifovski, M. Fontoura, and A. Barta. Querying XML streams. The VLDB Journal, 2004. to appear.
[20]
C. Koch, S. Scherzinger, N. Schweikardt, and B. Stegmaier. Schema-based scheduling of event processors and buffer minimization for queries on structured data streams. In Proc. 30th VLDB, pages 228--239, 2004.
[21]
E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, 1997.
[22]
A. Malhotra, J. Melton, and N. Walsh. XQuery 1.0 and XPath 2.0 Functions and Operators. W3C, http://www.w3.org/TR/xquary-operators, 2004.
[23]
A. Marian and J. Siméon. Projecting XML documents. In Proc. 29th VLDB, pages 213--224, 2003.
[24]
S. Muthukrishnan. Data streams: Algorithms and applications. In Proc. 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), page 413, 2003. http://www.cs.rutgers.edu/~muthu/stream-1-1.ps.
[25]
D. Olteanu, T. Kiesling, and F. Bry. An evaluation of regular path expressions with qualifiers against XML streams. In Proc. 19th ICDE, pages 702--704, 2003.
[26]
F. Peng and S. S. Chawathe. XPath queries on streaming data. In Proc. 22nd SIGMOD, pages 431--442, 2003.
[27]
M. Saks and X. Sun. Space lower bounds for distance approximation in the data stream model. In Proc. 34th STOC, pages 360--369, 2002.
[28]
L. Segoufin. Typing and querying XML documents: Some complexity bounds. In Proc. 22nd PODS, pages 167--178, 2003.
[29]
D. Woodruff. Optimal space lower bounds for all frequency moments. In Proc. 15th SODA, pages 167--175, 2004.
[30]
A. C.-C. Yao. Some complexity questions related to distributive computing. In Proc. 11th STOC, pages 209--213, 1979.

Cited By

View all
  • (2024)Streaming enumeration on nested documentsACM Transactions on Database Systems10.1145/3701557Online publication date: 25-Oct-2024
  • (2013)Querying Streaming XML Big Data with Multiple Filters on CloudProceedings of the 2013 IEEE 16th International Conference on Computational Science and Engineering10.1109/CSE.2013.163(1121-1127)Online publication date: 3-Dec-2013
  • (2013)A survey on XML streaming evaluation techniquesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-012-0281-y22:2(177-202)Online publication date: 1-Apr-2013
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PODS '05: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2005
388 pages
ISBN:1595930620
DOI:10.1145/1065167
  • General Chair:
  • Georg Gottlob,
  • Program Chair:
  • Foto Afrati
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 June 2005

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

SIGMOD/PODS05

Acceptance Rates

Overall Acceptance Rate 642 of 2,707 submissions, 24%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1
  • Downloads (Last 6 weeks)0
Reflects downloads up to 09 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Streaming enumeration on nested documentsACM Transactions on Database Systems10.1145/3701557Online publication date: 25-Oct-2024
  • (2013)Querying Streaming XML Big Data with Multiple Filters on CloudProceedings of the 2013 IEEE 16th International Conference on Computational Science and Engineering10.1109/CSE.2013.163(1121-1127)Online publication date: 3-Dec-2013
  • (2013)A survey on XML streaming evaluation techniquesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-012-0281-y22:2(177-202)Online publication date: 1-Apr-2013
  • (2013)Right-Universality of Visibly Pushdown AutomataRuntime Verification10.1007/978-3-642-40787-1_5(76-93)Online publication date: 2013
  • (2011)Streamable fragments of forward XPathProceedings of the 16th international conference on Implementation and application of automata10.5555/2032366.2032369(3-15)Online publication date: 13-Jul-2011
  • (2011)Scalable XML Filtering for Content SubscriptionsTheoretical and Practical Advances in Information Systems Development10.4018/978-1-60960-521-6.ch007(120-152)Online publication date: 2011
  • (2011)Memory lower bounds for XPath evaluation over XML streamsJournal of Computer and System Sciences10.1016/j.jcss.2010.10.00477:6(1120-1140)Online publication date: 1-Nov-2011
  • (2011)Queries on Xml streams with bounded delay and concurrencyInformation and Computation10.1016/j.ic.2010.08.003209:3(409-442)Online publication date: 1-Mar-2011
  • (2011)Streamable Fragments of Forward XPathImplementation and Application of Automata10.1007/978-3-642-22256-6_2(3-15)Online publication date: 2011
  • (2010)Processing XPath queries with forward and downward axes over XML streamsProceedings of the 13th International Conference on Extending Database Technology10.1145/1739041.1739048(27-38)Online publication date: 22-Mar-2010
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media