[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Scalable approximate query processing with the DBO engine

Published: 12 December 2008 Publication History

Abstract

This article describes query processing in the DBO database system. Like other database systems designed for ad hoc analytic processing, DBO is able to compute the exact answers to queries over a large relational database in a scalable fashion. Unlike any other system designed for analytic processing, DBO can constantly maintain a guess as to the final answer to an aggregate query throughout execution, along with statistically meaningful bounds for the guess's accuracy. As DBO gathers more and more information, the guess gets more and more accurate, until it is 100% accurate as the query is completed. This allows users to stop the execution as soon as they are happy with the query accuracy, and thus encourages exploratory data analysis.

References

[1]
Acharya, S., Gibbons, P. B., Poosala, V., and Ramaswamy, S. 1999. Join synopses for approximate query answering. SIGMOD Rec. 28, 2, 275--286.
[2]
Antoshenkov, G. 1992. Random sampling from pseudo-ranked b+ trees. In Proceedings of the 18th International Conference on Very Large Data Bases. (VLDB'92), Vancouver, Canada. Morgan Kaufmann, San Francisco, CA, 375--382.
[3]
Chaudhuri, S., Motwani, R., and Narasayya, V. R. 1999. On random sampling over joins. In Proceedings of the 1999 SIGMOD International Conference (SIGMOD'99). Philadelphia, PA. ACM, New York, NY, 263--274.
[4]
Cochran, W. G. 1977. Sampling Techniques, 3rd Edition. John Wiley.
[5]
Dittrich, J.-P., Seeger, B., Taylor, D. S., and Widmayer, P. 2002. Progressive merge join: a generic and non-blocking sort-based join algorithm. In Proceedings of the 28th International Conference on Very Large Databases (VLDB'02), Hong Kong, China, VLDB Endowment, 299--310.
[6]
Dittrich, J.-P., Seeger, B., Taylor, D. S., and Widmayer, P. 2003. On producing join results early. In Proceedings of the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS). San Diego, CA. ACM, New York, NY, 134--142.
[7]
Dobra, A. 2005. Histograms revisited: when are histograms the best approximation method for aggregates over joins? In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS). Baltimore, MD. ACM, New York, NY, 228--237.
[8]
Haas, P. J. 1997. Large-sample and deterministic confidence intervals for online aggregation. In Proceedings of the 9th International Conference on Scientific and Statistical Database Management (SSDBM). IEEE Computer Society, 51--63.
[9]
Haas, P. J. and Hellerstein, J. M. 1999. Ripple joins for online aggregation. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). Philadelphia, PA, ACM, New York, NY, 287--298.
[10]
Haas, P. J., Naughton, J. F., Seshadri, S., and Swami, A. N. 1996. Selectivity and cost estimation for joins based on random sampling. J. Comput. Syst. Sci. 52, 3, 550--569.
[11]
Hardy, G., Littlewood, J., and Polya, G. 1988. Inequalities. Cambridge University Press.
[12]
Hellerstein, J. M., Avnur, R., Chou, A., Hidber, C., Olston, C., Raman, V., Roth, T., and Haas, P. J. 1999. Interactive data analysis: the control project. Computer 32, 8, 51--59.
[13]
Hellerstein, J. M., Haas, P. J., and Wang, H. J. 1997. Online aggregation. (SIGMOD). Rec. 26, 2, 171--182.
[14]
Jermaine, C., Arumugam, S., Pol, A., and Dobra, A. 2007. Scalable approximate query processing with the DBO engine. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). Beijing, China. ACM, New York, NY, 725--736.
[15]
Jermaine, C., Dobra, A., Arumugam, S., Joshi, S., and Pol, A. 2005a. A disk-based join with probabilistic guarantees. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). Baltimore, MD. ACM, New York, NY, 563--574.
[16]
Jermaine, C., Dobra, A., Pol, A., and Joshi, S. 2005b. Online estimation for subset-based sql queries. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB). Trondheim, Norway. VLDB Endowment, 745--756.
[17]
Luo, G., Ellmann, C., Haas, P. J., and Naughton, J. F. 2002. A scalable hash ripple join algorithm. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, 252--262.
[18]
Olken, F. 1993. Random sampling from databases. Ph.D. thesis, U. of California, Berkeley.
[19]
Olken, F. and Rotem, D. 1989. Random sampling from b+ trees. In Proceedings of the 15th International Conference on Very Large Data Bases (VLDB). Amsterdam, The Netherlands. Morgan Kaufmann, San Francisco, CA, 269--277.
[20]
Özsoyoglu, G., Du, K., Swamy, S. G., and Hou, W.-C. 1992. Processing real-time, non-aggregate queries with time-constraints in case-db. In Proceedings of the 8th International Conference on Data Engineering (ICDE), IEEE Computer Society, 410--417.
[21]
Shao, J. 1999. Mathematical Statistics. Springer-Verlag.
[22]
Shapiro, L. D. 1986. Join processing in database systems with large main memories. ACM Trans. Database Syst. 11, 3, 239--264.
[23]
Stefanov, S. 2001. Separable Programming. Applied Optimization, vol. 53. Kluwer Academic Publishers.
[24]
Vitter, J. S. and Wang, M. 1999. Approximate computation of multidimensional aggregates of sparse data using wavelets. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, 193--204.

Cited By

View all
  • (2023)A Step Toward Deep Online AggregationProceedings of the ACM on Management of Data10.1145/35892691:2(1-28)Online publication date: 20-Jun-2023
  • (2023)Active Sampling for Sparse Table by Bayesian Optimization with Adaptive Resolution2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00068(816-828)Online publication date: Apr-2023
  • (2022)High-Dimensional Data CubesProceedings of the VLDB Endowment10.14778/3565838.356583915:13(3828-3840)Online publication date: 1-Sep-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 33, Issue 4
November 2008
379 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1412331
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 December 2008
Accepted: 01 July 2008
Revised: 01 April 2008
Received: 01 October 2007
Published in TODS Volume 33, Issue 4

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Online aggregation
  2. randomized algorithms
  3. sampling

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)26
  • Downloads (Last 6 weeks)1
Reflects downloads up to 28 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2023)A Step Toward Deep Online AggregationProceedings of the ACM on Management of Data10.1145/35892691:2(1-28)Online publication date: 20-Jun-2023
  • (2023)Active Sampling for Sparse Table by Bayesian Optimization with Adaptive Resolution2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00068(816-828)Online publication date: Apr-2023
  • (2022)High-Dimensional Data CubesProceedings of the VLDB Endowment10.14778/3565838.356583915:13(3828-3840)Online publication date: 1-Sep-2022
  • (2022)Impact of Cognitive Biases on Progressive VisualizationIEEE Transactions on Visualization and Computer Graphics10.1109/TVCG.2021.305101328:9(3093-3112)Online publication date: 1-Sep-2022
  • (2022)Salvaging failing and straggling queries2022 IEEE 38th International Conference on Data Engineering (ICDE)10.1109/ICDE53745.2022.00108(1382-1395)Online publication date: May-2022
  • (2022)A random walk sampling on knowledge graphs for semantic-oriented statistical tasksData & Knowledge Engineering10.1016/j.datak.2022.102024140:COnline publication date: 1-Jul-2022
  • (2022)ProS: data series progressive k-NN similarity search and classification with probabilistic quality guaranteesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-022-00771-z32:4(763-789)Online publication date: 30-Nov-2022
  • (2021)DavosProceedings of the VLDB Endowment10.14778/3476311.347637014:12(2893-2905)Online publication date: 28-Oct-2021
  • (2021)PGMJoins: Random Join Sampling with Graphical ModelsProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3457302(1610-1622)Online publication date: 9-Jun-2021
  • (2021)At-the-time and Back-in-time Persistent SketchesProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3452802(1623-1636)Online publication date: 9-Jun-2021
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media