[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/253260.253291acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article
Free access

Online aggregation

Published: 01 June 1997 Publication History

Abstract

Aggregation in traditional database systems is performed in batch mode: a query is submitted, the system processes a large volume of data over a long period of time, and, eventually, the final answer is returned. This archaic approach is frustrating to users and has been abandoned in most other areas of computing. In this paper we propose a new online aggregation interface that permits users to both observe the progress of their aggregation queries and control execution on the fly. After outlining usability and performance requirements for a system supporting online aggregation, we present a suite of techniques that extend a database system to meet these requirements. These include methods for returning the output in random order, for providing control over the relative rate at which different aggregates are computed, and for computing running confidence intervals. Finally, we report on an initial implementation of online aggregation in POSTGRES.

References

[1]
A. Agarwal, R. Agrawal, P. hi. Deshpande, A. Gupta, J. F. Naughton, F{. F{amakrishnan. and S. Sarawagi. On the computation of multidimensional aggregates. In Proc. 22nd Intl. Conf. Very Large Data Bases, Mumbai(Bombay), September 1996.
[2]
A. Aiken, J. Chen, M. Stonebraker, and A. Woodruff. Tioga-2: A direct manipulation database visualization environment. In Proc. of the I 2th Intl. Conf. Data Engineering, pages 208-217, New Orleans, February 1996.
[3]
G. Antoshenkov and M Ziauddin. Query processing and optimization in Oracle Rdb. VLDB Journal, 5(4):229-237. 1996.
[4]
P. Billingsley. Probability and Measure. Wiley, New York, second edition, 1986.
[5]
R. J. Bayardo, Jr. and D. P. Miranker. Processing queries for first-few answers. In Fifth Intl. Con}. Information and Knowledge Management, pages 45-52, Rockville, Maryland, 1996.
[6]
K. Bratbergsengen. Hashing methods and relational algebra operations. In Proc. I Oth Intl. Conf. Very Large Data Bases, pages 323-333, Singapore, August 1984.
[7]
E. F. Codd, S. B. Codd, and C. T. Sal- Icy. Providing OLAP (on-line analyticalprocessing) to user-analysts: an IT mandate.URL http://www.arborsoft.com/papers/coddTOC.htmI., I993.
[8]
T. F. Chan, G. H. Golub, and R. J. LeVeque. Algorithms for computing the sample variance: Analysis and recommendation. Amer. Statist., 37:242-247, 1983.
[9]
S. Chaudhuri and K. Shim. Optimizing queries with aggregate views. In P. M. G. Apers, M. Bouzeghoub, and G. Gardarin, editors, Advances in Database Technology- EDBT'96 5th Intl. Conj. on Extending Database Technology, volume 1057 of Lecture Notes in Computer Science, pages 167-182. Springer-Verlag, New York, 1996.
[10]
D. J. DeWitt, R. H. Katz, F. Olken, L. D. Shapiro, M. R. Stonebraker, and D. Wood. Implementation techniques for main memory database systems. In Proc. ACM-SIGMOD Intl. Conf. Management of Data, pages 1-8, Boston, June 1984.
[11]
J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data cube: A relational aggregation operator generalizing Group-By, Cross-Tab, and Sub-Totals. In Proc. of the I2th Intl. Conf. Data Engineering, pages 152-159, 1996.
[12]
A. Gupta, V. Harinarayan, and D. Quass. Aggregatequery processing in data warehousing environments. In Proc. 21st Intl. Con}. Very Large Data Bases, Zurich, September 1995, pages 358-369.
[13]
P. J. Haas. Hoeffding inequalities for join-selectivity estimation and online aggregation. IBM Research Report RJ 10040, IBM Almaden Research Center, San Jose, CA, 1996.
[14]
P. J. Haas. Large-sample and deterministic confidence intervals for online aggregation. IBM Research Report RJ 10050, IBM Almaden Research Center, San Jose, CA, 1996.
[15]
J. M. Hellerstein and J. F. Naughton. Query execution techniques for caching expensive methods. In Proc. ACM- SIGMOD Intl. Con}. Management of Data, Montreal, June 1996, pages 423-424.
[16]
P. J. Haas, J. F. Naughton, S. Seshadri, and A. N. Swami. Selectivity and cost estimation for joins based on random sampling. J. Comput. System Sci., 52:550-569, 1996.
[17]
W.-C. Hou, G. ()zsoyoglu, and E. Dogdu. Errorconstrained count query evaluation in relational databases. In Proceedir~gs, 1991 ACM-SIGMOD Intl. CoT~.f. Managrnent of Data, pages 278-287. ACM Press, 1991.
[18]
W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58:13- 30, 1963.
[19]
W.-C. Hou, G. Ozsoyoglu, and B. K. Taneja. Statistical estimators for relational algebra expressions. In Proc. 7th .4CM SIGACT-SIGMOD-SIGART Symposium on Principles o/Database Systems, pages 276-287, Austin, March 1988.
[20]
W.-C. Hou, G. Ozsoyoglu, and B. K. Taneja. Processing aggregate relational queries with hard time constraints. In Proc. A CM-SIGMOD Intl. Con}. Management of Data, pages 68-77, Portland, May-June 1989.
[21]
R. J. Lipton, J. F. Naughton, D. A. Schneider, and S. Seshadri. Efficient sampling strategies for relational database operations. Theoretical Computer Science, 116:195- 226, 1993.
[22]
B. A. Myers. The importance of percent-done progress indicators for computer-human interfaces. In Proc. SIG CHI '85: Human Factors in Computing Systems, pages 11-17, April 1985.
[23]
F. Olken. Random Sampling from Databases. PhD thesis, University of California, Berkeley, 1993.
[24]
Postgres95 home page,1995. UFI.L http://www, ki.net / postgres95.
[25]
P. G. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, and T. Price. Access path selection in a relational database management system. In Proc. A CM-SIGMOD Intl. Conf. Management o.f Data, pages 22-34, Boston, June 1979.
[26]
P. Seshadri, J. M. Hellerstein, H. Pirahesh, T. C. Leung, It. Ramakrishnan, D. Srivastava, P. J. Stuckey, and S. Sudarshan. Cost-based optimization for magic: Algebra and implementation. In Proc. A CM-SIGMOD intl. Conf. Management o/Data, Montreal, June 1996, pages 435-446.
[27]
P. Seshadri, H. Pirahesh, and T. C. Leung. Complex query decorrelation. In Proc. I2th IEEE Intl. Conf. Data Engineering, New Orleans, February 1996.
[28]
M. Vetterli and K. M. Uz. Multiresolution coding techniques for digital video: A review. Multidimensional Systems and Signal Processing, 3:161-187, 1992.
[29]
S. V. Vrbsky and J. W. S. Liu. APPROXIMATE m A query processor that produces monotonically improving approximate answers. IEEE Transactions on Knowledge and Data Engineering, 5(6):1056-1068, 1993.
[30]
A. Wilschut and P. Apers. Dataflow query execution in a parallel main-memory environment. In Proc. First Intl. Conf. Parallel and Distributed Information Systems, pages 68-77, Dec 1991.
[31]
W. P. Yan and P.-A. Larson. Eager aggregation and lazy aggregation. In Proc. 21st Intl. Conf. Very Large Data Bases, Zurich, September 1995, pages 345-357.

Cited By

View all
  • (2024)GAN-Based Tabular Data Generator for Constructing Synopsis in Approximate Query Processing: Challenges and SolutionsMachine Learning and Knowledge Extraction10.3390/make60100106:1(171-198)Online publication date: 16-Jan-2024
  • (2024)Biathlon: Harnessing Model Resilience for Accelerating ML Inference PipelinesProceedings of the VLDB Endowment10.14778/3675034.367505217:10(2631-2640)Online publication date: 1-Jun-2024
  • (2024)Accurate Sampling-Based Cardinality Estimation for Complex Graph QueriesACM Transactions on Database Systems10.1145/368920949:3(1-46)Online publication date: 17-Sep-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data
June 1997
594 pages
ISBN:0897919114
DOI:10.1145/253260
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1997

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

SIGMOD/PODS97
Sponsor:

Acceptance Rates

SIGMOD '97 Paper Acceptance Rate 42 of 202 submissions, 21%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)502
  • Downloads (Last 6 weeks)59
Reflects downloads up to 16 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)GAN-Based Tabular Data Generator for Constructing Synopsis in Approximate Query Processing: Challenges and SolutionsMachine Learning and Knowledge Extraction10.3390/make60100106:1(171-198)Online publication date: 16-Jan-2024
  • (2024)Biathlon: Harnessing Model Resilience for Accelerating ML Inference PipelinesProceedings of the VLDB Endowment10.14778/3675034.367505217:10(2631-2640)Online publication date: 1-Jun-2024
  • (2024)Accurate Sampling-Based Cardinality Estimation for Complex Graph QueriesACM Transactions on Database Systems10.1145/368920949:3(1-46)Online publication date: 17-Sep-2024
  • (2024)SpatialSSJP: QoS-Aware Adaptive Approximate Stream-Static Spatial Join ProcessorIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.333066935:1(73-88)Online publication date: Jan-2024
  • (2024)When Quantum Computing Meets Database: A Hybrid Sampling Framework for Approximate Query ProcessingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.348027836:12(9532-9546)Online publication date: Dec-2024
  • (2024)Learning-Based Sample Tuning for Approximate Query Processing in Interactive Data ExplorationIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.334145136:11(6532-6546)Online publication date: Nov-2024
  • (2024)Generalized Measure-Biased Sampling and Priority SamplingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.334067336:11(6251-6265)Online publication date: Nov-2024
  • (2024)DCLA: Towards Distributed Cooperative Learning Analytics for Developing CommunitiesHCI International 2024 – Late Breaking Papers10.1007/978-3-031-76815-6_8(94-106)Online publication date: 11-Dec-2024
  • (2023)Efficient Complex Aggregate Queries with Accuracy Guarantee Based on Execution Cost Model over Knowledge GraphsMathematics10.3390/math1118390811:18(3908)Online publication date: 14-Sep-2023
  • (2023)Cache-Efficient Top-k Aggregation over High Cardinality Large DatasetsProceedings of the VLDB Endowment10.14778/3636218.363622217:4(644-656)Online publication date: 1-Dec-2023
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media