More Web Proxy on the site http://driver.im/

Article

Free access

Online aggregation

Authors:

Joseph M. Hellerstein,

Helen J. WangAuthors Info & Claims

SIGMOD '97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data

Pages 171 - 182

https://doi.org/10.1145/253260.253291

Published: 01 June 1997 Publication History

Abstract

Aggregation in traditional database systems is performed in batch mode: a query is submitted, the system processes a large volume of data over a long period of time, and, eventually, the final answer is returned. This archaic approach is frustrating to users and has been abandoned in most other areas of computing. In this paper we propose a new online aggregation interface that permits users to both observe the progress of their aggregation queries and control execution on the fly. After outlining usability and performance requirements for a system supporting online aggregation, we present a suite of techniques that extend a database system to meet these requirements. These include methods for returning the output in random order, for providing control over the relative rate at which different aggregates are computed, and for computing running confidence intervals. Finally, we report on an initial implementation of online aggregation in POSTGRES.

References

[1]

A. Agarwal, R. Agrawal, P. hi. Deshpande, A. Gupta, J. F. Naughton, F{. F{amakrishnan. and S. Sarawagi. On the computation of multidimensional aggregates. In Proc. 22nd Intl. Conf. Very Large Data Bases, Mumbai(Bombay), September 1996.

Digital Library

[2]

A. Aiken, J. Chen, M. Stonebraker, and A. Woodruff. Tioga-2: A direct manipulation database visualization environment. In Proc. of the I 2th Intl. Conf. Data Engineering, pages 208-217, New Orleans, February 1996.

Digital Library

[3]

G. Antoshenkov and M Ziauddin. Query processing and optimization in Oracle Rdb. VLDB Journal, 5(4):229-237. 1996.

Digital Library

[4]

P. Billingsley. Probability and Measure. Wiley, New York, second edition, 1986.

[5]

R. J. Bayardo, Jr. and D. P. Miranker. Processing queries for first-few answers. In Fifth Intl. Con}. Information and Knowledge Management, pages 45-52, Rockville, Maryland, 1996.

Digital Library

[6]

K. Bratbergsengen. Hashing methods and relational algebra operations. In Proc. I Oth Intl. Conf. Very Large Data Bases, pages 323-333, Singapore, August 1984.

Digital Library

[7]

E. F. Codd, S. B. Codd, and C. T. Sal- Icy. Providing OLAP (on-line analyticalprocessing) to user-analysts: an IT mandate.URL http://www.arborsoft.com/papers/coddTOC.htmI., I993.

[8]

T. F. Chan, G. H. Golub, and R. J. LeVeque. Algorithms for computing the sample variance: Analysis and recommendation. Amer. Statist., 37:242-247, 1983.

[9]

S. Chaudhuri and K. Shim. Optimizing queries with aggregate views. In P. M. G. Apers, M. Bouzeghoub, and G. Gardarin, editors, Advances in Database Technology- EDBT'96 5th Intl. Conj. on Extending Database Technology, volume 1057 of Lecture Notes in Computer Science, pages 167-182. Springer-Verlag, New York, 1996.

Digital Library

[10]

D. J. DeWitt, R. H. Katz, F. Olken, L. D. Shapiro, M. R. Stonebraker, and D. Wood. Implementation techniques for main memory database systems. In Proc. ACM-SIGMOD Intl. Conf. Management of Data, pages 1-8, Boston, June 1984.

Digital Library

[11]

J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data cube: A relational aggregation operator generalizing Group-By, Cross-Tab, and Sub-Totals. In Proc. of the I2th Intl. Conf. Data Engineering, pages 152-159, 1996.

Digital Library

[12]

A. Gupta, V. Harinarayan, and D. Quass. Aggregatequery processing in data warehousing environments. In Proc. 21st Intl. Con}. Very Large Data Bases, Zurich, September 1995, pages 358-369.

Digital Library

[13]

P. J. Haas. Hoeffding inequalities for join-selectivity estimation and online aggregation. IBM Research Report RJ 10040, IBM Almaden Research Center, San Jose, CA, 1996.

[14]

P. J. Haas. Large-sample and deterministic confidence intervals for online aggregation. IBM Research Report RJ 10050, IBM Almaden Research Center, San Jose, CA, 1996.

[15]

J. M. Hellerstein and J. F. Naughton. Query execution techniques for caching expensive methods. In Proc. ACM- SIGMOD Intl. Con}. Management of Data, Montreal, June 1996, pages 423-424.

Digital Library

[16]

P. J. Haas, J. F. Naughton, S. Seshadri, and A. N. Swami. Selectivity and cost estimation for joins based on random sampling. J. Comput. System Sci., 52:550-569, 1996.

Digital Library

[17]

W.-C. Hou, G. ()zsoyoglu, and E. Dogdu. Errorconstrained count query evaluation in relational databases. In Proceedir~gs, 1991 ACM-SIGMOD Intl. CoT~.f. Managrnent of Data, pages 278-287. ACM Press, 1991.

[18]

W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58:13- 30, 1963.

[19]

W.-C. Hou, G. Ozsoyoglu, and B. K. Taneja. Statistical estimators for relational algebra expressions. In Proc. 7th .4CM SIGACT-SIGMOD-SIGART Symposium on Principles o/Database Systems, pages 276-287, Austin, March 1988.

Digital Library

[20]

W.-C. Hou, G. Ozsoyoglu, and B. K. Taneja. Processing aggregate relational queries with hard time constraints. In Proc. A CM-SIGMOD Intl. Con}. Management of Data, pages 68-77, Portland, May-June 1989.

Digital Library

[21]

R. J. Lipton, J. F. Naughton, D. A. Schneider, and S. Seshadri. Efficient sampling strategies for relational database operations. Theoretical Computer Science, 116:195- 226, 1993.

Digital Library

[22]

B. A. Myers. The importance of percent-done progress indicators for computer-human interfaces. In Proc. SIG CHI '85: Human Factors in Computing Systems, pages 11-17, April 1985.

Digital Library

[23]

F. Olken. Random Sampling from Databases. PhD thesis, University of California, Berkeley, 1993.

[24]

Postgres95 home page,1995. UFI.L http://www, ki.net / postgres95.

[25]

P. G. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, and T. Price. Access path selection in a relational database management system. In Proc. A CM-SIGMOD Intl. Conf. Management o.f Data, pages 22-34, Boston, June 1979.

Digital Library

[26]

P. Seshadri, J. M. Hellerstein, H. Pirahesh, T. C. Leung, It. Ramakrishnan, D. Srivastava, P. J. Stuckey, and S. Sudarshan. Cost-based optimization for magic: Algebra and implementation. In Proc. A CM-SIGMOD intl. Conf. Management o/Data, Montreal, June 1996, pages 435-446.

Digital Library

[27]

P. Seshadri, H. Pirahesh, and T. C. Leung. Complex query decorrelation. In Proc. I2th IEEE Intl. Conf. Data Engineering, New Orleans, February 1996.

Digital Library

[28]

M. Vetterli and K. M. Uz. Multiresolution coding techniques for digital video: A review. Multidimensional Systems and Signal Processing, 3:161-187, 1992.

Digital Library

[29]

S. V. Vrbsky and J. W. S. Liu. APPROXIMATE m A query processor that produces monotonically improving approximate answers. IEEE Transactions on Knowledge and Data Engineering, 5(6):1056-1068, 1993.

Digital Library

[30]

A. Wilschut and P. Apers. Dataflow query execution in a parallel main-memory environment. In Proc. First Intl. Conf. Parallel and Distributed Information Systems, pages 68-77, Dec 1991.

Digital Library

[31]

W. P. Yan and P.-A. Larson. Eager aggregation and lazy aggregation. In Proc. 21st Intl. Conf. Very Large Data Bases, Zurich, September 1995, pages 345-357.

Digital Library

Cited By

Clementi AGualà LPepè Sciarria LStraziota ANejdl WAuer SKarras OCha MMoens MNajork M(2025)Maintaining k-MinHash Signatures over Fully-Dynamic Data Streams with RecoveryProceedings of the Eighteenth ACM International Conference on Web Search and Data Mining10.1145/3701551.3703491(79-87)Online publication date: 10-Mar-2025
https://dl.acm.org/doi/10.1145/3701551.3703491
Fallahian MDorodchi MKreth K(2024)GAN-Based Tabular Data Generator for Constructing Synopsis in Approximate Query Processing: Challenges and SolutionsMachine Learning and Knowledge Extraction10.3390/make60100106:1(171-198)Online publication date: 16-Jan-2024
https://doi.org/10.3390/make6010010
Chang CLo EYe C(2024)Biathlon: Harnessing Model Resilience for Accelerating ML Inference PipelinesProceedings of the VLDB Endowment10.14778/3675034.367505217:10(2631-2640)Online publication date: 1-Jun-2024
https://dl.acm.org/doi/10.14778/3675034.3675052
Show More Cited By

Index Terms

Online aggregation

Recommendations

Ripple joins for online aggregation

We present a new family of join algorithms, called ripple joins, for online processing of multi-table aggregation queries in a relational database management system (DBMS). Such queries arise naturally in interactive exploratory decision-support ...
Online aggregation

Aggregation in traditional database systems is performed in batch mode: a query is submitted, the system processes a large volume of data over a long period of time, and, eventually, the final answer is returned. This archaic approach is frustrating to ...
The Case for Online Aggregation

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

SIGMOD '97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data

June 1997

594 pages

ISBN:0897919114

DOI:10.1145/253260

Editors:
Joan M. Peckman
Univ. of Rhode Island, Kingston
,
Sudha Ram
Univ. of Arizona, Tucson
,
Michael Franklin

ACM SIGMOD Record Volume 26, Issue 2
June 1997
583 pages
ISSN:0163-5808
DOI:10.1145/253262
Chairman:
Sudha Ram
Univ. of Arizona, Tucson
,
Editor:
Joan M. Peckham
Univ. of Rhode Island, Kingston
Issue’s Table of Contents

Copyright © 1997 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1997

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Qualifiers

Article

Conference

SIGMOD/PODS97

Sponsor:

SIGMOD

SIGMOD/PODS97: Joint ACM SIGMOD International Conference on Management of Data and ACM Symposium on Principles of Database Systems

May 11 - 15, 1997

Arizona, Tucson, USA

Acceptance Rates

SIGMOD '97 Paper Acceptance Rate 42 of 202 submissions, 21%;

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

819
Total Citations
View Citations
4,369
Total Downloads

Downloads (Last 12 months)441
Downloads (Last 6 weeks)57

Reflects downloads up to 05 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Clementi AGualà LPepè Sciarria LStraziota ANejdl WAuer SKarras OCha MMoens MNajork M(2025)Maintaining k-MinHash Signatures over Fully-Dynamic Data Streams with RecoveryProceedings of the Eighteenth ACM International Conference on Web Search and Data Mining10.1145/3701551.3703491(79-87)Online publication date: 10-Mar-2025
https://dl.acm.org/doi/10.1145/3701551.3703491
Fallahian MDorodchi MKreth K(2024)GAN-Based Tabular Data Generator for Constructing Synopsis in Approximate Query Processing: Challenges and SolutionsMachine Learning and Knowledge Extraction10.3390/make60100106:1(171-198)Online publication date: 16-Jan-2024
https://doi.org/10.3390/make6010010
Chang CLo EYe C(2024)Biathlon: Harnessing Model Resilience for Accelerating ML Inference PipelinesProceedings of the VLDB Endowment10.14778/3675034.367505217:10(2631-2640)Online publication date: 1-Jun-2024
https://dl.acm.org/doi/10.14778/3675034.3675052
Hu PMotik B(2024)Accurate Sampling-Based Cardinality Estimation for Complex Graph QueriesACM Transactions on Database Systems10.1145/368920949:3(1-46)Online publication date: 17-Sep-2024
https://dl.acm.org/doi/10.1145/3689209
Jawarneh IBellavista PCorradi AFoschini LMontanari R(2024)SpatialSSJP: QoS-Aware Adaptive Approximate Stream-Static Spatial Join ProcessorIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.333066935:1(73-88)Online publication date: Jan-2024
https://doi.org/10.1109/TPDS.2023.3330669
Wu SShi MZhang DZhao JYuan GChen G(2024)When Quantum Computing Meets Database: A Hybrid Sampling Framework for Approximate Query ProcessingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.348027836:12(9532-9546)Online publication date: Dec-2024
https://doi.org/10.1109/TKDE.2024.3480278
Zhang HJing YHe ZZhang KWang X(2024)Learning-Based Sample Tuning for Approximate Query Processing in Interactive Data ExplorationIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.334145136:11(6532-6546)Online publication date: Nov-2024
https://doi.org/10.1109/TKDE.2023.3341451
Chang ZLi FShen Y(2024)Generalized Measure-Biased Sampling and Priority SamplingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.334067336:11(6251-6265)Online publication date: Nov-2024
https://doi.org/10.1109/TKDE.2023.3340673
Konomi SGao LMushi DRen B(2024)DCLA: Towards Distributed Cooperative Learning Analytics for Developing CommunitiesHCI International 2024 – Late Breaking Papers10.1007/978-3-031-76815-6_8(94-106)Online publication date: 11-Dec-2024
https://doi.org/10.1007/978-3-031-76815-6_8
Ye SXu XWang YFu T(2023)Efficient Complex Aggregate Queries with Accuracy Guarantee Based on Execution Cost Model over Knowledge GraphsMathematics10.3390/math1118390811:18(3908)Online publication date: 14-Sep-2023
https://doi.org/10.3390/math11183908
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Figures

Tables

Media

View Table of Conten