[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/375663.375670acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

Space-efficient online computation of quantile summaries

Published: 01 May 2001 Publication History

Abstract

An ∈-approximate quantile summary of a sequence of N elements is a data structure that can answer quantile queries about the sequence to within a precision of ∈N.
We present a new online algorithm for computing∈-approximate quantile summaries of very large data sequences. The algorithm has a worst-case space requirement of Ο(1÷∈ log(∈N)). This improves upon the previous best result of Ο(1÷∈ log2(∈N)). Moreover, in contrast to earlier deterministic algorithms, our algorithm does not require a priori knowledge of the length of the input sequence.
Finally, the actual space bounds obtained on experimental data are significantly better than the worst case guarantees of our algorithm as well as the observed space requirements of earlier algorithms.

References

[1]
Rakesh Agrawal and Arun Swami. A one-pass space-efficient algorithm for finding quantiles. Proc. 7th Int. Conf. Management of Data, COMAD, 28-30 December 1995.
[2]
Khaled Alsabti, Sanjay Ranka, and Vineet Singh. A one-pass algorithm for accurately estimating quantiles for disk-resident data. Proceedings of the 23rd Intl. Conference onVery Large Data Bases, Athens, Greece, 26-29 August 1997, pages 346-355, Los Altos, CA 94022, USA, 1997. Morgan Kaufmann Publishers.
[3]
Surajit Chaudhuri, Rajeev Motwani, and Vivek Narasayya. Random sampling for histogram construction: how much is enough? In ACM SIGMOD '98, volume 28, pages 436-447, Seattle, WA, June 1-4, 1998.
[4]
Phillip B. Gibbons, Yossi Matias, and Viswanath Poosala. Fast incremental maintenance of approximate histograms. In Proceedings of the 23rd Intl. Conf. Very Large Data Bases, VLDB, pages 466-475. Morgan Kaufmann, 25-27 August 1997.
[5]
Michael B. Greenwald. Practical algorithms for self scaling histograms or better than average data collection. Performance Evaluation, 27&28:19-40, October 1996.
[6]
R. Jain and I. Chlamtac. The P2 algorithm for dynamic calculation of quantile and histograms without storing observations. Communications of the ACM, 28(10):1076-1085, October 1986.
[7]
I. Pohl. A minimum storage algorithm for computing the median. IBM Research Report RC 2701, November 1969.
[8]
Gurmeet Singh Manku, Sridhar Rajagopalan, and Bruce G. Lindsay. Approximate medians and other quantiles in one pass and with limited memory. ACM SIGMOD '98, volume 28, pages 426-435, Seattle, WA, June 1998.
[9]
Gurmeet Singh Manku, Sridhar Rajagopalan, and Bruce G. Lindsay. Random sampling techniques for space efficient online computation of order statistics of large datasets. In ACM SIGMOD '99, volume 29, pages 251-262. Philadelphia, PA, June 1999.
[10]
J. I. Munro and M.S. Paterson. Selection and sorting with limited storage. Theoretical Computer Science, vol. 12: 315-323; 1980.
[11]
M.S. Paterson. Progress in selection. Technical Report, University ofWarwick, Coventry, UK, 1997.
[12]
Viswanath Poosala, Venkatesh Ganti, and Yannis E. Ioannidis. Approximate query answering using histograms. Bulletin of the IEEE Technical Committee on Data Engineering, 22(4):6-15, December 1999.
[13]
Viswanath Poosala, Peter J. Haas, Yannis E. Ioannidis, and Eugene J. Shekita. Improved histograms for selectivity estimation of range predicates. In ACM SIGMOD 96, volume 26, pages 294-305, Montreal, Quebec, Canada, June 4-6, 1996.

Cited By

View all
  • (2024)Parallel and Distributed Frugal Tracking of a QuantileFuture Internet10.3390/fi1609033516:9(335)Online publication date: 13-Sep-2024
  • (2024)Parallel and Distributed Frugal Tracking of a QuantileProceedings of the Seventh International Workshop on Systems and Network Telemetry and Analytics10.1145/3660320.3660332(1-6)Online publication date: 3-Jun-2024
  • (2024)Determining Exact Quantiles with Randomized SummariesProceedings of the ACM on Management of Data10.1145/36392802:1(1-26)Online publication date: 26-Mar-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on Management of data
May 2001
630 pages
ISBN:1581133324
DOI:10.1145/375663
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 2001

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

SIGMOD/PODS01
Sponsor:

Acceptance Rates

SIGMOD '01 Paper Acceptance Rate 44 of 293 submissions, 15%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)141
  • Downloads (Last 6 weeks)21
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Parallel and Distributed Frugal Tracking of a QuantileFuture Internet10.3390/fi1609033516:9(335)Online publication date: 13-Sep-2024
  • (2024)Parallel and Distributed Frugal Tracking of a QuantileProceedings of the Seventh International Workshop on Systems and Network Telemetry and Analytics10.1145/3660320.3660332(1-6)Online publication date: 3-Jun-2024
  • (2024)Determining Exact Quantiles with Randomized SummariesProceedings of the ACM on Management of Data10.1145/36392802:1(1-26)Online publication date: 26-Mar-2024
  • (2022)Relative Error Streaming QuantilesACM SIGMOD Record10.1145/3542700.354271751:1(69-76)Online publication date: 1-Jun-2022
  • (2022)A monitoring framework for deployed machine learning models with supply chain examples2022 IEEE International Conference on Big Data (Big Data)10.1109/BigData55660.2022.10020394(2231-2238)Online publication date: 17-Dec-2022
  • (2022)Locally Differentially Private Quantile Summary Aggregation in Wireless Sensor NetworksIntelligent Information and Database Systems10.1007/978-3-031-21743-2_29(364-376)Online publication date: 9-Dec-2022
  • (2021)Relative Error Streaming QuantilesProceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems10.1145/3452021.3458323(96-108)Online publication date: 20-Jun-2021
  • (2021)Weighted Distinct Sampling: Cardinality Estimation for SPJ QueriesProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3452821(1465-1477)Online publication date: 9-Jun-2021
  • (2021)Cluster-ReduceProceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining10.1145/3447548.3467217(2316-2326)Online publication date: 14-Aug-2021
  • (2021)Theory meets Practice at the Median: A Worst Case Comparison of Relative Error Quantile AlgorithmsProceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining10.1145/3447548.3467152(2722-2731)Online publication date: 14-Aug-2021
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media