DOI: 10.1145/1007568.1007664
Article

Query sampling in DB2 Universal Database

Published: 13 June 2004

Abstract

Executing ad hoc queries against large databases can be prohibitively expensive. Exploratory analysis of data, however, may not require exact answers to queries: results based on a sample of the data are often satisfactory. Supporting sampling as a primitive SQL operator turns out to be difficult because sampling does not commute with many SQL operators.

In this paper, we describe an implementation in IBM® DB2® Universal Database (UDB) of a sampling operator that commutes with some SQL operators. As a result, a query with the sampling operator always returns a random sample of the answers and in many cases runs faster than it would without the operator.
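To make "commutes with some SQL operators" concrete, here is a minimal Python sketch, assuming row-level Bernoulli sampling over in-memory lists. It illustrates general sampling semantics, not DB2's implementation: the helpers `keep`, `sample`, `select`, and `join` and the toy `orders`/`customers` tables are hypothetical, and the hash-based per-row coin is an assumption chosen so that two query plans can be compared row for row.

```python
import hashlib
from collections import Counter

# Hypothetical helper: a deterministic per-row Bernoulli coin. Hashing
# (seed, row_id) to [0, 1) lets two plans be compared exactly; an engine
# would typically draw from a pseudo-random generator instead.
def keep(row_id: str, rate: float, seed: int = 42) -> bool:
    digest = hashlib.sha256(f"{seed}:{row_id}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64
    return u < rate

def sample(rows, rate, key, seed=42):
    """Row-level Bernoulli sample: keep each row with probability `rate`."""
    return [r for r in rows if keep(key(r), rate, seed)]

def select(rows, pred):
    """Relational selection (a WHERE clause)."""
    return [r for r in rows if pred(r)]

def join(left, right, lkey, rkey):
    """Naive nested-loop equi-join."""
    return [(l, r) for l in left for r in right if lkey(l) == rkey(r)]

# Toy data: 1000 orders spread evenly over 5 customers (200 orders each).
orders = [{"id": f"o{i}", "cust": i % 5, "amount": i * 10} for i in range(1000)]
customers = [{"cust": c} for c in range(5)]
order_id = lambda o: o["id"]
is_large = lambda o: o["amount"] >= 5000

# 1) Sampling commutes with selection: filtering a sample yields exactly the
#    rows obtained by sampling the filtered table, so the sample can be
#    evaluated first and the predicate then runs on far fewer rows.
filter_after = select(sample(orders, 0.1, order_id), is_large)
filter_before = sample(select(orders, is_large), 0.1, order_id)
assert filter_after == filter_before

# 2) Sampling does not commute with join: sampling `customers` before the
#    join keeps or drops all 200 orders of a customer together. Each join
#    row still appears with probability 0.5, but inclusions are correlated,
#    so the result is a cluster sample, not a Bernoulli sample of the join.
sampled_customers = sample(customers, 0.5, lambda c: str(c["cust"]))
joined = join(orders, sampled_customers, lambda o: o["cust"], lambda c: c["cust"])
per_customer = Counter(c["cust"] for _, c in joined)
assert all(n == 200 for n in per_customer.values())

print(len(filter_after), "sampled large orders;", len(joined), "join rows")
```

Commuting with selection is what lets a sample be applied early and skip work while still returning a random sample of the answers; the join case shows why pushing a sample below a join naively changes the sampling design of the result.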

Published In

SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data
June 2004
988 pages
ISBN:1581138598
DOI:10.1145/1007568
Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMOD/PODS '04

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%


Cited By

  • (2023) Practical Dynamic Extension for Sampling Indexes. Proceedings of the ACM on Management of Data 1:4 (1-26). DOI: 10.1145/3626744. Online publication date: 12-Dec-2023.
  • (2021) MISS: finding optimal sample sizes for approximate analytics. Distributed and Parallel Databases. DOI: 10.1007/s10619-021-07376-5. Online publication date: 21-Oct-2021.
  • (2016) Interactive Visualization of Large Data Sets. IEEE Transactions on Knowledge and Data Engineering 28:8 (2142-2157). DOI: 10.1109/TKDE.2016.2557324. Online publication date: 1-Aug-2016.
  • (2013) A sampling algebra for aggregate estimation. Proceedings of the VLDB Endowment 6:14 (1798-1809). DOI: 10.14778/2556549.2556563. Online publication date: 1-Sep-2013.
  • (2010) Sampling dirty data for matching attributes. Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (63-74). DOI: 10.1145/1807167.1807177. Online publication date: 6-Jun-2010.
  • (2009) Histograms for OLAP and Data-Stream Queries. Encyclopedia of Data Warehousing and Mining, Second Edition (976-981). DOI: 10.4018/978-1-60566-010-3.ch151. Online publication date: 2009.
  • (2006) Fast approximate computation of statistics on views. Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data (724). DOI: 10.1145/1142473.1142564. Online publication date: 27-Jun-2006.
  • (2005) Hierarchisches gruppenbasiertes Sampling (Hierarchical group-based sampling). Informatik - Forschung und Entwicklung 20:1-2 (45-56). DOI: 10.1007/s00450-004-0175-3. Online publication date: 15-Mar-2005.
