[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1807167.1807236acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Hierarchically organized skew-tolerant histograms for geographic data objects

Published: 06 June 2010 Publication History

Abstract

Histograms have been widely used for fast estimation of query result sizes in query optimization. In this paper, we propose a new histogram method, called the Skew-Tolerant Histogram (STHistogram) for two or three dimensional geographic data objects that are used in many real-world applications in practice. The proposed method provides a significantly enhanced accuracy in a robust manner even for the data set that has a highly skewed distribution. Our method detects hotspots present in various parts of a data set and exploits them in organizing histogram buckets. For this purpose, we first define the concept of a hotspot, and provide an algorithm that efficiently extracts hotspots from the given data set. Then, we present our histogram construction method that utilizes hotspot information. We also describe how to estimate query result sizes by using the proposed histogram. We show through extensive performance experiments that the proposed method provides better performance than other existing methods.

References

[1]
S. Acharya, V. Poosala, and S. Ramaswamy. Selectivity estimation in spatial databases. In SIGMOD, pages 13--24, 1999.
[2]
N. Bruno, S. Chaudhuri, and L. Gravano. STHoles: a multidimensional workload-aware histogram. In SIGMOD, pages 211--222, 2001.
[3]
N. Bruno, S. Chaudhuri, and L. Gravano. Top-k selection queries over relational databases: mapping strategies and performance evaluation. ACM Trans. Database Syst., 27(2):153--187, 2002.
[4]
S. Chaudhuri, N. Dalvi, and R. Kaushik. Robust cardinality and cost estimation for skyline operator. In ICDE, page 64, 2006.
[5]
S. Chaudhuri and L. Gravano. Evaluating top-k selection queries. In VLDB, pages 397--410, 1999.
[6]
Y. J. Choi and C. W. Chung. Selectivity estimation for spatio-temporal queries to moving objects. In SIGMOD, pages 440--451, 2002.
[7]
I. Clark and W. Harper. Practical Geostatistics 2000. Ecosse North America Llc, Columbus, Ohio, USA, 2000.
[8]
T. Eavis and A. Lopez. RK-Hist: an r-tree based histogram for multi-dimensional selectivity estimation. In CIKM, pages 475--484, 2007.
[9]
P. B. Gibbons, Y. Matias, and V. Poosala. Fast incremental maintenance of approximate histograms. ACM Trans. Database Syst., 27(3):261--298, 2002.
[10]
S. Guha, K. Shim, and J. Woo. REHIST: relative error histogram construction algorithms. In VLDB, pages 300--311, 2004.
[11]
D. Gunopulos, G. Kollios, V. J. Tsotras, and C. Domeniconi. Approximating multi-dimensional aggregate range queries over real attributes. In SIGMOD, pages 463--474, 2000.
[12]
D. Gunopulos, G. Kollios, V. J. Tsotras, and C. Domeniconi. Selectivity estimators for multidimensional range queries over real attributes. VLDB J., 14(2):137--154, 2005.
[13]
P. J. Haas and A. N. Swami. Sequential sampling procedures for query size estimation. In SIGMOD, pages 341--350, 1992.
[14]
Y. E. Ioannidis. The history of histograms. In VLDB, pages 19--30, 2003.
[15]
Y. E. Ioannidis. Query Optimization. ACM Comput. Surv., 28(1):121--123, 1996.
[16]
I. Kamel and C. Faloutsos. On Packing R-trees. In CIKM, pages 490--499, 1993.
[17]
J. Lee, D. Kim, and C. Chung. Multi-dimensional selectivity estimation using compressed histogram information. In SIGMOD, pages 205--214, 1999
[18]
Y. Matias, J. S. Vitter, and M. Wang. Wavelet-based histograms for selectivity estimation. In SIGMOD, pages 448--459, 1998.
[19]
M. Muralikrishna and D. J. DeWitt. Equi-depth histograms for estimating selectivity factors for multidimensional queries. In SIGMOD, pages 28--36, 1988.
[20]
S. Muthukrishnan, V. Poosala, and T. Suel. On Rectangular Partitionings in Two Dimensions: Algorithms, Complexity, and Applications. In Int'l Conf. Database Theory, pages 236--256, 1999.
[21]
D. Papadias, Y. Tao, G. Fu, and B. Seeger. Progressive skyline computation in database systems. ACM Trans. Database Syst., 30(1):41--82, 2005.
[22]
V. Poosala and Y. E. Ioannidis. Estimation of query-result distribution and its application in parallel-join load balancing. In VLDB, pages 448--459, 1996.
[23]
V. Poosala and Y. E. Ioannidis. Selectivity estimation without the attribute value independence assumption. In VLDB, pages 486--495, 1997.
[24]
U. Srivastava, P. J. Haas, V. Markl, N. Megiddo, M. Kutsch, and T. M. Tran. ISOMER: consistent histogram construction using query feedback. In ICDE, page 39, 2006.
[25]
M. Stonebraker, J. Frew, K. Gardels, and J. Meredith. The SEQUOIA 2000 storage benchmark. In SIGMOD, pages 2--11, 1993.
[26]
J. Sun, Y. Tao, D. Papadias, and G. Kollios. Spatio-temporal join selectivity. Inf. Syst., 31(8):793--813, 2006.
[27]
Y. Tao, J. Sun, and D. Papadias. Selectivity estimation for predictive spatio-temporal queries. In ICDE, pages 417--428, 2003.
[28]
N. Thaper, S. Guha, P. Indyk, and N. Koudas. Dynamic multidimensional histograms. In SIGMOD, pages 428--439, 2002.
[29]
J. S. Vitter, M. Wang, and B. R. Iyer. Data cube approximation and histograms via wavelets. In CIKM, pages 96--104, 1998.
[30]
X. Zhou, D. J. Abel, and D. Truffet. Data partitioning for parallel spatial join processing. GeoInformatica, 2(2):175--204, 1998.

Cited By

View all
  • (2023)Scheduling distributed multiway spatial join queries: optimization models and algorithmsInternational Journal of Geographical Information Science10.1080/13658816.2023.217038037:6(1388-1419)Online publication date: 6-Feb-2023
  • (2018)Differentially Private and Skew-Aware Spatial Decompositions for Mobile CrowdsensingSensors10.3390/s1811369618:11(3696)Online publication date: 30-Oct-2018
  • (2017)Parallel Selectivity Estimation for Optimizing Multidimensional Spatial Join Processing on GPUs2017 IEEE 33rd International Conference on Data Engineering (ICDE)10.1109/ICDE.2017.236(1591-1598)Online publication date: Apr-2017
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
June 2010
1286 pages
ISBN:9781450300322
DOI:10.1145/1807167
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 June 2010

Permissions

Request permissions for this article.

Check for updates

Badges

Author Tags

  1. histograms
  2. query optimization
  3. spatial databases

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '10
Sponsor:
SIGMOD/PODS '10: International Conference on Management of Data
June 6 - 10, 2010
Indiana, Indianapolis, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)10
  • Downloads (Last 6 weeks)2
Reflects downloads up to 15 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2023)Scheduling distributed multiway spatial join queries: optimization models and algorithmsInternational Journal of Geographical Information Science10.1080/13658816.2023.217038037:6(1388-1419)Online publication date: 6-Feb-2023
  • (2018)Differentially Private and Skew-Aware Spatial Decompositions for Mobile CrowdsensingSensors10.3390/s1811369618:11(3696)Online publication date: 30-Oct-2018
  • (2017)Parallel Selectivity Estimation for Optimizing Multidimensional Spatial Join Processing on GPUs2017 IEEE 33rd International Conference on Data Engineering (ICDE)10.1109/ICDE.2017.236(1591-1598)Online publication date: Apr-2017
  • (2016)Workload-Aware Self-tuning Histograms for the Semantic WebTransactions on Large-Scale Data- and Knowledge-Centered Systems XXVIII - Volume 994010.1007/978-3-662-53455-7_6(133-156)Online publication date: 1-Jun-2016
  • (2015)A compression-based framework for the efficient analysis of business process logsProceedings of the 27th International Conference on Scientific and Statistical Database Management10.1145/2791347.2791351(1-12)Online publication date: 29-Jun-2015
  • (2015)Improving Accuracy and Robustness of Self-Tuning Histograms by Subspace ClusteringIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2015.241672527:9(2377-2389)Online publication date: 4-Aug-2015
  • (2015)Workload-Aware Self-Tuning Histograms of String DataProceedings, Part I, of the 26th International Conference on Database and Expert Systems Applications - Volume 926110.1007/978-3-319-22849-5_20(285-299)Online publication date: 1-Sep-2015
  • (2013)Statistics collection in oracle spatial and graphProceedings of the VLDB Endowment10.14778/2536222.25362286:11(1021-1032)Online publication date: 1-Aug-2013
  • (2013)RFID-data compression for supporting aggregate queriesACM Transactions on Database Systems10.1145/2487259.248726338:2(1-45)Online publication date: 4-Jul-2013
  • (2013)STHist-CGeoinformatica10.1007/s10707-012-0154-y17:2(325-352)Online publication date: 1-Apr-2013
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media