[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article
Free access

Multi-dimensional selectivity estimation using compressed histogram information

Published: 01 June 1999 Publication History

Abstract

The database query optimizer requires the estimation of the query selectivity to find the most efficient access plan. For queries referencing multiple attributes from the same relation, we need a multi-dimensional selectivity estimation technique when the attributes are dependent each other because the selectivity is determined by the joint data distribution of the attributes. Additionally, for multimedia databases, there are intrinsic requirements for the multi-dimensional selectivity estimation because feature vectors are stored in multi-dimensional indexing trees. In the 1-dimensional case, a histogram is practically the most preferable. In the multi-dimensional case, however, a histogram is not adequate because of high storage overhead and high error rates.
In this paper, we propose a novel approach for the multi-dimensional selectivity estimation. Compressed information from a large number of small-sized histogram buckets is maintained using the discrete cosine transform. This enables low error rates and low storage overheads even in high dimensions. In addition, this approach has the advantage of supporting dynamic data updates by eliminating the overhead for periodical reconstructions of the compressed information. Extensive experimental results show advantages of the proposed approach.

References

[1]
R. Agrawal, C. Faloutsos, A. Swami. Efficient Similarity Search In Sequence Databases. Foundations of Data Organizations and Algorithms Conference, 1993.
[2]
S. Berchtold, C. Bohm, H. Kriegel. The Pyramid Technique: Towards Breaking the Curse of Dimensiomdity. ACM SIGMOD Conference, pp.142-153, 1998.
[3]
S. Berchtold, D. Keim, H. Kriegel. The X-tree: An index Structure for High-Dimensional Data. 22th VLZ)B Conference, pp. 28-39, 1996
[4]
A. Belussi, C. Faloutsos. Estimating the Selectivir.~ of Spatial Queries Using the 'Correlation' Fractal Dimens.ion. 21th VLDB Conference, 1995.
[5]
C. Chen. N. Roussopoulos. Adaptive Selectivity Estimation Using Query Feedback. ACM SIGMOD Conference, pp. 161 - 172, 1994.
[6]
W. Chang, G. Sheikholeslami, A Zhang, T. Syeda- Mahmood. Efficient Resource Selection in Distributed Visual Information Systems. A CM Multimedia Conference, pp. 203- 213, 1997.
[7]
S. Chaudhuri, L. Gravano. Optimizing Queries over Multimedia Repositories. A CM SIGMOD Conference pp. 91- 102, 1996,.
[8]
S. Christodoulakis. Estimating record selectivities. Information Systems Journal, 8(2), pp 105-115, 1983.
[9]
M. Ester, H. Kriegel, J. Sander, M. Winner, X. Xu. Incremental Clustering for Mining in Data Warehousing Environment. 24th VLDB Conference, pp. 323-333, 1998.
[10]
M. Ester, H. Kriegel, J. Sander, X. Xu. A Density Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proc. 2nd Int. Conf. on Knowledge Discovering and Data Mining, 1996.
[11]
R. Fagin. Combining Fuzzy Information from Multiple Systems. In Proc. of the 5th A CM Symposium on Principles of Database Systems, 1996.
[12]
R. Fagin. Fuzzy Queries in Multimedia Database Systems. In Proc. of the 7th A CM Symposium on Principles of Database Systems, pp. 1-10, 1998.
[13]
S. Guha, R. Rastogi, K. Shim. CURE: An Efficient Clustering Algorithm for Large Databases. A CM SIGMOD Conference, pp. 73-84, 1998.
[14]
EJ. Haas, J.E Naughton, S. Seshadri, and L. Stokes. Sampling based estimation of the number of distinct values of an attribute. 21th VLDB Conference, 1995.
[15]
Y. Ioannidis. Universality of Serial Histograms. 19th VLDB Conference, pp. 256-267, 1993.
[16]
Y. Ioannidis, V. Poosala. Balancing Optimality and Practicality for Query Result Size Estimation. A CM SIGMOD Conference, pp. 233-244, 1995.
[17]
H. Jagadish, N. Kouda, S. Muthukrishnan, V. Poosala, K. Sevcik, T. Suel. Optimal Histograms with Quality Gurantees. 24th VLDB Conference, pp. 275-286, 1998.
[18]
J.S. Lira. Two Dimensional Signal And Image Processing. Prentice Hall, 1990.
[19]
M.V. Mannino, E Chu, and T. Sager. Statistical profile estimation in database systems. A CM Computing Surveys, 20(3), 1988.
[20]
R. Ng, J. Han. Efficient and Effective Clustering Methods for Spatial Data Minig. 20th VLDB Conference, 1994.
[21]
B. Pagel, H. Six, H. Toben, E Widmayer. Towards an Analysis of Range Query Performance in Spatial Data Structures. In Proc. of the 2nd A CM Symposium on Principles of Database Systems, 1993.
[22]
V. Poosala, Y.E. Ioannidis, P.J. Haas, E.J. Shekita. Improved Histograms for Selectivity Estimation of Range Predicates. A CM SIGMOD Conference, pp. 294-305, 1996.
[23]
V. Poosala, Y.E. loannidis. Selectivity Estimation Without the Attribute Value Independence Assumption. 23th VLDB Conference, pp. 486-495, 1997.
[24]
K.R. Rao, P. Yip. Discrete Cosine Transform Algorithms, Advantages, Applications. Academic Press, 1990.
[25]
G. Sheikholeslami, W. Chang, A. Zhang. Semantic Clustering and Querying Heterogeneous Features for Visual Data. A CM Multimedia Conference, pp. 3-12, 1998.
[26]
W. Sun, Y. Ling, N. Rishe, and Y. Deng. An Instant and accurate size estimation method for joins and selections in a retrieval-intensive environment. A CM SIGMOD Conference, 1993.
[27]
K.Y. Whang, S.W. Kim, G. Wiederhold. Dynamic Maintenance of Data Distribution for Selectivity Estimation, VLDB Journal. Vol.3, No. 1, pp. 29-51, 1994.
[28]
T. Zhang, R. Ramakrishnam, M. Linvry. BIRCH: An Efficient Data Clustering Method for Very Large Databases. A CM SIGMOD Conference, pp. 103-114, 1996.

Cited By

View all
  • (2024)FFT-Based Probability Density Imaging of Euler SolutionsEntropy10.3390/e2606051726:6(517)Online publication date: 15-Jun-2024
  • (2024)ROME: Robust Query Optimization via Parallel Multi-Plan ExecutionProceedings of the ACM on Management of Data10.1145/36549732:3(1-25)Online publication date: 30-May-2024
  • (2024)3-D probability density imaging of Euler solutions using gravity data: a case study of Mount Milligan, CanadaActa Geophysica10.1007/s11600-023-01279-y72:5(3371-3391)Online publication date: 31-Jan-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGMOD Record
ACM SIGMOD Record  Volume 28, Issue 2
June 1999
599 pages
ISSN:0163-5808
DOI:10.1145/304181
Issue’s Table of Contents
  • cover image ACM Conferences
    SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of data
    June 1999
    604 pages
    ISBN:1581130848
    DOI:10.1145/304182
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1999
Published in SIGMOD Volume 28, Issue 2

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)153
  • Downloads (Last 6 weeks)39
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)FFT-Based Probability Density Imaging of Euler SolutionsEntropy10.3390/e2606051726:6(517)Online publication date: 15-Jun-2024
  • (2024)ROME: Robust Query Optimization via Parallel Multi-Plan ExecutionProceedings of the ACM on Management of Data10.1145/36549732:3(1-25)Online publication date: 30-May-2024
  • (2024)3-D probability density imaging of Euler solutions using gravity data: a case study of Mount Milligan, CanadaActa Geophysica10.1007/s11600-023-01279-y72:5(3371-3391)Online publication date: 31-Jan-2024
  • (2023)Kernel Density Derivative Estimation of Euler SolutionsApplied Sciences10.3390/app1303178413:3(1784)Online publication date: 30-Jan-2023
  • (2023)Marigold: Efficient k-Means Clustering in High DimensionsProceedings of the VLDB Endowment10.14778/3587136.358714716:7(1740-1748)Online publication date: 1-Mar-2023
  • (2021)Approximating Multidimensional Range Counts with Maximum Error Guarantees2021 IEEE 37th International Conference on Data Engineering (ICDE)10.1109/ICDE51399.2021.00141(1595-1606)Online publication date: Apr-2021
  • (2019)Exact Cardinality Query Optimization with Bounded Execution CostProceedings of the 2019 International Conference on Management of Data10.1145/3299869.3300087(2-17)Online publication date: 25-Jun-2019
  • (2019)CS*: Approximate Query Processing on Big Data using Scalable Join Correlated Sample Synopsis2019 IEEE International Conference on Big Data (Big Data)10.1109/BigData47090.2019.9006440(583-592)Online publication date: Dec-2019
  • (2018)HomerunProceedings of the VLDB Endowment10.14778/3236187.323620111:11(1496-1508)Online publication date: 1-Jul-2018
  • (2017)DigitHistProceedings of the VLDB Endowment10.14778/3137628.313765810:11(1514-1525)Online publication date: 1-Aug-2017
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media