Abstract
Modern databases have to cope with multi-dimensional queries. To process such queries efficiently, query optimizers rely on multi-dimensional selectivity estimation techniques, which in turn typically rely on histograms. A core challenge of histogram construction is detecting regions whose density is higher than that of their surroundings. In this paper, we show that subspace clustering algorithms, which detect exactly such regions, can be used to build high-quality histograms in multi-dimensional spaces. The clusters are transformed into a memory-efficient histogram representation while preserving most of the information needed for selectivity estimation. We derive a formal criterion for transforming clusters into buckets that minimizes the introduced estimation error. Since finding optimal buckets is hard in practice, we propose a heuristic. Our experiments show that our approach is efficient in both runtime and memory usage. Overall, we demonstrate that subspace clustering enables multi-dimensional selectivity estimation with low estimation errors.
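To make the estimation setting concrete, the following minimal Python sketch shows how a multi-dimensional bucket histogram answers a conjunctive range query under the common assumption that tuples are uniformly distributed inside each bucket. The `Bucket` type, its axis-aligned bounding boxes, and the uniformity assumption are illustrative assumptions only; the sketch is not the paper's cluster-to-bucket transformation, its error-minimizing criterion, or its heuristic.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative types only; the paper's actual bucket layout and
# cluster-to-bucket transformation are not reproduced here.
Range = Tuple[float, float]  # inclusive [lo, hi] interval per dimension

@dataclass
class Bucket:
    box: List[Range]   # axis-aligned bounding box, one interval per dimension
    count: float       # number of tuples assigned to this bucket

def overlap_fraction(box: List[Range], query: List[Range]) -> float:
    """Fraction of the bucket's volume that intersects the query box."""
    frac = 1.0
    for (blo, bhi), (qlo, qhi) in zip(box, query):
        width = bhi - blo
        if width <= 0:                       # degenerate (point) dimension
            frac *= 1.0 if qlo <= blo <= qhi else 0.0
            continue
        inter = min(bhi, qhi) - max(blo, qlo)
        if inter <= 0:                       # no overlap in this dimension
            return 0.0
        frac *= inter / width
    return frac

def estimate_cardinality(buckets: List[Bucket], query: List[Range]) -> float:
    """Estimated number of tuples matching a conjunctive range query,
    assuming a uniform distribution of tuples inside each bucket."""
    return sum(b.count * overlap_fraction(b.box, query) for b in buckets)

# Usage: two 2-d buckets, query selects x in [0, 5] and y in [0, 5]
buckets = [Bucket([(0, 10), (0, 10)], count=1000),
           Bucket([(4, 6), (4, 6)], count=500)]
print(estimate_cardinality(buckets, [(0, 5), (0, 5)]))  # 250 + 125 = 375
```

On this toy input, each bucket contributes its tuple count scaled by the fraction of its volume covered by the query box; the quality of such estimates depends on how well the buckets capture the dense regions that the paper detects via subspace clustering.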
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Khachatryan, A., Müller, E., Böhm, K., Kopper, J. (2011). Efficient Selectivity Estimation by Histogram Construction Based on Subspace Clustering. In: Bayard Cushing, J., French, J., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2011. Lecture Notes in Computer Science, vol 6809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22351-8_22
DOI: https://doi.org/10.1007/978-3-642-22351-8_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22350-1
Online ISBN: 978-3-642-22351-8