Abstract
Indexing high-dimensional data is a well known problem. Techniques for dimensionality reduction which map D-dimensional objects onto a d-dimensional space (d ≪ D) are often used to speedup similarity queries. In this paper we show that one can further improve query performance by initially overestimating the reduction, i.e., reducing the dimensionality of the space to D′ dimensions, where d < D′ < D, and, at query time, automatically choosing only d′, where d′ < d, dimensions to be used – that is, using only a few good dimensions after the initial reduction of the dimensionality. By incorporating this idea within a recently proposed technique, we can process range queries up to three times faster at the expense of limited storage overhead.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Aggarwal, C.C., Hinneburg, A., Keim, D.A.: On the surprising behavior of distance metrics in high dimensional space. In: Van den Bussche, J., Vianu, V. (eds.) ICDT 2001. LNCS, vol. 1973, pp. 420–434. Springer, Heidelberg (2000)
Agrawal, R., Faloutsos, C., Swami, A.N.: Efficient Similarity Search In Sequence Databases. In: Proc. of the 4th Intl. Conf. of Foundations of Data Organization and Algorithms, pp. 69–84 (1993)
Baeza-Yates, R., Cunto, W., Manber, U., Wu, S.: Proximity matching using fixed queries trees. In: Proc. of the 5th Symposium on Combinatorial Pattern Matching, pp. 198–212 (1994)
Berchtold, S., Keim, D.A., Kriegel, H.-P.: The X-tree: An index structure for high-dimensional data. In: Proc. of the 22nd Intl. Conf. on Very Large Databases, pp. 28–39 (1996)
Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is “nearest neighbor” meaningful? In: Beeri, C., Bruneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 217–235. Springer, Heidelberg (1998)
Bozkaya, T., Ozsoyoglu, M.: Distance-based indexing for high-dimensional metric spaces. In: Proc. of the 1997 ACM SIGMOD Intl. Conf. on Management of Data, pp. 357–368 (1997)
Braunmuller, B., Ester, M., Kriegel, H.-P., Sander, J.: Multiple similarity queries: A basic DBMS operation for mining in metric databases. IEEE Transactions on Knowledge and Data Engineering 13(1), 79–95 (2001)
Ciaccia, P., Patella, M., Zezula, P.: M-tree: An efficient access method for similarity search in metric spaces. In: Proc. of 23rd Intl. Conf. on Very Large Data Bases, pp. 426–435 (1997)
Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proc. of the 2nd Intl. Conf. on Knowledge Discovery and Data Mining, pp. 226–231 (1996)
Santos Filho, R.F., Traina, A., Traina Jr., C., Faloutsos, C.: Similarity search without tears: The OMNI family of all-purpose access methods. In: Proc. of the 17th Intl. Conf. on Data Engineering, pp. 623–630 (2001)
Guttman, A.: R-trees: A dynamic index structure for spatial searching. In: Proc. of ACM SIGMOD Intl. Conf. on Management of Data, pp. 47–57 (1984)
Traina Jr., C., Traina, A., Faloutsos, C., Seeger, B.: Fast indexing and visualization for metric data sets using slim-trees. IEEE Transactions on Knowledge and Data Engineering 14(2), 244–260 (2002)
Katayama, N., Satoh, S.: The SR-tree: An index structure for high-dimensional nearest neighbor queries. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, pp. 369–380 (1997)
Liao, L., Noble, W.S.: Combining pairwise sequence similarity and support vector machinesfor remote protein homology detection. In: Proc. of the 6th Intl. Conf. on Research in Computational Molecular Biology, pp. 225–232 (2002)
Lu, G.: Multimedia Database Management Systems. Artech House, Norwood (1999)
Sakurai, Y., Yoshikawa, M., Uemura, S., Kojima, H.: The A-tree: An index structure for high-dimensional spaces using relative approximation. In: Proc. of the 26th Intl. Conf. on Very Large Data Bases, pp. 516–526 (2000)
Stehling, R.O., Nascimento, M.A., Falcão, A.X.: MiCRoM: A metric distance to compare segmented images. In: Proc. of the 2002 Visual Information Systems Conf., pp. 12–23 (2002)
Stehling, R.O., Nascimento, M.A., Falcão, A.X.: A compact and efficient image retrieval approach based on border/interior pixel classification. In: Proc. of the 2002 ACM Intl. Conf. on Information and Knowledge Management, pp. 102–109 (2002)
Swain, M.J., Ballard, D.H.: Color indexing. Intl. Journal of Computer Vision 7(1), 11–32 (1991)
Weber, R., Schek, H.-J., Blott, S.: A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: Proc. of the 24th Intl. Conf. on Very Large Databases, pp. 194–205 (1998)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Digout, C., Nascimento, M.A., Coman, A. (2004). Similarity Search and Dimensionality Reduction: Not All Dimensions Are Equally Useful. In: Lee, Y., Li, J., Whang, KY., Lee, D. (eds) Database Systems for Advanced Applications. DASFAA 2004. Lecture Notes in Computer Science, vol 2973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24571-1_73
Download citation
DOI: https://doi.org/10.1007/978-3-540-24571-1_73
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21047-4
Online ISBN: 978-3-540-24571-1
eBook Packages: Springer Book Archive