[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.5555/645503.656271guideproceedingsArticle/Chapter ViewAbstractPublication PagesConference Proceedingsacm-pubtype
Article

When Is ''Nearest Neighbor'' Meaningful?

Published: 10 January 1999 Publication History

Abstract

We explore the effect of dimensionality on the "nearest neighbor" problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10-15 dimensions.
These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple linear scan, and are evaluated over workloads for which nearest neighbor is not meaningful. Often, even the reported experiments, when analyzed carefully, show that linear scan would outperform the techniques being proposed on the workloads studied in high (10-15) dimensionality!

References

[1]
Agrawal, R., Faloutsos, C., Swami, A.: Efficient Similarity Search in Sequence Databases. In Proc. 4th Inter. Conf. on FODO (1993) 69-84
[2]
Altschul, S.F., Gish, W., Miller, W., Myers, E., Lipman, D.J.: Basic Local Alignment Search Tool. In Journal of Molecular Biology, Vol. 215 (1990) 403-410
[3]
Ang, Y.H., Li, Z., Ong, S.H.: Image retrieval based on multidimensional feature properties. In SPIE, Vol. 2420 (1995) 47-57
[4]
Arya, S.: Nearest Neighbor Searching and Applications. Ph.D. thesis, Univ. of Maryland at College Park (1995)
[5]
Arya, S., Mount, D.M., Narayan, O.: Accounting for Boundary Effects in Nearest Neighbors Searching. In Proc. 11th ACM Symposium on Computational Geometry (1995) 336-344
[6]
Arya, S., Mount, D.M., Netanyahu, N.S., Silverman, R., Wu, A.: An Optimal Algorithm for Nearest Neighbor Searching. In Proc. 5th ACM SIAM Symposium on Discrete Algorithms (1994) 573-582
[7]
Bellman, R.E.: Adaptive Control Processes. Princeton University Press (1961)
[8]
Belussi, A., Faloutsos, C.: Estimating the Selectivity of Spatial Queries Using the 'Correlation' Fractal Dimension. In Proc. VLDB (1995) 299-310
[9]
Bentley, J.L., Weide, B.W., Yao, A.C.: Optimal Expected-time Algorithms for Closest Point Problem", In ACM Transactions on Mathematical Software, Vol. 6, No. 4 (1980) 563-580
[10]
Berchtold, S., Böhm, C., Braunmüller, B., Keim, D.A., Kriegel, H.-P.: Fast Parallel Similarity Search in Multimedia Databases. In Proc. ACM SIGMOD Int. Conf. on Management of Data (1997) 1-12
[11]
Berchtold, S., Böhm, C., B., Keim, D.A., Kriegel, H.-P.: A Cost Model for Nearest Neighbor Search in High-Dimensional Data Space. In Proc. 16th ACM SIGACT-SIGMOD-SIGART Symposium on PODS (1997) 78-86
[12]
Bern, M.: Approximate Closest Point Queries in High Dimensions. In Information Processing Letters, Vol. 45 (1993) 95-99
[13]
Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When Is Nearest Neighbors Meaningful? Technical Report No. TR1377, Computer Sciences Dept., Univ. of Wisconsin-Madison, June 1998
[14]
Bozkaya, T., Ozsoyoglu, M.: Distance-Based Indexing for High-Dimensional Metric Spaces. In Proc. 16th ACM SIGACT-SIGMOD-SIGART Symposium on PODS (1997) 357-368
[15]
Faloutsos, C., et al: Efficient and Effective Querying by Image Content. In Journal of Intelligent Information Systems, Vol. 3, No. 3 (1994) 231-262
[16]
Faloutsos, C., Gaede, V.: Analysis of n-Dimensional Quadtrees Using the Housdorff Fractal Dimension. In Proc. ACM SIGMOD Int. Conf. of the Management of Data (1996)
[17]
Faloutsos, C., Kamel, I.: Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension. In Proc. 13th ACM SIGACT-SIGMOD-SIGART Symposium on PODS (1994) 4-13
[18]
Fayyad, U.M., Smyth, P.: Automated Analysis and Exploration of Image Databases: Results, Progress and Challenges. In Journal of intelligent information systems, Vol. 4, No. 1 (1995) 7-25
[19]
Katayama, N., Satoh, S.: The SR-tree: An Index Structure for High-Dimensional Nearest Neighbor Queries. In Proc. 16th ACM SIGACT-SIGMOD-SIGART Symposium on PODS (1997) 369-380
[20]
Lin, K.-I., Jagadish, H.V., Faloutsos, C.: The TV-Tree: An Index Structure for High-Dimensional Data. In VLDB Journal, Vol. 3, No. 4 (1994) 517-542
[21]
Manjunath, B.S., Ma, W.Y.: Texture Features for Browsing and Retrieval of Image Data. In IEEE Trans. on Pattern Analysis and Machine Learning, Vol. 18, No. 8 (1996) 837-842
[22]
Mehrotra, R., Gary, J.E.: Feature-Based Retrieval of Similar Shapes. In 9th Data Engineering Conference (1992) 108-115
[23]
Murase, H., Nayar, S.K.: Visual Learning and Recognition of 3D Objects from Appearance. In Int. J. of Computer Vision, Vol. 14, No. 1 (1995) 5-24
[24]
Nene, S.A., Nayar, S.K.: A Simple Algorithm for Nearest Neighbor Search in High Dimensions. In IEEE Trans. on Pattern Analysis and Machine Learning, Vol. 18, No. 8 (1996) 989-1003
[25]
Pentland, A., Picard, R.W., Scalroff, S.: Photobook: Tools for Content Based Manipulation of Image Databases. In SPIE Vol. 2185 (1994) 34-47
[26]
Scott, D.W.: Multivariate Density Estimation. Wiley Interscience, Chapter 2 (1992)
[27]
Shaft, U., Goldstein, J., Beyer, K.: Nearest Neighbors Query Performance for Unstable Distributions. Technical Report No. TR1388, Computer Sciences Dept., Univ. of Wisconsin-Madison, October 1998
[28]
Swain, M.J., Ballard D.H.: Color Indexing. In Inter. Journal of Computer Vision, Vol. 7, No. 1 (1991) 11-32
[29]
Swets, D.L., Weng, J.: Using Discriminant Eigenfeatures for Image Retrieval. In IEEE Trans. on Pattern Analysis and Machine Learning, Vol. 18, No. 8 (1996) 831-836
[30]
Taubin, G., Cooper, D.B.: Recognition and Positioning of Rigid Objects Using Algebraic Moment Invariants. In SPIE, Vol. 1570 (1991) 318-327
[31]
White, D.A., Jain, R.: Similarity Indexing with the SS-Tree. In ICDE (1996) 516- 523

Cited By

View all
  • (2024)Defending against Adversarial Patches using Dimensionality ReductionProceedings of the 61st ACM/IEEE Design Automation Conference10.1145/3649329.3656501(1-6)Online publication date: 23-Jun-2024
  • (2023)Unveiling the latent space geometry of push-forward generative modelsProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3618996(14422-14444)Online publication date: 23-Jul-2023
  • (2023)Hyperdimensional Consumer Pattern Analysis with Quantum Neural Architectures using Non-Hermitian OperatorsProceedings of the 5th International Conference on Information Management & Machine Intelligence10.1145/3647444.3652458(1-5)Online publication date: 23-Nov-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Guide Proceedings
ICDT '99: Proceedings of the 7th International Conference on Database Theory
January 1999
487 pages
ISBN:3540654526

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 10 January 1999

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 25 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Defending against Adversarial Patches using Dimensionality ReductionProceedings of the 61st ACM/IEEE Design Automation Conference10.1145/3649329.3656501(1-6)Online publication date: 23-Jun-2024
  • (2023)Unveiling the latent space geometry of push-forward generative modelsProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3618996(14422-14444)Online publication date: 23-Jul-2023
  • (2023)Hyperdimensional Consumer Pattern Analysis with Quantum Neural Architectures using Non-Hermitian OperatorsProceedings of the 5th International Conference on Information Management & Machine Intelligence10.1145/3647444.3652458(1-5)Online publication date: 23-Nov-2023
  • (2022)Learning contrastive embedding in low-dimensional spaceProceedings of the 36th International Conference on Neural Information Processing Systems10.5555/3600270.3600729(6345-6357)Online publication date: 28-Nov-2022
  • (2022)RTNNProceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/3503221.3508409(76-89)Online publication date: 2-Apr-2022
  • (2022)Efficient 3D Spatial Queries for Complex ObjectsACM Transactions on Spatial Algorithms and Systems10.1145/35022218:2(1-26)Online publication date: 12-Feb-2022
  • (2021)New trends in high-D vector similarity searchProceedings of the VLDB Endowment10.14778/3476311.347640714:12(3198-3201)Online publication date: 28-Oct-2021
  • (2021)When is Nearest Neighbor MeaningfulProceedings of the 30th ACM International Conference on Information & Knowledge Management10.1145/3459637.3482219(3103-3106)Online publication date: 26-Oct-2021
  • (2021)BR-NSProceedings of the Genetic and Evolutionary Computation Conference10.1145/3449639.3459303(172-179)Online publication date: 26-Jun-2021
  • (2020)Simple and scalable sparse k-means clustering via feature rankingProceedings of the 34th International Conference on Neural Information Processing Systems10.5555/3495724.3496575(10148-10160)Online publication date: 6-Dec-2020
  • Show More Cited By

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media