An Empirical Study of a New Approach to Nearest Neighbor Searching

Songrit Maneewongvatana & David M. Mount

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2153)

Included in the following conference series:

Workshop on Algorithm Engineering and Experimentation

Abstract

In nearest neighbor searching, we are given a set of n data points in real d-dimensional space, ℜ^d, and the problem is to preprocess these points into a data structure so that, given a query point, the data point nearest to it can be reported efficiently. Because data sets can be quite large, we are interested in data structures that use optimal O(dn) storage.
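As a point of reference for the problem statement above, the following minimal Python sketch shows the trivial solution: store the n points as given (O(dn) space) and scan all of them for each query, which costs O(dn) time per query. Euclidean distance is assumed, and the function name is ours. This is the baseline that a search structure such as the kd-tree aims to beat, not the method of this paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def nearest_neighbor(data: List[Point], q: Point) -> Tuple[int, float]:
    """Return (index, Euclidean distance) of the data point closest to q.

    The data are stored as-is, so space is O(dn); each query scans all n
    points and costs O(dn) time.
    """
    if not data:
        raise ValueError("data set is empty")
    best_i, best_d2 = -1, math.inf
    for i, p in enumerate(data):
        d2 = sum((pi - qi) ** 2 for pi, qi in zip(p, q))  # squared distance
        if d2 < best_d2:
            best_i, best_d2 = i, d2
    return best_i, math.sqrt(best_d2)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
    print(nearest_neighbor(pts, (0.9, 1.2)))  # index 2 is closest
```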

In this paper we consider a novel approach to nearest neighbor searching in which the search returns the correct nearest neighbor with a given probability, assuming that the queries are drawn from some known distribution. The query distribution is represented by a set of training query points supplied at preprocessing time.
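One way to make the probabilistic guarantee concrete is to measure it: run a search procedure on each training query and count how often it returns the true nearest neighbor. The Python sketch below does exactly that. The pruned search it evaluates (split once at the median x-coordinate and scan only the query's side) is a hypothetical toy chosen to show how a search can fail for queries near a split; it is not the structure proposed in this paper, and all names are ours.

```python
import math
import random
from typing import Callable, List, Sequence

Point = Sequence[float]


def exact_nn(data: List[Point], q: Point) -> int:
    # Linear scan: index of the data point nearest to q.
    return min(range(len(data)), key=lambda i: math.dist(data[i], q))


def empirical_success_rate(search: Callable[[Point], int],
                           data: List[Point],
                           training_queries: List[Point]) -> float:
    """Fraction of training queries for which `search` returns the true NN."""
    hits = sum(1 for q in training_queries if search(q) == exact_nn(data, q))
    return hits / len(training_queries)


def make_one_bucket_search(data: List[Point]) -> Callable[[Point], int]:
    # Hypothetical pruned search: split at the median x-coordinate and scan
    # only the side containing the query. It can miss the true NN when the
    # query lies close to the splitting line.
    xs = sorted(p[0] for p in data)
    cut = xs[len(xs) // 2]
    left = [i for i, p in enumerate(data) if p[0] < cut]
    right = [i for i, p in enumerate(data) if p[0] >= cut]

    def search(q: Point) -> int:
        bucket = left if (q[0] < cut and left) else right
        return min(bucket, key=lambda i: math.dist(data[i], q))

    return search


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.random(), rng.random()) for _ in range(200)]
    training = [(rng.random(), rng.random()) for _ in range(1000)]
    rate = empirical_success_rate(make_one_bucket_search(data), data, training)
    print(f"estimated success probability: {rate:.3f}")
```

The estimate is only as good as the training set: it approximates the success probability under the query distribution that the training queries were drawn from.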

The data structure, called the overlapped split tree, is an augmented BSP-tree in which each node is associated with a cover region, which the search uses to decide whether to visit that node. We use principal component analysis and support vector machines to analyze the structure of the data and training points, so that the tree structure adapts better to the data set. We show empirically that this new approach offers more predictable average query performance than the kd-tree.
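The following minimal Python sketch illustrates the idea described above. It is not the authors' implementation, and several choices in it are our own assumptions. Each split is the hyperplane orthogonal to the first principal component of the node's points, placed at the median projection. A node's cover region is the axis-aligned bounding box of its data points together with the training queries whose true nearest neighbor lies in that node. The SVM step is omitted. The search enters every child whose cover region contains the query (the regions may overlap) and falls back to the query's side of the splitting hyperplane when none does. Names such as `build_ost` are hypothetical.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]
Box = Tuple[List[float], List[float]]  # (lower corner, upper corner)


def dot(u: Point, v: Point) -> float:
    return sum(a * b for a, b in zip(u, v))


def principal_direction(pts: List[Point], iters: int = 100) -> List[float]:
    """First principal component, by power iteration on the covariance matrix."""
    n, d = len(pts), len(pts[0])
    mean = [sum(p[j] for p in pts) / n for j in range(d)]
    cov = [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in pts) / n
            for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [dot(row, v) for row in cov]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:  # start vector in cov's null space: use the first axis
            return [1.0] + [0.0] * (d - 1)
        v = [x / norm for x in w]
    return v


def bbox(pts: List[Point]) -> Box:
    d = len(pts[0])
    return ([min(p[j] for p in pts) for j in range(d)],
            [max(p[j] for p in pts) for j in range(d)])


def in_box(box: Box, q: Point) -> bool:
    lo, hi = box
    return all(a <= x <= b for a, x, b in zip(lo, q, hi))


@dataclass
class Node:
    idx: List[int]                        # indices of data points in this cell
    cover: Box                            # assumed cover region (bounding box)
    normal: Optional[List[float]] = None  # split direction (internal nodes only)
    offset: float = 0.0                   # split threshold along `normal`
    children: List["Node"] = field(default_factory=list)


def build(data, idx, train, leaf_size):
    # train: (query, index of its true NN) pairs whose NN lies in idx.
    node = Node(idx, bbox([data[i] for i in idx] + [q for q, _ in train]))
    if len(idx) <= leaf_size:
        return node
    normal = principal_direction([data[i] for i in idx])
    offset = sorted(dot(normal, data[i]) for i in idx)[len(idx) // 2]
    lo = [i for i in idx if dot(normal, data[i]) < offset]
    hi = [i for i in idx if dot(normal, data[i]) >= offset]
    if not lo or not hi:  # degenerate split (e.g. duplicate points): keep a leaf
        return node
    node.normal, node.offset = normal, offset
    for part in (lo, hi):
        s = set(part)
        node.children.append(build(data, part, [t for t in train if t[1] in s], leaf_size))
    return node


def build_ost(data: List[Point], training_queries: List[Point], leaf_size: int = 4) -> Node:
    train = [(q, min(range(len(data)), key=lambda i: math.dist(data[i], q)))
             for q in training_queries]
    return build(data, list(range(len(data))), train, leaf_size)


def search(node: Node, data: List[Point], q: Point) -> Tuple[int, float]:
    """Return (index, distance) of the best point found; may miss the true NN."""
    if not node.children:
        return min(((i, math.dist(data[i], q)) for i in node.idx), key=lambda t: t[1])
    visit = [c for c in node.children if in_box(c.cover, q)]  # covers may overlap
    if not visit:  # q outside every child's cover: follow the split hyperplane
        visit = [node.children[0] if dot(node.normal, q) < node.offset else node.children[1]]
    return min((search(c, data, q) for c in visit), key=lambda t: t[1])


if __name__ == "__main__":
    rng = random.Random(1)

    def draw() -> Tuple[float, float]:
        return (rng.gauss(0, 1), rng.gauss(0, 0.2))

    data = [draw() for _ in range(300)]
    root = build_ost(data, [draw() for _ in range(2000)])
    fresh = [draw() for _ in range(500)]
    hits = sum(search(root, data, q)[1] == min(math.dist(p, q) for p in data) for q in fresh)
    print(f"success rate on fresh queries: {hits / len(fresh):.3f}")
```

Because every training query lies in the cover region of each node on the path to its true nearest neighbor, this sketch always answers training queries correctly. Its success rate on fresh queries drawn from the same distribution is the quantity the training set is meant to approximate.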

The support of the National Science Foundation under grant CCR-9712379 is gratefully acknowledged.

Author information

Authors and Affiliations

Department of Computer Science, University of Maryland, College Park, Maryland
Songrit Maneewongvatana & David M. Mount

Editor information

Editors and Affiliations

AT&T Labs Research, 180 Park Ave., P.O. Box 971, Florham Park, NJ, 07932-0000, USA
Adam L. Buchsbaum
Department of Computer Science, The University of North Carolina at Chapel Hill, CB 3175, Sitterson Hall, Chapel Hill, NC, 27599-3175, USA
Jack Snoeyink

About this paper

Cite this paper

Maneewongvatana, S., Mount, D.M. (2001). An Empirical Study of a New Approach to Nearest Neighbor Searching. In: Buchsbaum, A.L., Snoeyink, J. (eds) Algorithm Engineering and Experimentation. ALENEX 2001. Lecture Notes in Computer Science, vol 2153. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44808-X_14

DOI: https://doi.org/10.1007/3-540-44808-X_14
Published: 11 September 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42560-1
Online ISBN: 978-3-540-44808-2
eBook Packages: Springer Book Archive
