[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article
Free access

An optimal algorithm for approximate nearest neighbor searching fixed dimensions

Published: 01 November 1998 Publication History

Abstract

Consider a set of S of n data points in real d-dimensional space, Rd, where distances are measured using any Minkowski metric. In nearest neighbor searching, we preprocess S into a data structure, so that given any query point q∈ Rd, is the closest point of S to q can be reported quickly. Given any positive real ϵ, data point p is a (1 +ϵ)-approximate nearest neighbor of q if its distance from q is within a factor of (1 + ϵ) of the distance to the true nearest neighbor. We show that it is possible to preprocess a set of n points in Rd in O(dn log n) time and O(dn) space, so that given a query point q ∈ Rd, and ϵ > 0, a (1 + ϵ)-approximate nearest neighbor of q can be computed in O(cd, ϵ log n) time, where cd,ϵd ⌈1 + 6d/ϵ⌉d is a factor depending only on dimension and ϵ. In general, we show that given an integer k ≥ 1, (1 + ϵ)-approximations to the k nearest neighbors of q can be computed in additional O(kd log n) time.

References

[1]
AGARWAL, P. K., AND MATOUSEK, J. 1993. Ray shooting and parametric search. SlAM J. Comput. 22, 4, 794-806.
[2]
ARYA, S., AND MOUNT, D.M. 1993a. Algorithms for fast vector quantization. In Proceedings of the DCC'93: Data Compression Conference. J. A. Storer and M. Cohn, eds. IEEE Press, New York, pp. 381-390.
[3]
ARYA, S., AND MOUNT, D.M. 1993b. Approximate nearest neighbor queries in fixed dimensions. In Proceedings of the 4th A CM-SIAM Symposium on Discrete Algorithms. ACM, New York, pp. 271-280.
[4]
ARYA, S., AND MOUNT, D. M. 1995. Approximate range searching. In Proceedings of the llth Annual ACM Symposium on Computational Geometry (Vancouver, B.C., Canada, June 5-7). ACM, New York, pp. 172-181.
[5]
ARYA, S., MOUNT, D., AND NARAYAN, O. 1995. Accounting for boundary effects in nearest neighbor searching. In Proceedings of the llth Annual ACM Symposium on Computational Geometry (Vancouver, B.C., Canada, June 5-7). ACM, New York, pp. 336-344.
[6]
ARYA, S., MOUNT, D. M., NETANYAHU, N., SILVERMAN, R., AND WU, A. Y. 1994. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. In Proceedings of the 5th ACM-SIAM Symposium on Discrete Algorithms. ACM, New York, pp. 573-582.
[7]
BEI, C.-D., AND GRAY, R. M. 1985. An improvement of the minimum distortion encoding algorithm for vector quantization. IEEE Trans. Commun. 33, 1132-1133.
[8]
BENTLEY, J. L., WEIDE, B. W., AND YAO, A.C. 1980. Optimal expected-time algorithms for closest point problems. A CM Trans. Math. Sofiw. 6, 4, 563-580.
[9]
BERCHTOLD, S., B()HM, C., KEIM, D. A., AND KRIEGEL, H.-P. 1997. A cost model for nearest neighbor search in high-dimensional data space. In Proceedings of the 16th Annual ACM SIGACT- SIGMOD-SIGART Symposium on Principles of Database Systems (Tucson, Az., May 12-14). ACM, New York, pp. 78-86.
[10]
BERCHTOLD, S., KEIM, D. A., AND KRIEGEL, H.-P. 1996. The X-tree: An index structure for high dimensional data. In Proceedings of the 22nd VLDB Conference. pp. 28-39.
[11]
BERN, M. 1993. Approximate closest-point queries in high dimensions. Inf. Proc. Lett. 45, 95-99.
[12]
BERN, M., EPPSTEIN, D., AND TENS, S.-H. 1993. Parallel construction of quadtrees and quality triangulations. In Proceedings of the 3rd Workshop Algorithms Data Structures. Lecture Notes in Computer Science, vol. 709. Springer-Verlag, New York, pp. 188-199.
[13]
BESPAMYATNIKH, S.N. 1995. An optimal algorithm for closest pair maintenance. In Proceedings of the llth Annual ACM Symposium on Computational Geometry (Vancouver, B.C., Canada, June 5-7). ACM, New York, pp. 152-161.
[14]
CALLAHAN, P. B., AND KOSARAJU, S.R. 1992. A decomposition of multi-dimensional point-sets with applications to k-nearest-neighbors and n-body potentional fields. In Proceedings of the 24th Annual ACM Symposium on the Theory of Computing (Vancouver, B.C., Canada, May 4-6). ACM, New York, pp. 546-556.
[15]
CALLAHAN, P. B., AND KOSARAJU, S.U. 1995. Algorithms for dynamic closest pair and n-body potential fields. In Proceedings of the 6th Annual A CM-SIAM Symposium on Discrete Algorithms (San Francisco, Calif., Jan. 22-24). ACM, New York, pp. 263-272.
[16]
CHAN, T. 1997. Approximate nearest neighbor queries revisited. In Proceedings of the 13th Annual ACM Symposium on Computational Geometry (Nice, France, June 4-6). ACM, New York, pp. 352-358.
[17]
CHAZELLE, B. 1982. A theorem on polygon cutting with applications. In Proceedings of the 23rd Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., pp. 339-349.
[18]
CLARKSON, K.L. 1983. Fast algorithms for the all nearest neighbors problem. In Proceedings of the 24th Annual IEEE Symposium on the Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., pp. 226-232.
[19]
CLARKSON, K.L. 1988. A randomized algorithm for closest-point queries. SIAM J. Comput. 17, 4, 830-847.
[20]
CLARKSON, K.L. 1994. An algorithm for approximate closest-point queries. In Proceedings of the lOth Annual ACM Symposium on Computational Geometry (Stony Brook, N.Y., June 6-8). ACM, New York, pp. 160-164.
[21]
CLEARY, J.G. 1979. Analysis of an algorithm for finding nearest neighbors in Euclidean space. ACM Trans. Math. Sofiw. 5, 2 (June), 183-192.
[22]
CORMEN, T. H., LEISERSON, C. E., AND RIVEST, R.L. 1990. Introduction to Algorithms. MIT Press, Cambridge, Mass.
[23]
COST, S., AND SALZBERG, S. 1993. Aweighted nearest neighbor algorithm for learning with symbolic features. Mach. Learn. 10, 57-78.
[24]
COVER, T. M., AND HART, P.E. 1967. Nearest neighbor pattern classification. IEEE Trans. Info. Theory 13, 57-67.
[25]
DE BERG, M., VAN KREVELD, M., OVERMARS, M., AND SCHWARZKOPF, O. 1997. Computational Geometry: Algorithms and Applications. Springer-Verlag, New York.
[26]
DEERWESTER, S., DUMALS, S. T., FURNAS, G. W., LANDAUER, T. K., AND HARSHMAN, R. 1990. Indexing by latend semantic analysis. J. Amer. Soc. Inf. Sci. 41,391-407.
[27]
DEVaOYE, L., AND WAGNEa, T.J. 1982. Nearest neighbor methods in discrimination. In Handbook of Statistics, vol. 2. P. R. Krishnaiah and L. N. Kanal, eds. North-Holland, Amsterdam, The Netherlands.
[28]
DUDA, R. O., AND HAST, P.E. 1978. Pattern Classification and Scene Analysis. Wiley, New York.
[29]
EDELSBaUNNEa, H. 1987. Algorithms in Combinatorial Geometry, vol. 10 of EATCS Monographs on Theoretical Computer Science. Springer-Verlag, New York.
[30]
FARVARDIN, N., AND MODESTINO, J.W. 1985. Rate-distortion performance of DPCM schemes for sutoregressive sources. IEEE Trans. Inf. Theory 31, 3 (May), 402-418.
[31]
FAYYAD, U. M., PIATETSKY-SHAPIRO, G., SMYTH, P., AND UTHURUSAMY, R. 1996. Advances in Knowledge Discovery and Data Mining. AAAI Press/MIT Press, Cambridge, Mass.
[32]
FEDER, T., AND GREENE, D. 1988. Optimal algorithms for approximate clustering. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing (Chicago, Ill., May 2-4). ACM, New York, pp. 434-444.
[33]
FLICKNER, M., SAWHNEY, H., NIBLACK, W., ASHLEY, J., HUANG, Q., DOM, B., GORKANI, M., HAFNER, J., LEE, D., PETKOVIC, D., STEELE, D., AND YANKER, P. 1995. Query by image and video content: The QBIC system. IEEE Computer 28, 23-32.
[34]
FREDERICKSON, G.N. 1985. Data structures for on-line updating of minimum spanning trees, with applications. SlAM J. Comput. 14, 781-798.
[35]
FREDERICKSON, G.N. 1993. A data structure for dynamically maintaining rooted trees. In Proceedings of the 4th Annual A CM-SIAM Symposium on Discrete Algorithms. ACM, New York, pp. 175-194.
[36]
FREDMAN, M. L., AND TARJAN, R.E. 1987. Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM 34, 3 (July), 596-615.
[37]
FRIEDMAN, J. H., BENTLEY, J. L., AND FINKEL, R.A. 1977. An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw. 3, 3 (Sept.), 209-226.
[38]
FRIEDMAN, J. H., BASKETT, F., AND SHUSTEK, L. J. 1975. An algorithm for finding nearest neighbors. IEEE Trans. Comput. C-24, 10, 1000-1006.
[39]
GALPERIN, I., AND RIVEST, R. L. 1993. Scapegoat trees. In Proceedings of the 4th ACM-SIAM Symposium on Discrete Algorithms. ACM, New York, pp. 165-174.
[40]
GERSHO, A., AND GRAY, R. M. 1991. Vector Quantization and Signal Compression. Kluwer Academic, Boston, Mass.
[41]
GUAN, L., AND KAMEL, M. 1992. Equal-average hyperplane partitioning method for vector quantization of image data. Patt. Recog. Lett. 13, 693-699.
[42]
INDYK, P., AND MOTWANI, R. 1998. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the 30th Annual A CM Symposium on Theory of Computing (Dallas, Tex., May 23-26). ACM, New York, pp. 604-613.
[43]
KLEINBERG, J. M. 1997. Two algorithms for nearest-neighbor search in high dimensions. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing (El Paso, Tex., May 4-6). ACM, New York, pp. 599-608.
[44]
KUSHILEVITZ, E., OSTROVSKY, R., AND RABANI, Y. 1998. Efficient search for approximate nearest neighbor in high dimensional spaces. In Proceedings of the 30th Annual ACM Symposium on the Theory of Computing (Dallas, Tex., May 23-26). ACM, New York, pp. 614-623.
[45]
LEE, C.-H., AND CHEN, L.-H. 1994. Fast closest codeword search algorithm for vector quantisation. IEEE Proc. Vis. Image Sig. Proc. 141, 143-148.
[46]
LIN, K. I., JAGDISH, H. g., AND FALOUTSOS, C. 1994. The TV-tree: An index structure for high dimensional data. VLDB J. 3, 4, 517-542.
[47]
MEISER, S. 1993. Point location on arrangements of hyperplanes. Inf. Comput. 106, 2, 286-303.
[48]
MOUNT, D. M., NETANYAHU, N., SILVERMAN, R., AND WU, A. Y. 1995. Chromatic nearest neighbor searching: A query sensitive approach. In Proceedings of the 7th Canadian Conference on Computer Geometry. (Quebec City, Que., Canada, Aug. 10-13). pp. 261-266.
[49]
PREPARATA, F. P., AND SHAMOS, M.I. 1985. Computational Geometry: An Introduction. Springer- Verlag, New York.
[50]
RIVEST, R.L. 1974. On the optimality of Elais's algorithm for performing best-match searches. In Information Processing. North-Holland Publishing Company, Amsterdam, The Netherlands, pp. 678-681.
[51]
ROUSSOPOULOS, N., KELLEY, S., AND VINCENT, F. 1995. Nearest neighbor queries. In Proceedings of the 1995 ACM SIGMOD Conference on Management of Data (San Jose, Calif., May 23-25). ACM, New York, pp. 71-79.
[52]
SAMET, n. 1990. The Design and Analysis of Spatial Data Structures. Addison-Wesley, Reading, Mass.
[53]
SCHWARZ, C., SMID, M., AND SNOEYINK, J. 1994. An optimal algorithm for the on-line closest-pair problem. Algorithmica 12, 18-29.
[54]
SLEATOR, D. D., AND TARJAN, R.E. 1983. A data structure for dynamic trees. J. Comput. Syst. Sci. 26, 362-391.
[55]
SPROULL, R.L. 1991. Refinements to nearest-neighbor searching. Algorithmica 6, 579-589.
[56]
VAIDYA, P. M. 1989. An O(n log n) algorithm for the all-nearest-neighbors problem. Disc. Comput. Geom. 4, 101-115.
[57]
WHITE, D. A., AND JAIN, R. 1996. Similarity indexing with the SS-tree. In Proceedings of the 12th IEEE International Conference on Data Engineering. IEEE Computer Society Press, Los Alamitos, Calif., pp. 516 -523.
[58]
{ref65}YAO, A. C., AND YAO, F.F. 1985. A general approach to d-dimensional queries. In Proceedings of the 17th Annual ACM Symposium on Theory of Computing (Providence, R.I., May 6-8). ACM, New York, pp. 163-168.

Cited By

View all
  • (2025)CGAP: A Hybrid Contrastive and Graph-Based Active Learning Pipeline to Detect Water and Sediment in Multispectral ImagesIEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing10.1109/JSTARS.2024.349307318(446-462)Online publication date: 2025
  • (2025)Uncertainty-based multi-objective optimization in twin tunnel design considering fluid-solid couplingReliability Engineering & System Safety10.1016/j.ress.2024.110575253(110575)Online publication date: Jan-2025
  • (2024)Path Planning for Mobile Robots Based on the Improved DAPF-QRRT* StrategyElectronics10.3390/electronics1321423313:21(4233)Online publication date: 29-Oct-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Journal of the ACM
Journal of the ACM  Volume 45, Issue 6
Nov. 1998
185 pages
ISSN:0004-5411
EISSN:1557-735X
DOI:10.1145/293347
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 November 1998
Published in JACM Volume 45, Issue 6

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. approximation algorithms
  2. box-decomposition trees
  3. closet-point queries
  4. nearest neighbor searching
  5. post-office problem
  6. priority search

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)959
  • Downloads (Last 6 weeks)107
Reflects downloads up to 13 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2025)CGAP: A Hybrid Contrastive and Graph-Based Active Learning Pipeline to Detect Water and Sediment in Multispectral ImagesIEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing10.1109/JSTARS.2024.349307318(446-462)Online publication date: 2025
  • (2025)Uncertainty-based multi-objective optimization in twin tunnel design considering fluid-solid couplingReliability Engineering & System Safety10.1016/j.ress.2024.110575253(110575)Online publication date: Jan-2025
  • (2024)Path Planning for Mobile Robots Based on the Improved DAPF-QRRT* StrategyElectronics10.3390/electronics1321423313:21(4233)Online publication date: 29-Oct-2024
  • (2024)Digital Image Resemblance Using Deep LearningSSRN Electronic Journal10.2139/ssrn.4797927Online publication date: 2024
  • (2024)Playlist Search Reinvented: LLMs Behind the CurtainProceedings of the 18th ACM Conference on Recommender Systems10.1145/3640457.3688047(813-815)Online publication date: 8-Oct-2024
  • (2024)Bridging Software-Hardware for CXL Memory Disaggregation in Billion-Scale Nearest Neighbor SearchACM Transactions on Storage10.1145/363947120:2(1-30)Online publication date: 19-Feb-2024
  • (2024)FastSimiFeat: A Fast and Generalized Approach Utilizing k-NN for Noisy Data HandlingProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679591(1143-1152)Online publication date: 21-Oct-2024
  • (2024)Automatic fault interpretation based on point cloud fitting and segmentationGeophysical Prospecting10.1111/1365-2478.13523Online publication date: 19-Apr-2024
  • (2024)Curve Segment Neighborhood-based Vector Field Exploration2024 IEEE Visualization and Visual Analytics (VIS)10.1109/VIS55277.2024.00075(326-330)Online publication date: 13-Oct-2024
  • (2024)Toward Cost-Effective Adaptive Random Testing: An Approximate Nearest Neighbor ApproachIEEE Transactions on Software Engineering10.1109/TSE.2024.337959250:5(1182-1214)Online publication date: 21-Mar-2024
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media