More Web Proxy on the site http://driver.im/

research-article

Learning to Prune: General and Efficient Approximate Nearest Neighbor Search with Direction Navigating Graph

Authors:

Jianfeng FanAuthors Info & Claims

DSDE '22: Proceedings of the 2022 5th International Conference on Data Storage and Data Engineering

Pages 50 - 58

https://doi.org/10.1145/3528114.3528123

Published: 24 June 2022 Publication History

Abstract

The approximate k-Nearest Neighbor (kNN) graph has been proved extremely useful in approximate nearest neighbor search (ANNS). However, under the high-dimensional and large-scale circumstance, the performance of nearest neighbor search is challenging due to its high costs of distance calculation. To solve the problem, most approaches focus on pruning the edges during the phase of graph construction. In our observation, searching a given query in an even well pruned approximate kNN graph, a large proportion of edges can be cut-off during the search procedure. In this paper we propose the concept of Direction Navigating Graph (DNG) to offer an online pruning strategy combined with a heuristic search algorithm. To build the DNG we develop a neural network that project queries and vertices into a low-dimensional direction vector space. With the DNG, we can apply online edge pruning with almost all those state- of-the-art graph-based approaches. In a number of experiments on public datasets we demonstrate that our approach outperforms those original methods by a significant margin. In addition, the approach has been integrated into Alibaba's (Alibaba Group) billion-scale search applications for its superior performance.

References

[1]

AndoniandP.Indyk.2006.Near-optimalhashingalgorithmsfornearneighbor problem in high dimension. FOCS 1 (2006), 459–468. https://doi.org/10.1109/ FOCS.2006.49

[2]

P. Kumar Arora, S. Sinha and A. Bhattacharya. 2018. Hd-index: Pushing the scalability-accuracy boundary for approximate knn search in highdimensional spaces. Proceedings of the VLDB Endowment (2018). https://doi.org/10.14778/ 3204028.3204034

[3]

EB Baum and F Wilczek. 1988. Supervised learning of probability distributions by neural networks. Proceedings of the 1987 International Conference on Neural Information Processing SystemsJanuary (1988), 52–61.

[4]

M. Belkin and P. Niyogi. 2001. Laplacian eigenmaps and spectral techniques for embedding and clustering. (2001).

[5]

J. L. Bentley. 1975. Multidimensional binary search trees used for associative searching. Commun. ACM (1975). https://doi.org/10.1145/361002.361007

Digital Library

[6]

N. Gonzales D. Dearholt and G. Kurup. 1988. Monotonic search networks for computer vision databases. (1988). https://doi.org/10.1109/ACSSC.1988.754602

[7]

S. Dasgupta and Y. Freund. 2008. Random projection trees and low dimensional manifolds. Proceedings of the fortieth annual ACM symposium on Theory of computing (2008), 537–546. https://doi.org/10.1145/1374376.1374452

Digital Library

[8]

S. Dasgupta and K. Sinha. 2013. Randomized partition trees for exact nearest neighbor search. Proceedings of the 26th Annual Conference on Learning Theory (2013), 317–337.

[9]

Michael I. Jordan Eric P. Xing, Andrew Y.Ng and Stuart J. Russell. 2002. Distance metric, learning with application to clustering with side-information. Proceedings of the Conference on Advances in Neural Information Processing Systems (2002).

[10]

Andoni 2015. Practical and optimal LSH for angular distance. Proceedings of the 28th International Conference on Neural Information Processing Systems 1 (2015), 1225–1233.

[11]

Burges 2005. Learning to rank using gradient descent. Proceedings of the 22nd international conference on Machine learning (2005), 89–96. https: //doi.org/10.1145/1102351.1102363

Digital Library

[12]

Cong Fu .2019.Fast approximate nearest neighbor search with the navigating spreading-out graph. (2019). https://doi.org/10.14778/3303753.3303754

Digital Library

[13]

Datar 2004. Locality-sensitive hashing scheme based on p-stable distributions. Proceedings of the twentieth annual symposium on Computational geometry, ACM (2004).

Digital Library

[14]

Dmitry Baranchuk 2019. Learning to Route in Similarity Graphs. (2019). https://arxiv.org/abs/1905.10987

[15]

Fabien 2015. Cache locality is not enough: high-performance nearest neighbor search with product quantization fast scan. Proc. International Conference on Very Large DataBases (2015), 288–299. https://doi.org/10.14778/2856318.2856324

Digital Library

[16]

J.B. Tenenbaum 2000. A global geometric framework for nonlinear dimensionality reduction. Science290(2000),2319–2323. https://doi.org/10.1126/science.290.5500.2319

[17]

Jie Zhou 2018. Graph neural networks: A review of methods and applications. (2018). https://arxiv.org/abs/1812.08434

[18]

L. He 2018. Neurally-Guided Semantic Navigation in Knowledge Graph. IEEE Trans (2018),1–1. https://doi.org/10.1109/TBDATA.2018.2805363

[19]

McCartin-Lim 2012. Approximate principal direction trees. 29th International Conference on Machine Learning (2012).

[20]

Prosenjit Bose 2012. Proximity graphs. International Journal of Computational Geometry and Applications (2012), 439–469. https://doi.org/10.1142/S0218195912500112

[21]

Jonathan Ho and Stefano Ermon.2016. Generative adversarial imitation learning. Neural Information Processing Systems (2016), 4565–4573.

[22]

P. Indyk and Motwani. 1998. Approximate nearest neighbors: Towards removing the curse of dimensionality. Proceedings of the thirtieth annual ACM symposium on Theory of computing (1998),604–613. https://doi.org/10.1145/276698.276876

Digital Library

[23]

J. Song J. Wang, H.T. Shen and J.Ji.2014. Hashing for similarity search: A survey. (2014). https://arxiv.org/abs/1408.2927

[24]

V. de Silva J.B. Tenenbaum and J.C. Langford. 2000. A global geometric framework for nonlinear dimensionality reduction. Science (2000),2319–2323. https://doi.org/10.1126/science.290.5500.2319

[25]

Richard Socher Jeffrey Pennington and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Empirical Methods in Natural Language Processing (EMNLP)(2014),1532–1543. https://doi.org/10.3115/v1/D14- 1162

[26]

Diederik P. Kingma and Jimmy Lei Ba. 2015. Adam: A method for stochastic optimization. (2015). https://arxiv.org/abs/1412.6980

[27]

J. M. Kleinberg. 2000. Navigation in a small world. Nature 406 (2000), 845–845.

[28]

Eric O. Postma Laurens J.P. van der Maaten and H. Jaap van den Herik. 2009. Dimensionality reduction: a comparative review. jmlr (2009), 2579–2605.

[29]

David G. Lowe. 2004. Distinctive Image Features from Scale-Invariant Keypoints. InternationalJournalofComputerVision60(2004),91–110. https://www.cs.ubc.ca/∼lowe/papers/ijcv04.pdf

[30]

D. Krioukov M. Boguna and K. C. Claffy. 2009. Navigation in a small world. Nature Physics 5 (2009), 74–80.

[31]

Vinod Nair and Geoffrey E. Hinton. 2010. Rectified linear units improve restricted Boltzmann machines. Proceedings of the 27th International Conference on International Conference on Machine Learning (2010), 807–814.

Digital Library

[32]

Aude Oliva and Antonio Torralba. 2001. Modeling the shape of the scene: a holistic representation of the spatial envelope. International Journal of Computer Vision 42 (2001), 145–175.

Digital Library

[33]

J.W. Sammon. 1969. A nonlinear mapping for data structure analysis. (1969), 401–499. https://doi.org/10.1109/T-C.1969.222678

Digital Library

[34]

Stuart C. Shapiro. 1987. Encyclopedia of Artificial Intelligence. Wiley-Interscience.

[35]

R. F. Sproull. 1991. Refinements to nearest-neighbor searching in k-dimensional trees. Algorithmica 6 (1991), 579–589.

Digital Library

[36]

van der Maaten LJ and Hinton GE. 2008. Visualizing data using t-SNE. Mach. Learn. Res (2008), 2579–2605.

[37]

Y. Xiao. 2018. Fast Approximate Nearest Neighbor Search via k-Diverse Nearest Neighbor Graph. Thirty-Second AAAI Conference on Artificial Intelligence (2018).

[38]

D. A. Yashunin Yu. A. Malkov1. 2018. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. (2018). https: //doi.org/10.1109/TPAMI.2018.2889473

Digital Library

Recommendations

Complementary hashing for approximate nearest neighbor search
ICCV '11: Proceedings of the 2011 International Conference on Computer Vision

Recently, hashing based Approximate Nearest Neighbor (ANN) techniques have been attracting lots of attention in computer vision. The data-dependent hashing methods, e.g., Spectral Hashing, expects better performance than the data-blind counterparts, e.g.,...
Efficient approximate nearest neighbor search with integrated binary codes
MM '11: Proceedings of the 19th ACM international conference on Multimedia

Nearest neighbor search in Euclidean space is a fundamental problem in multimedia retrieval. The difficulty of exact nearest neighbor search has led to approximate solutions that sacrifice precision for efficiency. Among such solutions, approaches that ...
Order preserving hashing for approximate nearest neighbor search
MM '13: Proceedings of the 21st ACM international conference on Multimedia

In this paper, we propose a novel method to learn similarity-preserving hash functions for approximate nearest neighbor (NN) search. The key idea is to learn hash functions by maximizing the alignment between the similarity orders computed from the ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences

DSDE '22: Proceedings of the 2022 5th International Conference on Data Storage and Data Engineering

February 2022

124 pages

ISBN:9781450395724

DOI:10.1145/3528114

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 June 2022

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Conference

DSDE 2022

DSDE 2022: 2022 the 5th International Conference on Data Storage and Data Engineering

February 25 - 27, 2022

Sanya, China

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
130
Total Downloads

Downloads (Last 12 months)54
Downloads (Last 6 weeks)4

Reflects downloads up to 06 Jan 2025

Other Metrics

View Author Metrics

Citations

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Media

Figures

Other

Tables

View Table of Contents