[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2600212.2600710acmconferencesArticle/Chapter ViewAbstractPublication PageshpdcConference Proceedingsconference-collections
short-paper

Data filtering for scalable high-dimensional k-NN search on multicore systems

Published: 23 June 2014 Publication History

Abstract

K Nearest Neighbors (k-NN) search is a widely used category of algorithms with applications in domains such as computer vision and machine learning. With the rapidly increasing amount of data available, and their high dimensionality, k-NN algorithms scale poorly on multicore systems because they hit a memory wall. In this paper, we propose a novel data filtering strategy, named Subspace Clustering for Filtering (SCF), for k-NN search algorithms on multicore platforms. By excluding unlikely features in k-NN search, this strategy can reduce memory footprint as well as computation. Experimental results on four k-NN algorithms show that SCF can improve their performance on two modern multicore platforms with insignificant loss of search precision.

References

[1]
K. Bache and M. Lichman. UCI machine learning repository, 2013.
[2]
G. Bradski and A. Kaehler. Learning OpenCV: Computer vision with the OpenCV library. O'Reilly Media, Incorporated, 2008.
[3]
J. Buhler. Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics, 17(5):419--428, 2001.
[4]
L. Cayton. Accelerating nearest neighbour search on manycore systems. In IEEE Int. Parallel and Distributed Processing Symposium (IPDPS), 2012.
[5]
D. L. Donoho et al. High-dimensional data analysis: The curses and blessings of dimensionality. AMS Math Challenges Lecture, pages 1--32, 2000.
[6]
V. Garcia, E. Debreuve, F. Nielsen, and M. Barlaud. K-nearest neighbor search: Fast GPU-based implementations and application to high-dimensional feature matching. In Image Processing (ICIP), 2010 17th IEEE International Conference on, pages 3757--3760, 2010.
[7]
A. Heinecke, K. Vaidyanathan, M. Smelyanskiy, A. Kobotov, R. Dubtsov, G. Henry, A. G. Shet, G. Chrysos, and P. Dubey. Design and implementation of the Linpack benchmark for single and multi-node systems based on Intel Xeon Phi coprocessor. Parallel and Distributed Processing Symposium, International, 0:126--137, 2013.
[8]
X. Liu, M. Smelyanskiy, E. Chow, and P. Dubey. Efficient sparse matrix-vector multiplication on x86-based many-core processors. In Proceedings of the 27th international ACM conference on International conference on supercomputing, ICS '13, pages 273--282, New York, NY, USA, 2013. ACM.
[9]
D. Lowe. Object recognition from local scale-invariant features. In Computer Vision The Proceedings of the Seventh IEEE International Conference on, volume 2, pages 1150--1157 vol.2, 1999.
[10]
J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, and A. H. Byers. Big data: The next frontier for innovation, competition, and productivity. Technical report, McKinsey Global Institute, 2011.
[11]
M. Muja and D. G. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. In International Conference on Computer Vision Theory and Application VISSAPP'09), pages 331--340. INSTICC Press, 2009.
[12]
S. Ramaswamy and K. Rose. Adaptive cluster distance bounding for high-dimensional indexing. Knowledge and Data Engineering, IEEE Transactions on, 23(6):815--830, 2011.
[13]
X. Tang, S. Mills, D. Eyers, K.-C. Leung, Z. Huang, and M. Guo. Performance bottlenecks in manycore systems: A case study on large scale feature matching within image collections. In Proceedings of the 15th IEEE International Conference on High Performance Computing and Communications, 2013. to appear.
[14]
A. Torralba, R. Fergus, and W. T. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 30(11):1958--1970, 2008.
[15]
R. Weber and K. Böhm. Trading quality for time with nearest-neighbor search. In C. Zaniolo, P. Lockemann, M. Scholl, and T. Grust, editors, Advances in Database Technology (EDBT), volume 1777 of Lecture Notes in Computer Science, pages 21--35. Springer Berlin Heidelberg, 2000.
[16]
C. Zanchettin, B. L. D. Bezerra, and W. W. Azevedo. A KNN-SVN hybrid model for cursive handwriting recognition. In Proceedings of the International Joint Conference on Neural Networks, 2012.

Cited By

View all
  • (2018)Principal Component Analysis Based Filtering for Scalable, High Precision k-NN SearchIEEE Transactions on Computers10.1109/TC.2017.274813167:2(252-267)Online publication date: 1-Feb-2018
  • (2016)Manila: Using a Densely Populated PMC-Space for Power Modelling within Large-Scale Systems2016 45th International Conference on Parallel Processing Workshops (ICPPW)10.1109/ICPPW.2016.41(210-219)Online publication date: Aug-2016
  • (2016)PCAF: Scalable, High Precision k-NN Search Using Principal Component Analysis Based Filtering2016 45th International Conference on Parallel Processing (ICPP)10.1109/ICPP.2016.79(638-647)Online publication date: Aug-2016
  • Show More Cited By

Index Terms

  1. Data filtering for scalable high-dimensional k-NN search on multicore systems

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    HPDC '14: Proceedings of the 23rd international symposium on High-performance parallel and distributed computing
    June 2014
    334 pages
    ISBN:9781450327497
    DOI:10.1145/2600212
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 23 June 2014

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. high-dimensional space
    2. k nearest neighbors
    3. memory wall
    4. multicore systems
    5. subspace clustering for filtering.

    Qualifiers

    • Short-paper

    Conference

    HPDC'14
    Sponsor:

    Acceptance Rates

    HPDC '14 Paper Acceptance Rate 21 of 130 submissions, 16%;
    Overall Acceptance Rate 166 of 966 submissions, 17%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)4
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 11 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2018)Principal Component Analysis Based Filtering for Scalable, High Precision k-NN SearchIEEE Transactions on Computers10.1109/TC.2017.274813167:2(252-267)Online publication date: 1-Feb-2018
    • (2016)Manila: Using a Densely Populated PMC-Space for Power Modelling within Large-Scale Systems2016 45th International Conference on Parallel Processing Workshops (ICPPW)10.1109/ICPPW.2016.41(210-219)Online publication date: Aug-2016
    • (2016)PCAF: Scalable, High Precision k-NN Search Using Principal Component Analysis Based Filtering2016 45th International Conference on Parallel Processing (ICPP)10.1109/ICPP.2016.79(638-647)Online publication date: Aug-2016
    • (2015)Scalable Multicore k-NN Search via Subspace Clustering for FilteringIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2014.237275526:12(3449-3460)Online publication date: 1-Dec-2015
    • (2015)Efficient Selection Algorithm for Fast k-NN Search on GPUsProceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium10.1109/IPDPS.2015.115(397-406)Online publication date: 25-May-2015

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media