More Web Proxy on the site http://driver.im/

research-article

Geometry-oblivious FMM for compressing dense SPD matrices

Authors:

George BirosAuthors Info & Claims

SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

Article No.: 53, Pages 1 - 14

https://doi.org/10.1145/3126908.3126921

Published: 12 November 2017 Publication History

Abstract

We present GOFMM (geometry-oblivious FMM), a novel method that creates a hierarchical low-rank approximation, or "compression," of an arbitrary dense symmetric positive definite (SPD) matrix. For many applications, GOFMM enables an approximate matrix-vector multiplication in N log N or even N time, where N is the matrix size. Compression requires N log N storage and work. In general, our scheme belongs to the family of hierarchical matrix approximation methods. In particular, it generalizes the fast multipole method (FMM) to a purely algebraic setting by only requiring the ability to sample matrix entries. Neither geometric information (i.e., point coordinates) nor knowledge of how the matrix entries have been generated is required, thus the term "geometry-oblivious." Also, we introduce a shared-memory parallel scheme for hierarchical matrix computations that reduces synchronization barriers. We present results on the Intel Knights Landing and Haswell architectures, and on the NVIDIA Pascal architecture for a variety of matrices.

References

[1]

Emmanuel Agullo, Eric Darve, Luc Giraud, and Yuval Harness. 2016. Nearly optimal fast preconditioning of symmetric positive definite matrices. Ph.D. Dissertation. Inria Bordeaux Sud-Ouest.

[2]

Sivaram Ambikasaran. 2013. Fast Algorithms for Dense Numerical Linear Algebra and Applications. Ph.D. Dissertation. Stanford University.

[3]

Sivaram Ambikasaran and Eric Darve. 2013. An O(N log N) Fast Direct Solver for Partial Hierarchically Semi-Separable Matrices. Journal of Scientific Computing 57, 3 (2013), 477--501.

Digital Library

[4]

Mario Bebendorf. 2008. Hierarchical matrices. Springer.

[5]

Mario Bebendorf and Sergej Rjasanow. 2003. Adaptive low-rank approximation of collocation matrices. Computing 70, 1 (2003), 1--24.

Digital Library

[6]

Michele Benzi, Gene H Golub, and Jörg Liesen. 2005. Numerical solution of saddle point problems. Acta numerica 14, 1 (2005), 1--137.

[7]

Nicola Cancedda, Eric Gaussier, Cyril Goutte, and Jean Michel Renders. 2003. Word Sequence Kernels. Journal of Machine Learning Research 3 (March 2003), 1059--1082. htttp://dl.acm.org/citation.cfm?id=944919.944963

Digital Library

[8]

S. Chandrasekaran, P. Dewilde, M. Gu, and N. Somasunderam. 2010. On the Numerical Rank of the Off-Diagonal Blocks of Schur Complements of Discretized Elliptic PDEs. SIAM J. Matrix Anal. Appl. 31, 5 (2010), 2261--2290. //

Digital Library

[9]

Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2, 3 (2011), 27.

Digital Library

[10]

H. Cheng, Leslie Greengard, and Vladimir Rokhlin. 1999. A Fast Adaptive Multipole Algorithm in Three Dimensions. J. Comput. Phys. 155 (1999), 468--498.

Digital Library

[11]

D Yu Chenhan, William B March, and George Biros. 2017. An N log N Parallel Fast Direct Solver for Kernel Matrices. In Parallel and Distributed Processing Symposium (IPDPS), 2017 IEEE International. IEEE, 886--896.

[12]

P. Coulier, H. Pouransari, and E. Darve. 2016. The inverse fast multipole method: using a fast approximate direct solver as a preconditioner for dense linear systems. ArXiv e-prints (2016). arXiv:1508.01835

[13]

Ryan R. Curtin, James R. Cline, Neil P. Slagle, William B. March, P. Ram, Nishant A. Mehta, and Alexander G. Gray. 2013. MLPACK: A Scalable C++ Machine Learning Library. Journal of Machine Learning Research 14 (2013), 801--805.

Digital Library

[14]

S. Dasgupta and Y. Freund. 2008. Random projection trees and low dimensional manifolds. In Proceedings of the 40th annual ACM symposium on Theory of computing. ACM, 537--546.

Digital Library

[15]

William Fong and Eric Darve. 2009. The black-box fast multipole method. J. Comput. Phys. 228, 23 (2009), 8712--8725.

Digital Library

[16]

Pieter Ghysels, Xiaoye S. Li, FranÃğois-Henry Rouet, Samuel Williams, and Artem Napov. 2016. An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling. SIAM Journal on Scientific Computing 38, 5 (2016), S358--S384.

Digital Library

[17]

Lars Grasedyck, Ronald Kriemann, and Sabine Le Borne. 2008. Parallel black box-LU preconditioning for elliptic boundary value problems. Computing and visualization in science 11, 4 (2008), 273--291.

[18]

A.G. Gray and A.W. Moore. 2001. N-body problems in statistical learning. Advances in neural information processing systems (2001), 521--527.

Digital Library

[19]

Leslie Greengard, Denis Gueyffier, Per-Gunnar Martinsson, and Vladimir Rokhlin. 2009. Fast direct solvers for integral equations in complex three-dimensional domains. Acta Numerica 18, 1 (2009), 243--275.

[20]

Greengard, L. 1994. Fast Algorithms For Classical Physics. Science 265, 5174 (1994), 909--914.

[21]

Wolfgang Hackbusch. 2015. Hierarchical Matrices: Algorithms and Analysis (1 ed.). Springer-Verlag Berlin Heidelberg.

Digital Library

[22]

N. Halko, P.-G. Martinsson, and J.A. Tropp. 2011. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53 (2011), 217--288.

Digital Library

[23]

Kenneth L Ho and Leslie Greengard. 2012. A fast direct solver for structured linear systems by recursive skeletonization. SIAM Journal on Scientific Computing 34, 5 (2012), A2507--A2532.

[24]

Thomas Hofmann, Bernhard Schölkopf, and Alexander J Smola. 2008. Kernel methods in machine learning. The annals of statistics (2008), 1171--1220.

[25]

George Karypis and Vipin Kumar. 1998. Multilevel k-way partitioning scheme for irregular graphs. Journal of Parallel and Distributed computing 48, 1 (1998), 96--129.

Digital Library

[26]

Risi Imre Kondor and John Lafferty. 2002. Diffusion kernels on graphs and other discrete input spaces. In ICML, Vol. 2. 315--322.

Digital Library

[27]

Edo Liberty, Franco Woolfe, Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert. 2007. Randomized algorithms for the low-rank approximation of matrices. Proceedings of the National Academy of Sciences 104, 51 (2007), 20167--20172.

[28]

M. Lichman. 2013. UCI Machine Learning Repository. (2013). http://archive.ics.uci.edu/ml

[29]

M.W. Mahoney and P. Drineas. 2009. CUR matrix decompositions for improved data analysis. Proceedings of the National Academy of Sciences 106, 3 (2009), 697.

[30]

William B. March, Bo Xiao, Sameer Tharakan, Chenhan D. Yu, and George Biros. 2015. A kernel-independent FMM in general dimensions. In Proceedings of SC15 (The SCxy Conference series). ACM/IEEE, Austin, Texas.

Digital Library

[31]

William B. March, Bo Xiao, Sameer Tharakan, Chenhan D. Yu, and George Biros. 2015. Robust treecode approximation for kernel machines. In Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Sydney, Australia, 1--10.

Digital Library

[32]

William B. March, Bo Xiao, Chenhan Yu, and George Biros. 2015. An algebraic parallel treecode in arbitrary dimensions. In Proceedings of IPDPS 2015 (29th IEEE International Parallel and Distributed Computing Symposium). Hyderabad, India.

Digital Library

[33]

William B. March, Bo Xiao, Chenhan D. Yu, and George Biros. 2016. ASKIT: An Efficient, Parallel Library for High-Dimensional Kernel Summations. SIAM Journal on Scientific Computing 38, 5 (2016), S720--S749.

Digital Library

[34]

Per-Gunnar Martinsson. 2016. Compressing Rank-Structured Matrices via Randomized Sampling. SIAM Journal on Scientific Computing 38, 4 (2016), A1959--A1986.

[35]

Per-Gunnar Martinsson and Vladimir Rokhlin. 2007. An accelerated kernel-independent fast multipole method in one dimension. SIAM Journal on Scientific Computing 29, 3 (2007), 1160--1178.

Digital Library

[36]

P.-G. Martinsson, V. Rokhlin, and M. Tygert. 2010. A randomized algorithm for the decomposition of matrices. Applied and Computational Harmonic Analysis (2010).

[37]

K. R. Muske and J. W. Howse. 2001. A Lagrangian Method for Simultaneous Nonlinear Model Predictive Control. In Large-Scale PDE-constrained Optimization: State-of-the-Art, L.T. Biegler, O. Ghattas, M. Heinkenschloss, and B. van Bloemen Waanders (Eds.). Springer-Verlag.

[38]

François-Henry Rouet, Xiaoye S. Li, Pieter Ghysels, and Artem Napov. 2016. A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization. ACM Transactions in Mathematical Software 42, 4, Article 27 (June 2016), 35 pages.

Digital Library

[39]

Bernhard Schölkopf and Alexander J Smola. 2002. Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT press.

[40]

Avinash Sodani and others. 2016. Knights Landing: Second-Generation Intel Xeon Phi Product. IEEE Micro 36, 2 (2016), 34--46.

Digital Library

[41]

Haluk Topcuoglu, Salim Hariri, and Min-you Wu. 2002. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE transactions on parallel and distributed systems 13, 3 (2002), 260--274.

Digital Library

[42]

Christopher Williams and Matthias Seeger. 2001. Using the Nyström method to speed up kernel machines. In Proceedings of the 14th Annual Conference on Neural Information Processing Systems. 682--688.

Digital Library

[43]

Jianlin Xia, Shivkumar Chandrasekaran, Ming Gu, and Xiaoye S Li. 2010. Fast algorithms for hierarchically semiseparable matrices. Numerical Linear Algebra with Applications 17, 6 (2010), 953--976.

[44]

Bo Xiao and George Biros. 2016. Parallel Algorithms for Nearest Neighbor Search Problems in High Dimensions. SIAM Journal on Scientific Computing 38, 5 (2016), S667--S699.

Digital Library

[45]

Lexing Ying, George Biros, and Denis Zorin. 2004. A kernel-independent adaptive fast multipole method in two and three dimensions. J. Comput. Phys. 196, 2 (2004), 591--626.

Digital Library

[46]

Rio Yokota, Huda Ibeid, and David Keyes. 2016. Fast Multipole Method as a Matrix-Free Hierarchical Low-Rank Approximation. Computing Research Repository abs/1602.02244 (2016). http://arxiv.org/abs/1602.02244

[47]

Chenhan D Yu, Jianyu Huang, Woody Austin, Bo Xiao, and George Biros. 2015. Performance optimization for the k-nearest neighbors kernel on x86 architectures. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, 7.

Digital Library

[48]

Chenhan D. Yu, William B. March, Bo Xiao, and George Biros. 2016. INV-ASKIT: A Parallel Fast Direct Solver for Kernel Matrices. In Proceedings of the IPDPS16. Chicago, USA.

Cited By

Harbrecht HMulterer MSchenk OSchwab C(2024)Multiresolution kernel matrix algebraNumerische Mathematik10.1007/s00211-024-01409-8Online publication date: 9-May-2024
https://doi.org/10.1007/s00211-024-01409-8
Gaddameedi KReiz SNeckel TBungartz H(2024)Efficient and Scalable Kernel Matrix Approximations Using Hierarchical DecompositionIntelligent Computers, Algorithms, and Applications10.1007/978-981-97-0065-3_1(3-16)Online publication date: 28-Jan-2024
https://doi.org/10.1007/978-981-97-0065-3_1
Kang MKatzfuss M(2023)Correlation-based sparse inverse Cholesky factorization for fast Gaussian-process inferenceStatistics and Computing10.1007/s11222-023-10231-533:3Online publication date: 28-Mar-2023
https://dl.acm.org/doi/10.1007/s11222-023-10231-5
Show More Cited By

Index Terms

Recommendations

Compressing Rank-Structured Matrices via Randomized Sampling

Randomized sampling has recently been proven a highly efficient technique for computing approximate factorizations of matrices that have low numerical rank. This paper describes an extension of such techniques to a wider class of matrices that are not ...
Distributed-memory hierarchical compression of dense SPD matrices
SC '18: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis

We present a distributed memory algorithm for the hierarchical compression of symmetric positive definite (SPD) matrices. Our method is based on GOFMM, an algorithm that appeared in doi:10.1145/3126908.3126921. For many SPD matrices, GOFMM enables ...
Robust and Effective eSIF Preconditioning for General Dense SPD Matrices

We propose an unconditionally robust and highly effective preconditioner for general dense symmetric positive definite matrices based on structured incomplete factorization (SIF), called the enhanced SIF (eSIF) preconditioner. The original SIF strategy ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

November 2017

801 pages

ISBN:9781450351140

DOI:10.1145/3126908

General Chair:
Bernd Mohr
Jülich Supercomputing Center, Jülich, Germany
,
Program Chair:
Padma Raghavan
Vanderbilt University, Nashville, TN

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGHPC: ACM Special Interest Group on High Performance Computing, Special Interest Group on High Performance Computing

In-Cooperation

IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 November 2017

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

SC '17

Sponsor:

SIGHPC

SC '17: The International Conference for High Performance Computing, Networking, Storage and Analysis

November 12 - 17, 2017

Colorado, Denver

Acceptance Rates

SC '17 Paper Acceptance Rate 61 of 327 submissions, 19%;

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

11
Total Citations
View Citations
287
Total Downloads

Downloads (Last 12 months)20
Downloads (Last 6 weeks)0

Reflects downloads up to 07 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Harbrecht HMulterer MSchenk OSchwab C(2024)Multiresolution kernel matrix algebraNumerische Mathematik10.1007/s00211-024-01409-8Online publication date: 9-May-2024
https://doi.org/10.1007/s00211-024-01409-8
Gaddameedi KReiz SNeckel TBungartz H(2024)Efficient and Scalable Kernel Matrix Approximations Using Hierarchical DecompositionIntelligent Computers, Algorithms, and Applications10.1007/978-981-97-0065-3_1(3-16)Online publication date: 28-Jan-2024
https://doi.org/10.1007/978-981-97-0065-3_1
Kang MKatzfuss M(2023)Correlation-based sparse inverse Cholesky factorization for fast Gaussian-process inferenceStatistics and Computing10.1007/s11222-023-10231-533:3Online publication date: 28-Mar-2023
https://dl.acm.org/doi/10.1007/s11222-023-10231-5
Chen CMartinsson P(2022)Solving Linear Systems on a GPU with Hierarchically Off-Diagonal Low-Rank ApproximationsSC22: International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC41404.2022.00089(1-15)Online publication date: Nov-2022
https://doi.org/10.1109/SC41404.2022.00089
Kriemann RLtaief HLuong MPérez FIm HKeyes D(2022)High-Performance Spatial Data Compression for Scientific ApplicationsEuro-Par 2022: Parallel Processing10.1007/978-3-031-12597-3_25(403-418)Online publication date: 22-Aug-2022
https://dl.acm.org/doi/10.1007/978-3-031-12597-3_25
Liu BCheshmi KSoori SStrout MDehnavi MGupta RShen X(2020)MatRoxProceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/3332466.3374548(389-402)Online publication date: 19-Feb-2020
https://dl.acm.org/doi/10.1145/3332466.3374548
Chavez GLiu YGhysels PLi XRebrova E(2020)Scalable and Memory-Efficient Kernel Ridge Regression2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS47924.2020.00102(956-965)Online publication date: May-2020
https://doi.org/10.1109/IPDPS47924.2020.00102
Yamazaki IIda AYokota RDongarra J(2019)Distributed-memory lattice H-matrix factorizationThe International Journal of High Performance Computing Applications10.1177/1094342019861139(109434201986113)Online publication date: Aug-2019
https://doi.org/10.1177/1094342019861139
Yu CReiz SBiros G(2019)Distributed O(N) Linear Solver for Dense Symmetric Hierarchical Semi-Separable Matrices2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)10.1109/MCSoC.2019.00008(1-8)Online publication date: Oct-2019
https://doi.org/10.1109/MCSoC.2019.00008
Yu CReiz SBiros G(2018)Distributed-memory hierarchical compression of dense SPD matricesProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.5555/3291656.3291676(1-15)Online publication date: 11-Nov-2018
https://dl.acm.org/doi/10.5555/3291656.3291676
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents