More Web Proxy on the site http://driver.im/

research-article

Exploiting Symmetry in Tensors for High Performance: : Multiplication with Symmetric Tensors

Authors:

Martin D. Schatz,

Robert A. van de Geijn,

Tamara G. KoldaAuthors Info & Claims

SIAM Journal on Scientific Computing, Volume 36, Issue 5

Pages C453 - C479

https://doi.org/10.1137/130907215

Published: 01 January 2014 Publication History

Abstract

Symmetric tensor operations arise in a wide variety of computations. However, the benefits of exploiting symmetry in order to reduce storage and computation is in conflict with a desire to simplify memory access patterns. In this paper, we propose a blocked data structure (blocked compact symmetric storage) wherein we consider the tensor by blocks and store only the unique blocks of a symmetric tensor. We propose an algorithm by blocks, already shown of benefit for matrix computations, that exploits this storage format by utilizing a series of temporary tensors to avoid redundant computation. Further, partial symmetry within temporaries is exploited to further avoid redundant storage and redundant computation. A detailed analysis shows that, relative to storing and computing with tensors without taking advantage of symmetry and partial symmetry, storage requirements are reduced by a factor of $ O( m! )$ and computational requirements by a factor of $O( (m+1)!/2^m )$, where $ m $ is the order of the tensor. However, as the analysis shows, care must be taken in choosing the correct block size to ensure these storage and computational benefits are achieved (particularly for low-order tensors). An implementation demonstrates that storage is greatly reduced and the complexity introduced by storing and computing with tensors by blocks is manageable. Preliminary results demonstrate that computational time is also reduced. The paper concludes with a discussion of how insights in this paper point to opportunities for generalizing recent advances in the domain of linear algebra libraries to the field of multilinear computation.

References

[1]

OpenBLAS, http://xianyi.github.com/OpenBLAS/ (2012).

[2]

E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak, J. Langou, H. Ltaief, P. Luszczek, and S. Tomov, Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, J. Phys. Conf. Ser., 180 (2009), 012037.

[3]

E. Anderson, Z. Bai, C. Bischof, L. S. Blackford, J. Demmel, Jack J. Dongarra, J. Du Croz, S. Hammarling, A. Greenbaum, A. McKenney, and D. Sorensen, LAPACK Users' Guide, 3rd ed., Software Environ. Tools 9, SIAM, Philadelphia, 1999.

[4]

B. W. Bader and T. G. Kolda, Algorithm 862: MATLAB tensor classes for fast algorithm prototyping, ACM Trans. Math. Software, 32 (2006), pp. 635--653.

[5]

B. W. Bader, T. G. Kolda, et al., Matlab Tensor Toolbox version 2.5, http://www.sandia.gov/\string tgkolda/TensorToolbox/http://www.sandia.gov/$\sim$tgkolda/TensorToolbox/ (2012).

[6]

G. Ballard, T. G. Kolda, and T. Plantenga, Efficiently computing tensor eigenvalues on a GPU, in Proceedings of the 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Ph.D. Forum, IEEE, Piscataway, NJ, 2011, pp. 1340--1348.

[7]

G. Baumgartner, A. Auer, D.E. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, X. Gao, R.J. Harrison, S. Hirata, S. Krishnamoorthy, S. Krishnan, C. Lam, Q. Lu, M. Nooijen, R.M. Pitzer, J. Ramanujam, P. Sadayappan, and A. Sibiryakov, Synthesis of high-performance parallel programs for a class of ab initio quantum chemistry models, in Proc. IEEE, 93, 2005, pp. 276--292.

[8]

C.F Bender, Integral transformations. A bottleneck in molecular quantum mechanical calculations, J. Comput. Phys., 9 (1972), pp. 547--554.

[9]

J. Bilmes, K. Asanovic, C.W. Chin, and J. Demmel, Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology, in Proceedings of the 11th International Conference on Supercomputing, ACM, New York, 1997, pp. 340--347.

[10]

J. Choi, J. J. Dongarra, R. Pozo, and D. W. Walker, ScaLAPACK: A scalable linear algebra library for distributed memory concurrent computers, in Proceedings of the Fourth Symposium on the Frontiers of Massively Parallel Computation, IEEE Computer Society Press, Los Alamitos, CA, 1992, pp. 120--127.

[11]

J. J. Dongarra, J. Du Croz, S. Hammarling, and I. Duff, A set of level $3$ basic linear algebra subprograms, ACM Trans. Math. Software, 16 (1990), pp. 1--17.

[12]

J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson, An extended set of FORTRAN basic linear algebra subprograms, ACM Trans. Math. Software, 14 (1988), pp. 1--17.

[13]

Eigenvector Research, Inc., PLS Toolbox, http://www.eigenvector.com/software/pls_toolbox.htmhttp://www.eigenvector.com/software/pls_toolbox.htm (2012).

[14]

E. Epifanovsky, M. Wormit, T. Kus, A. Landau, D. Zuev, K. Khistyaev, P. Manohar, I. Kaliman, A. Dreuw, and A. I. Krylov, New implementation of high-level correlated methods using a general block tensor library for high-performance electronic structure calculations, J. Comput. Chem., 34 (2013), pp. 2293--2309.

[15]

S. F. Cheng, D. M. Reeves, Y. Vorobeychik, and M. P. Wellman, Notes on equilibria in symmetric games, in Proceedings of the 6th International Workshop On Game Theoretic And Decision Theoretic Agents (GTDT), 2004, pp. 71--78.

[16]

K. Goto and R. van de Geijn, High-performance implementation of the level-3 BLAS, ACM Trans. Math. Software, 35 (2008), 4.

[17]

K. Goto and R. A. van de Geijn, Anatomy of high-performance matrix multiplication, ACM Trans. Math. Software, 34 (2008), 12.

[18]

J. A. Gunnels, F. G. Gustavson, G. M. Henry, and R. A. van de Geijn, FLAME: Formal linear algebra methods environment, ACM Trans. Math. Software, 27 (2001), pp. 422--455.

[19]

F. D. Igual, E. Chan, E. S. Quintana-Ortí, G. Quintana-Ortí, R. A. van de Geijn, and F. G. Van Zee, The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations, J. Parallel Distrib. Comput., 72 (2012), pp. 1134 -- 1143.

[20]

F. D. Igual, G. Quintana-Ortí, and R. van de Geijn, Level-3 BLAS on a GPU: Picking the Low Hanging Fruit, FLAME Working Note \#37, Technical report DICC 2009-04-01, Universidad Jaume I, Depto. de Ingenieria y Ciencia de Computadores., Castellón, Spain, 2009.

[21]

M. Ishteva, P. A. Absil, and P. Van Dooren, Jacobi algorithm for the best low multilinear rank approximation of symmetric tensors, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 651--672.

[22]

T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev., 51 (2009), pp. 455--500.

[23]

C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh, Basic linear algebra subprograms for Fortran usage, ACM Trans. Math. Software, 5 (1979), pp. 308--323.

[24]

T. M. Low and R. van de Geijn, An API for Manipulating Matrices Stored by Blocks, FLAME Working Note \#12, TR-2004-15, The University of Texas at Austin, Department of Computer Sciences, Austin, TX, May 2004.

[25]

M. Marqués, G. Quintana-Ortí, E. S. Quintana-Ortí, and R. van de Geijn, Solving “large” dense matrix problems on multi-core processors and GPUs, in Proceedings of the 10th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC'09), Rome, Italy, 2009, IEEE, Piscataway, NJ, 2009, pp. 123--132.

[26]

A.-H. Phan, P. Tichavsky, and A. Cichocki, Fast alternating algorithms for high order CANDECOMP/PARAFAC tensor factorizations, IEEE Trans. Signal Process., 61 (2013), pp. 4834--4846.

[27]

J. Poulson, B. Marker, R. A. van de Geijn, J. R. Hammond, and N. A. Romero, Elemental: A new framework for distributed memory dense matrix computations, ACM Trans. Math. Software, 39 (2013), 13.

[28]

M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gačić, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo, SPIRAL: Code generation for DSP transforms, Proc. IEEE, 93 (2005), pp. 232--275.

[29]

G. Quintana-Ortí, E. S. Quintana-Ortí, R. A. van de Geijn, F. G. Van Zee, and E. Chan, Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Trans. Math. Software, 36 (2009), 14.

[30]

S. Ragnarsson and C. F. Van Loan, Block tensors and symmetric embeddings, Linear Algebra Appl., 438 (2013), pp. 853--874.

[31]

S. Ragnarsson and C. F. Van Loan, Block tensor unfoldings, SIAM J. Matrix Anal. Appl., 33 (2012), pp. 149--169.

[32]

P. A. Regalia, Monotonically convergent algorithms for symmetric tensor approximation, Linear Algebra Appl., 438 (2013), pp. 875--890.

[33]

E. Solomonik, J. Hammond, and J. Demmel, A Preliminary Analysis of Cyclops Tensor Framework, Technical report UCB/EECS-2012-29, EECS Department, University of California, Berkeley, CA, 2012.

[34]

G. Tomasi and R. Bro, Multilinear models: Iterative methods, in Comprehensive Chemometrics, S. Brown, R. Tauler, and R. Walczak, eds., Vol. 2, Elsevier, Oxford, 2009, pp. 411--451.

[35]

R. A. van de Geijn, Using PLAPACK: Parallel Linear Algebra Package, The MIT Press, Cambridge, MA, 1997.

[36]

F. G. Van Zee, E. Chan, R. van de Geijn, E. S. Quintana-Ortí, and G. Quintana-Ortí, The libflame library for dense matrix computations, Comput. Sci. Eng., 11 (2009), pp. 56--62.

[37]

F. G. Van Zee and R. A. van de Geijn, BLIS: A Framework for Generating BLAS-like Libraries, FLAME Working Note \#66, Technical report TR-12-30, Department of Computer Sciences, The University of Texas at Austin, Austin, TX, 2012.

[38]

R. C. Whaley and J. J. Dongarra, Automatically tuned linear algebra software, in Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, ACM, New York, 1998, pp. 1--27.

[39]

Z. Xianyi, W. Qian, and Z. Yunquan, Model-driven level 3 BLAS performance optimization on Loongson 3A processor, in Proceedings of the IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS), IEEE, Piscataway, NJ, 2012, pp. 684--691.

[40]

S. Yamamoto and U. Nagashima, Four-index integral transformation exploiting symmetry, Comput. Phys. Comm., 166 (2005), pp. 58--65.

[41]

F. G. Van Zee, libflame: The Complete Reference, Field G. Van Zee, 2009.

Cited By

Wiedenbeck BBrinkman EAgmon NAn BRicci AYeoh W(2023)Data Structures for Deviation PayoffsProceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems10.5555/3545946.3598698(670-678)Online publication date: 30-May-2023
https://dl.acm.org/doi/10.5555/3545946.3598698
Spampinato DJelovina DZhuang JYzelman AHenriksen TSinkarovs A(2023)Towards Structured Algebraic ProgrammingProceedings of the 9th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming10.1145/3589246.3595373(50-61)Online publication date: 6-Jun-2023
https://dl.acm.org/doi/10.1145/3589246.3595373
Li ZKolla HPhipps ERobinson T(2022)Parallel memory-efficient computation of symmetric higher-order joint moment tensorsProceedings of the Platform for Advanced Scientific Computing Conference10.1145/3539781.3539793(1-11)Online publication date: 27-Jun-2022
https://dl.acm.org/doi/10.1145/3539781.3539793
Show More Cited By

Recommendations

On norm compression inequalities for partitioned block tensors
Abstract
When a tensor is partitioned into subtensors, some tensor norms of these subtensors form a tensor called a norm compression tensor. Norm compression inequalities for tensors focus on the relation of the norm of this compressed tensor to the norm ... $(^{})$
Time Integration of Rank-Constrained Tucker Tensors

Dynamical low-rank approximation in the Tucker tensor format of given large time-dependent tensors and of tensor differential equations is the subject of this paper. In particular, a discrete time integration method for rank-constrained Tucker tensors ...
On a Class of Semi-Positive Tensors in Tensor Complementarity Problem

Recently, the tensor complementarity problem has been investigated in the literature. In this paper, we extend a class of structured matrices to higher-order tensors; the corresponding tensor complementarity problem has a unique solution for any nonzero ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image SIAM Journal on Scientific Computing

SIAM Journal on Scientific Computing Volume 36, Issue 5

^† Special Section on Two Themes: Planet Earth and Big Data

2014

977 pages

ISSN:1064-8275

DOI:10.1137/sjoce3.36.5

Issue’s Table of Contents

© 2014, Society for Industrial and Applied Mathematics.

Publisher

Society for Industrial and Applied Mathematics

United States

Publication History

Published: 01 January 2014

Author Tags

Author Tag

15A69

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

4
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 28 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Wiedenbeck BBrinkman EAgmon NAn BRicci AYeoh W(2023)Data Structures for Deviation PayoffsProceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems10.5555/3545946.3598698(670-678)Online publication date: 30-May-2023
https://dl.acm.org/doi/10.5555/3545946.3598698
Spampinato DJelovina DZhuang JYzelman AHenriksen TSinkarovs A(2023)Towards Structured Algebraic ProgrammingProceedings of the 9th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming10.1145/3589246.3595373(50-61)Online publication date: 6-Jun-2023
https://dl.acm.org/doi/10.1145/3589246.3595373
Li ZKolla HPhipps ERobinson T(2022)Parallel memory-efficient computation of symmetric higher-order joint moment tensorsProceedings of the Platform for Advanced Scientific Computing Conference10.1145/3539781.3539793(1-11)Online publication date: 27-Jun-2022
https://dl.acm.org/doi/10.1145/3539781.3539793
Domino KGawron P(2019)An algorithm for arbitrary–order cumulant tensor calculation in a sliding window of data streamsInternational Journal of Applied Mathematics and Computer Science10.2478/amcs-2019-001529:1(195-206)Online publication date: 1-Mar-2019
https://dl.acm.org/doi/10.2478/amcs-2019-0015

View Options

View options

Media

Figures

Other

Tables

View Issue’s Table of Contents