[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Exploiting Symmetry in Tensors for High Performance: : Multiplication with Symmetric Tensors

Published: 01 January 2014 Publication History

Abstract

Symmetric tensor operations arise in a wide variety of computations. However, the benefits of exploiting symmetry in order to reduce storage and computation is in conflict with a desire to simplify memory access patterns. In this paper, we propose a blocked data structure (blocked compact symmetric storage) wherein we consider the tensor by blocks and store only the unique blocks of a symmetric tensor. We propose an algorithm by blocks, already shown of benefit for matrix computations, that exploits this storage format by utilizing a series of temporary tensors to avoid redundant computation. Further, partial symmetry within temporaries is exploited to further avoid redundant storage and redundant computation. A detailed analysis shows that, relative to storing and computing with tensors without taking advantage of symmetry and partial symmetry, storage requirements are reduced by a factor of $ O( m! )$ and computational requirements by a factor of $O( (m+1)!/2^m )$, where $ m $ is the order of the tensor. However, as the analysis shows, care must be taken in choosing the correct block size to ensure these storage and computational benefits are achieved (particularly for low-order tensors). An implementation demonstrates that storage is greatly reduced and the complexity introduced by storing and computing with tensors by blocks is manageable. Preliminary results demonstrate that computational time is also reduced. The paper concludes with a discussion of how insights in this paper point to opportunities for generalizing recent advances in the domain of linear algebra libraries to the field of multilinear computation.

References

[1]
OpenBLAS, http://xianyi.github.com/OpenBLAS/ (2012).
[2]
E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak, J. Langou, H. Ltaief, P. Luszczek, and S. Tomov, Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, J. Phys. Conf. Ser., 180 (2009), 012037.
[3]
E. Anderson, Z. Bai, C. Bischof, L. S. Blackford, J. Demmel, Jack J. Dongarra, J. Du Croz, S. Hammarling, A. Greenbaum, A. McKenney, and D. Sorensen, LAPACK Users' Guide, 3rd ed., Software Environ. Tools 9, SIAM, Philadelphia, 1999.
[4]
B. W. Bader and T. G. Kolda, Algorithm 862: MATLAB tensor classes for fast algorithm prototyping, ACM Trans. Math. Software, 32 (2006), pp. 635--653.
[5]
B. W. Bader, T. G. Kolda, et al., Matlab Tensor Toolbox version 2.5, http://www.sandia.gov/\string tgkolda/TensorToolbox/http://www.sandia.gov/$\sim$tgkolda/TensorToolbox/ (2012).
[6]
G. Ballard, T. G. Kolda, and T. Plantenga, Efficiently computing tensor eigenvalues on a GPU, in Proceedings of the 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Ph.D. Forum, IEEE, Piscataway, NJ, 2011, pp. 1340--1348.
[7]
G. Baumgartner, A. Auer, D.E. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, X. Gao, R.J. Harrison, S. Hirata, S. Krishnamoorthy, S. Krishnan, C. Lam, Q. Lu, M. Nooijen, R.M. Pitzer, J. Ramanujam, P. Sadayappan, and A. Sibiryakov, Synthesis of high-performance parallel programs for a class of ab initio quantum chemistry models, in Proc. IEEE, 93, 2005, pp. 276--292.
[8]
C.F Bender, Integral transformations. A bottleneck in molecular quantum mechanical calculations, J. Comput. Phys., 9 (1972), pp. 547--554.
[9]
J. Bilmes, K. Asanovic, C.W. Chin, and J. Demmel, Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology, in Proceedings of the 11th International Conference on Supercomputing, ACM, New York, 1997, pp. 340--347.
[10]
J. Choi, J. J. Dongarra, R. Pozo, and D. W. Walker, ScaLAPACK: A scalable linear algebra library for distributed memory concurrent computers, in Proceedings of the Fourth Symposium on the Frontiers of Massively Parallel Computation, IEEE Computer Society Press, Los Alamitos, CA, 1992, pp. 120--127.
[11]
J. J. Dongarra, J. Du Croz, S. Hammarling, and I. Duff, A set of level $3$ basic linear algebra subprograms, ACM Trans. Math. Software, 16 (1990), pp. 1--17.
[12]
J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson, An extended set of FORTRAN basic linear algebra subprograms, ACM Trans. Math. Software, 14 (1988), pp. 1--17.
[13]
Eigenvector Research, Inc., PLS Toolbox, http://www.eigenvector.com/software/pls_toolbox.htmhttp://www.eigenvector.com/software/pls_toolbox.htm (2012).
[14]
E. Epifanovsky, M. Wormit, T. Kus, A. Landau, D. Zuev, K. Khistyaev, P. Manohar, I. Kaliman, A. Dreuw, and A. I. Krylov, New implementation of high-level correlated methods using a general block tensor library for high-performance electronic structure calculations, J. Comput. Chem., 34 (2013), pp. 2293--2309.
[15]
S. F. Cheng, D. M. Reeves, Y. Vorobeychik, and M. P. Wellman, Notes on equilibria in symmetric games, in Proceedings of the 6th International Workshop On Game Theoretic And Decision Theoretic Agents (GTDT), 2004, pp. 71--78.
[16]
K. Goto and R. van de Geijn, High-performance implementation of the level-3 BLAS, ACM Trans. Math. Software, 35 (2008), 4.
[17]
K. Goto and R. A. van de Geijn, Anatomy of high-performance matrix multiplication, ACM Trans. Math. Software, 34 (2008), 12.
[18]
J. A. Gunnels, F. G. Gustavson, G. M. Henry, and R. A. van de Geijn, FLAME: Formal linear algebra methods environment, ACM Trans. Math. Software, 27 (2001), pp. 422--455.
[19]
F. D. Igual, E. Chan, E. S. Quintana-Ortí, G. Quintana-Ortí, R. A. van de Geijn, and F. G. Van Zee, The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations, J. Parallel Distrib. Comput., 72 (2012), pp. 1134 -- 1143.
[20]
F. D. Igual, G. Quintana-Ortí, and R. van de Geijn, Level-3 BLAS on a GPU: Picking the Low Hanging Fruit, FLAME Working Note \#37, Technical report DICC 2009-04-01, Universidad Jaume I, Depto. de Ingenieria y Ciencia de Computadores., Castellón, Spain, 2009.
[21]
M. Ishteva, P. A. Absil, and P. Van Dooren, Jacobi algorithm for the best low multilinear rank approximation of symmetric tensors, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 651--672.
[22]
T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev., 51 (2009), pp. 455--500.
[23]
C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh, Basic linear algebra subprograms for Fortran usage, ACM Trans. Math. Software, 5 (1979), pp. 308--323.
[24]
T. M. Low and R. van de Geijn, An API for Manipulating Matrices Stored by Blocks, FLAME Working Note \#12, TR-2004-15, The University of Texas at Austin, Department of Computer Sciences, Austin, TX, May 2004.
[25]
M. Marqués, G. Quintana-Ortí, E. S. Quintana-Ortí, and R. van de Geijn, Solving “large” dense matrix problems on multi-core processors and GPUs, in Proceedings of the 10th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC'09), Rome, Italy, 2009, IEEE, Piscataway, NJ, 2009, pp. 123--132.
[26]
A.-H. Phan, P. Tichavsky, and A. Cichocki, Fast alternating algorithms for high order CANDECOMP/PARAFAC tensor factorizations, IEEE Trans. Signal Process., 61 (2013), pp. 4834--4846.
[27]
J. Poulson, B. Marker, R. A. van de Geijn, J. R. Hammond, and N. A. Romero, Elemental: A new framework for distributed memory dense matrix computations, ACM Trans. Math. Software, 39 (2013), 13.
[28]
M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gačić, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo, SPIRAL: Code generation for DSP transforms, Proc. IEEE, 93 (2005), pp. 232--275.
[29]
G. Quintana-Ortí, E. S. Quintana-Ortí, R. A. van de Geijn, F. G. Van Zee, and E. Chan, Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Trans. Math. Software, 36 (2009), 14.
[30]
S. Ragnarsson and C. F. Van Loan, Block tensors and symmetric embeddings, Linear Algebra Appl., 438 (2013), pp. 853--874.
[31]
S. Ragnarsson and C. F. Van Loan, Block tensor unfoldings, SIAM J. Matrix Anal. Appl., 33 (2012), pp. 149--169.
[32]
P. A. Regalia, Monotonically convergent algorithms for symmetric tensor approximation, Linear Algebra Appl., 438 (2013), pp. 875--890.
[33]
E. Solomonik, J. Hammond, and J. Demmel, A Preliminary Analysis of Cyclops Tensor Framework, Technical report UCB/EECS-2012-29, EECS Department, University of California, Berkeley, CA, 2012.
[34]
G. Tomasi and R. Bro, Multilinear models: Iterative methods, in Comprehensive Chemometrics, S. Brown, R. Tauler, and R. Walczak, eds., Vol. 2, Elsevier, Oxford, 2009, pp. 411--451.
[35]
R. A. van de Geijn, Using PLAPACK: Parallel Linear Algebra Package, The MIT Press, Cambridge, MA, 1997.
[36]
F. G. Van Zee, E. Chan, R. van de Geijn, E. S. Quintana-Ortí, and G. Quintana-Ortí, The libflame library for dense matrix computations, Comput. Sci. Eng., 11 (2009), pp. 56--62.
[37]
F. G. Van Zee and R. A. van de Geijn, BLIS: A Framework for Generating BLAS-like Libraries, FLAME Working Note \#66, Technical report TR-12-30, Department of Computer Sciences, The University of Texas at Austin, Austin, TX, 2012.
[38]
R. C. Whaley and J. J. Dongarra, Automatically tuned linear algebra software, in Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, ACM, New York, 1998, pp. 1--27.
[39]
Z. Xianyi, W. Qian, and Z. Yunquan, Model-driven level 3 BLAS performance optimization on Loongson 3A processor, in Proceedings of the IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS), IEEE, Piscataway, NJ, 2012, pp. 684--691.
[40]
S. Yamamoto and U. Nagashima, Four-index integral transformation exploiting symmetry, Comput. Phys. Comm., 166 (2005), pp. 58--65.
[41]
F. G. Van Zee, libflame: The Complete Reference, Field G. Van Zee, 2009.

Cited By

View all
  • (2023)Data Structures for Deviation PayoffsProceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems10.5555/3545946.3598698(670-678)Online publication date: 30-May-2023
  • (2023)Towards Structured Algebraic ProgrammingProceedings of the 9th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming10.1145/3589246.3595373(50-61)Online publication date: 6-Jun-2023
  • (2022)Parallel memory-efficient computation of symmetric higher-order joint moment tensorsProceedings of the Platform for Advanced Scientific Computing Conference10.1145/3539781.3539793(1-11)Online publication date: 27-Jun-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image SIAM Journal on Scientific Computing
SIAM Journal on Scientific Computing  Volume 36, Issue 5
Special Section on Two Themes: Planet Earth and Big Data
2014
977 pages
ISSN:1064-8275
DOI:10.1137/sjoce3.36.5
Issue’s Table of Contents

Publisher

Society for Industrial and Applied Mathematics

United States

Publication History

Published: 01 January 2014

Author Tags

  1. high performance
  2. multilinear algebra
  3. algorithms

Author Tag

  1. 15A69

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 28 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Data Structures for Deviation PayoffsProceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems10.5555/3545946.3598698(670-678)Online publication date: 30-May-2023
  • (2023)Towards Structured Algebraic ProgrammingProceedings of the 9th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming10.1145/3589246.3595373(50-61)Online publication date: 6-Jun-2023
  • (2022)Parallel memory-efficient computation of symmetric higher-order joint moment tensorsProceedings of the Platform for Advanced Scientific Computing Conference10.1145/3539781.3539793(1-11)Online publication date: 27-Jun-2022
  • (2019)An algorithm for arbitrary–order cumulant tensor calculation in a sliding window of data streamsInternational Journal of Applied Mathematics and Computer Science10.2478/amcs-2019-001529:1(195-206)Online publication date: 1-Mar-2019

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media