DOI: 10.1145/3589246.3595373

Towards Structured Algebraic Programming

Published: 06 June 2023

Abstract

Structured matrices and tensors exhibiting properties such as symmetry or fixed non-zero patterns are known to make algorithms and data storage more efficient. Driven by the power and efficiency constraints imposed by the scale of modern scientific, machine-learning, and edge-computing applications, algorithm experts are revisiting traditional linear algebra algorithms with the goal of making new structures appear. Such structures often result from new numerical approximations that would greatly benefit from a more flexible linear algebra interface than standard BLAS and LAPACK, one that allows mixed precisions and data types in place of traditional floating-point operations. Algebraic programming interfaces such as GraphBLAS, while naturally abstracting the algebras of their operations, do not explicitly capture structured, densely stored arrays. In this paper, we present a preliminary design of a new algebraic programming interface for structured containers with template-generic non-zero patterns. The interface offers backend implementations the possibility of integrating more compile-time pattern information both in the loop-based implementations of primitive operations and in the formulation of array accesses. We demonstrate its ability to specify important dense matrix decomposition algorithms and argue for its ability to support high-performance backends.
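The interface itself is not reproduced here, but the core idea, a container whose storage pattern is fixed by its structure, combined with operations that are generic over the algebra, can be sketched in a few lines. The sketch below is only an illustration of the general concept, not the paper's actual API; the names `SymmetricMatrix`, `mxv`, `add`, `mul`, and `zero` are hypothetical. A symmetric container stores only its upper triangle, while the matrix-vector product is parameterized by a user-supplied semiring.

```python
class SymmetricMatrix:
    """Dense symmetric n-by-n matrix storing only the upper triangle,
    i.e. n*(n+1)/2 entries instead of n*n."""

    def __init__(self, n):
        self.n = n
        self.buf = [0.0] * (n * (n + 1) // 2)

    def _idx(self, i, j):
        # Fold the lower triangle onto the upper one: A[i][j] == A[j][i].
        if i > j:
            i, j = j, i
        # Row-major offset into the packed upper triangle: row i starts
        # after sum_{k<i} (n - k) = i*n - i*(i-1)/2 stored entries.
        return i * self.n - i * (i - 1) // 2 + (j - i)

    def __getitem__(self, ij):
        return self.buf[self._idx(*ij)]

    def __setitem__(self, ij, v):
        self.buf[self._idx(*ij)] = v


def mxv(A, x, add=lambda a, b: a + b, mul=lambda a, b: a * b, zero=0.0):
    """Matrix-vector product generic over a user-supplied (add, mul, zero)
    semiring; the loop nest is independent of both algebra and structure."""
    y = []
    for i in range(A.n):
        acc = zero
        for j in range(A.n):
            acc = add(acc, mul(A[i, j], x[j]))
        y.append(acc)
    return y
```

Because the algebra is a parameter, the same loop nest computes an ordinary product or, say, a tropical (min, +) one, while the container's indexing folds the redundant half of the matrix away. A backend that sees the pattern at compile time could similarly restrict its loop bounds and array accesses to the stored triangle.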


Published In

ARRAY 2023: Proceedings of the 9th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming
June 2023
74 pages
ISBN: 9798400701696
DOI: 10.1145/3589246

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. algebraic programming
  2. dense linear algebra
  3. mathematical software
  4. structured matrices

Qualifiers

  • Research-article

Conference

ARRAY '23

Acceptance Rates

Overall Acceptance Rate 17 of 25 submissions, 68%

