- research-article, April 2024
Performance improvement of the triangular matrix product in commodity clusters
- Inmaculada Santamaria-Valenzuela,
- Rocío Carratalá-Sáez,
- Yuri Torres,
- Diego R. Llanos,
- Arturo Gonzalez-Escribano
The Journal of Supercomputing (JSCO), Volume 80, Issue 11, Pages 16630–16653
https://doi.org/10.1007/s11227-024-06097-7
Abstract: There are many works devoted to improving the matrix product computation, as it is used in a wide variety of scientific applications arising from many different fields. In this work, we propose alternative data distribution policies and ...
- research-article, November 2023
Energy consumption comparison of parallel linear systems solver algorithms on HPC infrastructure
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 1839–1848
https://doi.org/10.1145/3624062.3624266
High-Performance Computing (HPC) systems today are gradually increasing in size and complexity due to the correspondent demand for ever-increasing computing needs, requiring more complicated tasks and higher accuracy. The growing energy needs of HPC ...
- research-article, April 2021
Improving blocked matrix-matrix multiplication routine by utilizing AVX-512 instructions on Intel Knights Landing and Xeon Scalable processors
Cluster Computing (KLU-CLUS), Volume 26, Issue 5, Pages 2539–2549
https://doi.org/10.1007/s10586-021-03274-8
Abstract: In high-performance computing, the general matrix-matrix multiplication (xGEMM) routine is the core of the Level 3 BLAS kernel for effective matrix-matrix multiplication operations. The performance of parallel xGEMM (PxGEMM) is significantly ...
- research-article, August 2019
A high performance implementation of Zolo-SVD algorithm on distributed memory systems
Abstract: This paper introduces a high performance implementation of the Zolo-SVD algorithm on distributed memory systems, which is based on the polar decomposition (PD) algorithm via the Zolotarev’s function (Zolo-PD), originally proposed by ...
- research-article, December 2018
An efficient hybrid tridiagonal divide-and-conquer algorithm on distributed memory architectures
Journal of Computational and Applied Mathematics (JCAM), Volume 344, Issue C, Pages 512–520
https://doi.org/10.1016/j.cam.2018.05.051
Abstract: In this paper, we propose an efficient divide-and-conquer (DC) algorithm for symmetric tridiagonal matrices based on ScaLAPACK and the hierarchically semiseparable (HSS) matrices. HSS is an important type of rank-structured matrices. ...
- research-article, November 2013
Parallel reduction to Hessenberg form with algorithm-based fault tolerance
SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 88, Pages 1–11
https://doi.org/10.1145/2503210.2503249
This paper studies the resilience of a two-sided factorization and presents a generic algorithm-based approach capable of making two-sided factorizations resilient. We establish the theoretical proof of the correctness and the numerical stability of the ...
- Article, November 2012
Tight Coupling of R and Distributed Linear Algebra for High-Level Programming with Big Data
SCC '12: Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, Pages 811–815
https://doi.org/10.1109/SC.Companion.2012.113
We present a new distributed programming extension of the R programming language. By tightly coupling R to the well-known ScaLAPACK and MPI libraries, we are able to achieve highly scalable implementations of common statistical methods, allowing the ...
- article, November 2012
Parallel, 'large', dense matrix problems: Application to 3D sequential integrated inversion of seismological and gravity data
To obtain accurate and reliable estimations of the major lithological properties of the rock within a studied volume, geophysics uses the joint information provided by different geophysical datasets (e.g. gravimetric, magnetic, seismic). Representation ...
- Article, September 2012
Energy Efficient Parallel Matrix-Matrix Multiplication for DVFS-enabled Clusters
ICPPW '12: Proceedings of the 2012 41st International Conference on Parallel Processing Workshops, Pages 239–245
https://doi.org/10.1109/ICPPW.2012.36
Excessive energy consumption has become one of the major challenges in high performance computing. Reducing the energy consumption of frequently used high performance computing applications not only saves the energy cost but also reduces the greenhouse ...
- Article, July 2012
Design and Performance Issues of Cholesky and LU Solvers Using UPCBLAS
ISPA '12: Proceedings of the 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications, Pages 40–47
https://doi.org/10.1109/ISPA.2012.14
Partitioned Global Address Space (PGAS) languages offer programmers a shared memory view that increases their productivity and allow locality exploitation to obtain good performance on current large-scale distributed memory systems. UPCBLAS is a ...
- Article, September 2011
Incomplete cyclic reduction of banded and strictly diagonally dominant linear systems
PPAM'11: Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I, Pages 80–91
https://doi.org/10.1007/978-3-642-31464-3_9
The ScaLAPACK library contains a pair of routines for solving banded linear systems which are strictly diagonally dominant by rows. Mathematically, the algorithm is complete block cyclic reduction corresponding to a particular block partitioning of the ...
- Article, June 2010
Parallel solution of narrow banded diagonally dominant linear systems
PARA'10: Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2, Pages 280–290
https://doi.org/10.1007/978-3-642-28145-7_28
ScaLAPACK contains a pair of routines for solving systems which are narrow banded and diagonally dominant by rows. Mathematically, the algorithm is block cyclic reduction. The ScaLAPACK implementation can be improved using incomplete, rather than ...
- research-article, January 2010
ScaLAPACK's MRRR algorithm
ACM Transactions on Mathematical Software (TOMS), Volume 37, Issue 1, Article No.: 1, Pages 1–35
https://doi.org/10.1145/1644001.1644002
The (sequential) algorithm of Multiple Relatively Robust Representations, MRRR, is a more efficient variant of inverse iteration that does not require reorthogonalization. It solves the eigenproblem of an unreduced symmetric tridiagonal matrix T ∈ Rn × ...
- article, August 2009
Interfaces for parallel numerical linear algebra libraries in high level languages
Advances in Engineering Software (ADES), Volume 40, Issue 8, Pages 652–658
https://doi.org/10.1016/j.advengsoft.2008.11.014
In many high performance engineering and scientific applications there is a need to use parallel software libraries. Researchers behind these applications find it difficult to understand the interfaces to these libraries because they carry arguments ...
- Article, August 2008
Parallel Algorithms for Triangular Periodic Sylvester-Type Matrix Equations
Euro-Par '08: Proceedings of the 14th international Euro-Par conference on Parallel Processing, Pages 780–789
https://doi.org/10.1007/978-3-540-85451-7_83
We present parallel algorithms for triangular periodic Sylvester-type matrix equations, conceptually being the third step of a periodic Bartels-Stewart-like solution method for general periodic Sylvester-type matrix equations based on variants of the ...
- Article, July 2008
Heterogeneous PBLAS: Optimization of PBLAS for Heterogeneous Computational Clusters
ISPDC '08: Proceedings of the 2008 International Symposium on Parallel and Distributed Computing, Pages 73–80
https://doi.org/10.1109/ISPDC.2008.9
This paper presents a package, called Heterogeneous PBLAS (HeteroPBLAS), which is built on top of PBLAS and provides optimized parallel basic linear algebra subprograms for heterogeneous computational clusters. We present the user interface and the ...
- Article, July 2008
Scalable Dense Factorizations for Heterogeneous Computational Clusters
ISPDC '08: Proceedings of the 2008 International Symposium on Parallel and Distributed Computing, Pages 49–56
https://doi.org/10.1109/ISPDC.2008.10
This paper discusses the design and the implementation of the LU factorization routines included in the Heterogeneous ScaLAPACK library, which is built on top of ScaLAPACK. These routines are used in the factorization and solution of a dense system of ...
- opinion, April 2008
Biographies
IEEE Annals of the History of Computing (ANHC), Volume 30, Issue 2, Pages 74–81
https://doi.org/10.1109/MAHC.2008.17
Jack Dongarra, a leader of the high-performance computing community, is cocreator of mathematical software packages including EISPACK, LINPACK, LAPACK, and ScaLAPACK. He is also a University Distinguished Professor at the University of Tennessee.
- poster, November 2007
Block size selection of parallel LU and QR on PVP-based and RISC-based supercomputers
CHINA HPC '07: Proceedings of the 2007 Asian technology information program's (ATIP's) 3rd workshop on High performance computing in China: solution approaches to impediments for high performance computing, Pages 115–125
https://doi.org/10.1145/1375783.1375809
In this paper, we proposed a unified framework and tried to address the optimal block size selection problem for parallel blocked LU and QR factorization algorithm used in ScaLAPACK package, since they use two dimensional block cyclic data distribution ...