A Portable and Heterogeneous LU Factorization on IRIS

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13835))

Included in the following conference series:

European Conference on Parallel Processing

483 Accesses
1 Citations

Abstract

Here, the IRIS programming model is evaluated as a method to improve performance portability for heterogeneous systems that use LU matrix factorization. LU (lower-upper) factorization is considered one of the most important numerical linear algebra operations used in multiple high-performance computing and scientific applications. IRIS enables the separation of the algorithm’s definition from the tuning by using tasks + dependencies. This considerably reduces the effort required to achieve performance portability on heterogeneous systems. One IRIS code can use different settings depending on the underlying hardware features. Different configurations are evaluated on two different heterogeneous systems to achieve important speedups for the reference code with minimal changes to the source code.

J. Kim—Now at NVIDIA.

Notice: This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a non-exclusive, paid up, irrevocable, world-wide license to publish or reproduce the published form of the manuscript, or allow others to do so, for U.S. Government purposes. The DOE will provide public access to these results in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: GBP 19.95; Price includes VAT (United Kingdom)

eBook: GBP 43.99; Price includes VAT (United Kingdom)

Softcover Book: GBP 54.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

MatRIS: Addressing the Challenges for Portability and Heterogeneity Using Tasking for Matrix Decomposition (Cholesky)

The LAMA Approach for Writing Portable Applications on Heterogenous Architectures

Transparent Application Acceleration by Intelligent Scheduling of Shared Library Calls on Heterogeneous Systems

Notes

References

Bellavia, S., Morini, B., Porcelli, M.: New updates of incomplete LU factorizations and applications to large nonlinear systems. Optim. Methods Softw. 29(2), 321–340 (2014). https://doi.org/10.1080/10556788.2012.762517
Article MathSciNet MATH Google Scholar
Eickhoff, K.M., Engl, W.L.: Levelized incomplete LU factorization and its application to large-scale circuit simulation. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 14(6), 720–727 (1995). https://doi.org/10.1109/43.387732
Article Google Scholar
Luciani, X., Albera, L.: Joint eigenvalue decomposition of non-defective matrices based on the LU factorization with application to ICA. IEEE Trans. Signal Process. 63(17), 4594–4608 (2015). https://doi.org/10.1109/TSP.2015.2440219
Article MathSciNet MATH Google Scholar
Kudo, S., Nitadori, K., Ina, T., Imamura, T.: Implementation and numerical techniques for one eflop/s HPL-AI benchmark on fugaku. In: 11th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA@SC 2020, Atlanta, GA, USA, 13 November 2020, pp. 69–76. IEEE (2020). https://doi.org/10.1109/ScalA51936.2020.00014
Gan, X., et al.: Customizing the HPL for china accelerator. Sci. China Inf. Sci. 61(4), 042 102:1-042 102:11 (2018). https://doi.org/10.1007/s11432-017-9221-0
Article Google Scholar
Kim, J., Lee, S., Johnston, B., Vetter, J.S.: IRIS: a portable runtime system exploiting multiple heterogeneous programming systems. In: Proceedings of the 25th IEEE High Performance Extreme Computing Conference, ser. HPEC 2021, pp. 1–8 (2021)
Google Scholar
Valero-Lara, P., Catalán, S., Martorell, X., Usui, T., Labarta, J.: slass: a fully automatic auto-tuned linear algebra library based on openmp extensions implemented in ompss (lass library). J. Parallel Distributed Comput. 138, 153–171 (2020)
Article Google Scholar
Valero-Lara, P., Catalán, S., Martorell, X., Labarta, J.: BLAS-3 optimized by ompss regions (lass library). In: 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019, Pavia, Italy, 13–15 February 2019, pp. 25–32. IEEE (2019)
Google Scholar
Dongarra, J.J., et al.: PLASMA: parallel linear algebra software for multicore using openmp. ACM Trans. Math. Softw. 45(2), 16:1-16:35 (2019)
Article MathSciNet MATH Google Scholar
Valero-Lara, P., Martínez-Pérez, I., Sirvent, R., Martorell, X., Peña, A.J.: NVIDIA GPUs scalability to solve multiple (batch) tridiagonal systems implementation of cuThomasBatch. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds.) PPAM 2017. LNCS, vol. 10777, pp. 243–253. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-78024-5_22
Chapter Google Scholar
Valero-Lara, P., Martínez-Pérez, I., Sirvent, R., Martorell, X., Peña, A.J.: cuThomasBatch and cuThomasVBatch, CUDA routines to compute batch of tridiagonal systems on NVIDIA GPUs. Concurr. Comput. Pract. Exp. 30(24), e4909 (2018)
Article Google Scholar
Valero-Lara, P., Pinelli, A., Favier, J., Matias, M.P.: Block tridiagonal solvers on heterogeneous architectures. In: IEEE 10th International Symposium on Parallel and Distributed Processing with Applications, ser. ISPA 2012, pp. 609–616 (2012)
Google Scholar
Valero-Lara, P., Pinelli, A., Prieto-Matias, M.: Fast finite difference Poisson solvers on heterogeneous architectures. Comput. Phys. Commun. 185(4), 1265–1272 (2014)
Article MathSciNet MATH Google Scholar
Demmel, J.W., Gilbert, J.R., Li, X.S.: An asynchronous parallel supernodal algorithm for sparse gaussian elimination. SIAM J. Matrix Anal. Appl. 20(4), 915–952 (1999)
Article MathSciNet MATH Google Scholar
Trott, C.R., et al.: Kokkos 3: programming model extensions for the exascale era. IEEE Trans. Parallel Distributed Syst. 33(4), 805–817 (2022). https://doi.org/10.1109/TPDS.2021.3097283
Article Google Scholar
Beckingsale, D., Hornung, R.D., Scogland, T., Vargas, A.: Performance portable C++ programming with RAJA. In: Hollingsworth, J.K., Keidar, I. (eds.) Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2019, Washington, DC, USA, 16–20 February 2019, pp. 455–456. ACM (2019)
Google Scholar
Valero-Lara, P., Jansson, J.: Heterogeneous CPU+GPU approaches for mesh refinement over lattice-boltzmann simulations. Concurr. Comput. Pract. Exp. 29(7), e3919 (2017)
Article Google Scholar
Valero-Lara, P., Igual, F.D., Prieto-Matías, M., Pinelli, A., Favier, J.: Accelerating fluid-solid simulations (lattice-boltzmann & immersed-boundary) on heterogeneous architectures. J. Comput. Sci. 10, 249–261 (2015)
Article Google Scholar
Valero-Lara, P., Kim, J., Hernandez, O., Vetter, J.S.: Openmp target task: tasking and target offloading on heterogeneous systems. In: Chaves, R., et al. (eds.) Euro-Par 2021. LNCS, vol. 13098, pp. 445–455. Springer, Cham (2021). https://doi.org/10.1007/978-3-031-06156-1_35
Chapter Google Scholar
Tomov, S., Dongarra, J., Baboulin, M.: Towards dense linear algebra for hybrid GPU accelerated manycore systems. Technical report, 2008-01 (2008)
Google Scholar

Download references

Author information

Authors and Affiliations

Oak Ridge National Laboratory, Oak Ridge, USA
Pedro Valero-Lara, Jungwon Kim & Jeffrey S. Vetter

Authors

Pedro Valero-Lara
View author publications
You can also search for this author in PubMed Google Scholar
Jungwon Kim
View author publications
You can also search for this author in PubMed Google Scholar
Jeffrey S. Vetter
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Pedro Valero-Lara .

Editor information

Editors and Affiliations

University of Glasgow, Glasgow, UK
Jeremy Singer
University of Glasgow, Glasgow, UK
Yehia Elkhatib
University of Santiago de Compostela, Santiago de Compostela, La Coruña, Spain
Dora Blanco Heras
Louisiana State University, Baton Rouge, LA, USA
Patrick Diehl
University of Edinburgh, Edinburgh, UK
Nick Brown
Universidade de Lisboa, Lisbon, Portugal
Aleksandar Ilic

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Valero-Lara, P., Kim, J., Vetter, J.S. (2023). A Portable and Heterogeneous LU Factorization on IRIS. In: Singer, J., Elkhatib, Y., Blanco Heras, D., Diehl, P., Brown, N., Ilic, A. (eds) Euro-Par 2022: Parallel Processing Workshops. Euro-Par 2022. Lecture Notes in Computer Science, vol 13835. Springer, Cham. https://doi.org/10.1007/978-3-031-31209-0_2

Download citation

DOI: https://doi.org/10.1007/978-3-031-31209-0_2
Published: 02 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31208-3
Online ISBN: 978-3-031-31209-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

A Portable and Heterogeneous LU Factorization on IRIS

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

MatRIS: Addressing the Challenges for Portability and Heterogeneity Using Tasking for Matrix Decomposition (Cholesky)

The LAMA Approach for Writing Portable Applications on Heterogenous Architectures

Transparent Application Acceleration by Intelligent Scheduling of Shared Library Calls on Heterogeneous Systems

Notes

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

A Portable and Heterogeneous LU Factorization on IRIS

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

MatRIS: Addressing the Challenges for Portability and Heterogeneity Using Tasking for Matrix Decomposition (Cholesky)

The LAMA Approach for Writing Portable Applications on Heterogenous Architectures

Transparent Application Acceleration by Intelligent Scheduling of Shared Library Calls on Heterogeneous Systems

Notes

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation