DOI: 10.1145/3404397.3404413
research-article

Efficient Block Algorithms for Parallel Sparse Triangular Solve

Published: 17 August 2020

Abstract

The sparse triangular solve (SpTRSV) kernel is an important building block for many linear algebra routines, such as sparse direct and iterative solvers. The main challenge in accelerating SpTRSV is the difficulty of exposing more parallelism. Existing work mainly focuses on reducing dependencies and synchronizations in level-set methods. However, the 2D block layout of the input matrix has been largely ignored in the design of more efficient SpTRSV algorithms.
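To make the level-set idea concrete, the following is a minimal serial sketch in Python, assuming a lower-triangular matrix stored in CSR form with a nonzero diagonal. The function names `level_sets` and `sptrsv_level_set` are illustrative and do not come from the paper. Rows in the same level have no mutual dependencies, so a parallel implementation could process them concurrently and synchronize between levels.

```python
def level_sets(row_ptr, col_idx):
    """Group the rows of a lower-triangular CSR matrix into dependency levels."""
    n = len(row_ptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:  # x[i] depends on x[j], so row i sits at least one level later
                level[i] = max(level[i], level[j] + 1)
    sets = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return sets


def sptrsv_level_set(row_ptr, col_idx, vals, b):
    """Solve L x = b level by level (serial here; each level is parallelizable)."""
    x = [0.0] * len(b)
    for rows in level_sets(row_ptr, col_idx):
        # A parallel version would run this inner loop concurrently,
        # with a synchronization point between consecutive levels.
        for i in rows:
            acc = float(b[i])
            diag = None
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = vals[k]
                else:
                    acc -= vals[k] * x[j]
            x[i] = acc / diag
    return x


# Hypothetical 4x4 example:
# L = [[2, 0, 0, 0],
#      [0, 4, 0, 0],
#      [1, 0, 1, 0],
#      [0, 2, 3, 5]]
row_ptr = [0, 1, 2, 4, 7]
col_idx = [0, 1, 0, 2, 1, 2, 3]
vals = [2.0, 4.0, 1.0, 1.0, 2.0, 3.0, 5.0]
b = [2.0, 8.0, 4.0, 33.0]
print(level_sets(row_ptr, col_idx))
print(sptrsv_level_set(row_ptr, col_idx, vals, b))
```

In this example rows 0 and 1 are independent, row 2 waits for row 0, and row 3 waits for rows 1 and 2, which gives three levels for four rows. Long dependency chains of this kind are what limit the available parallelism.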
In this paper, we implement three block algorithms (column block, row block, and recursive block) for parallel SpTRSV on modern GPUs, and propose an adaptive approach that automatically selects the best kernels according to the sparsity structure of the input. In tests on 159 sparse matrices on two high-end NVIDIA GPUs, the recursive block algorithm performs best among the three block algorithms, and is on average 4.72x (up to 72.03x) and 9.95x (up to 61.08x) faster than the cuSPARSE v2 and Sync-free methods, respectively. In addition, our method needs only a moderate preprocessing cost for the input matrix, and is therefore highly efficient for multiple right-hand sides and iterative scenarios.
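As a rough illustration of recursive blocking, the sketch below splits a lower-triangular system into two triangular sub-blocks and one square block, solves the top triangle, updates the right-hand side with the square block (a sparse matrix-vector product), and then solves the bottom triangle. It is a generic serial Python sketch under stated assumptions, not the paper's GPU kernels: the row-list storage, the function name `recursive_block_sptrsv`, and the `leaf` cutoff are all hypothetical choices made for this example.

```python
def recursive_block_sptrsv(rows, b, leaf=2):
    """Solve L x = b for lower-triangular L by recursive 2x2 blocking.

    rows[i] is a list of (column, value) pairs for row i, including the
    diagonal entry. `leaf` is the block size below which plain forward
    substitution is used.
    """
    x = [float(v) for v in b]  # running right-hand side, overwritten by the solution

    def solve(lo, hi):
        if hi - lo <= leaf:
            # Small triangular block: plain forward substitution.
            for i in range(lo, hi):
                diag = None
                for j, v in rows[i]:
                    if j == i:
                        diag = v
                    elif lo <= j < i:
                        x[i] -= v * x[j]
                x[i] /= diag
            return
        mid = (lo + hi) // 2
        solve(lo, mid)  # top-left triangular block
        # Square block update: b2 -= L21 * x1 (an SpMV-like step).
        for i in range(mid, hi):
            for j, v in rows[i]:
                if lo <= j < mid:
                    x[i] -= v * x[j]
        solve(mid, hi)  # bottom-right triangular block

    solve(0, len(b))
    return x


# Hypothetical 4x4 example:
# L = [[2, 0, 0, 0],
#      [0, 4, 0, 0],
#      [1, 0, 1, 0],
#      [0, 2, 3, 5]]
rows = [
    [(0, 2.0)],
    [(1, 4.0)],
    [(0, 1.0), (2, 1.0)],
    [(1, 2.0), (2, 3.0), (3, 5.0)],
]
print(recursive_block_sptrsv(rows, [2.0, 8.0, 4.0, 33.0]))
```

The appeal of this decomposition is that the square-block update has no internal dependencies, so it can run as a fully parallel sparse matrix-vector product, while the dependency-bound work is confined to the smaller triangular blocks.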



Published In

ICPP '20: Proceedings of the 49th International Conference on Parallel Processing
August 2020
844 pages
ISBN:9781450388160
DOI:10.1145/3404397
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from the publisher.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. GPU
  2. block algorithm
  3. sparse matrix
  4. sparse triangular solve

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP '20

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Cited By

  • (2024) AG-SpTRSV: An Automatic Framework to Optimize Sparse Triangular Solve on GPUs. ACM Transactions on Architecture and Code Optimization, 21:4 (1-25). DOI: 10.1145/3674911. Online publication date: 25-Jun-2024
  • (2024) LevelST: Stream-based Accelerator for Sparse Triangular Solver. Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (67-77). DOI: 10.1145/3626202.3637568. Online publication date: 1-Apr-2024
  • (2024) A new level-set analysis and sparse storage format for the SPTRSV in GPUs. 2024 IEEE 36th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) (59-69). DOI: 10.1109/SBAC-PAD63648.2024.00014. Online publication date: 13-Nov-2024
  • (2024) pSyncPIM: Partially Synchronous Execution of Sparse Matrix Operations for All-Bank PIM Architectures. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (354-367). DOI: 10.1109/ISCA59077.2024.00034. Online publication date: 29-Jun-2024
  • (2023) Design and Implementation of a Parallel Algorithm for Solving Linear Equations Using Gaussian Elimination Method. Proceedings of the 2023 7th International Conference on Electronic Information Technology and Computer Engineering (1114-1118). DOI: 10.1145/3650400.3650588. Online publication date: 20-Oct-2023
  • (2023) TileSpTRSV: a tiled algorithm for parallel sparse triangular solve on GPUs. CCF Transactions on High Performance Computing, 5:2 (129-143). DOI: 10.1007/s42514-023-00151-1. Online publication date: 12-Jun-2023
  • (2023) Toward efficient structured-grid triangular solver on sunway many-core processors. The Journal of Supercomputing, 80:8 (10610-10636). DOI: 10.1007/s11227-023-05802-2. Online publication date: 27-Dec-2023
  • (2022) TileSpGEMM. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (90-106). DOI: 10.1145/3503221.3508431. Online publication date: 2-Apr-2022
  • (2022) swSuperLU: A highly scalable sparse direct solver on Sunway manycore architecture. The Journal of Supercomputing, 78:9 (11441-11463). DOI: 10.1007/s11227-021-04270-w. Online publication date: 1-Jun-2022
  • (2021) A Split Execution Model for SpTRSV. IEEE Transactions on Parallel and Distributed Systems, 32:11 (2809-2822). DOI: 10.1109/TPDS.2021.3074501. Online publication date: 1-Nov-2021
