Parallel Implementation of Conjugate Gradient Method on Graphics Processors

  • Conference paper
Parallel Processing and Applied Mathematics (PPAM 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6067)

Abstract

Nowadays, GPUs have become extremely promising multi-/many-core architectures for a wide range of demanding applications. The basic features of these architectures include a large number of relatively simple processing units operating in SIMD fashion, together with hardware-supported, advanced multithreading. However, the use of GPUs in everyday practice is still limited, mainly because algorithms must be deeply adapted to the target architecture. In this work, we show how to perform such an adaptation to achieve an efficient parallel implementation of the conjugate gradient (CG) algorithm, which is widely used for solving large sparse linear systems of equations arising, e.g., in FEM problems. Aiming at an efficient implementation of the main operation of the CG algorithm, sparse matrix-vector multiplication (SpMV), we propose and study different techniques for optimizing access to the hierarchical memory of GPUs. An experimental investigation of the proposed CUDA-based implementation of the CG algorithm is carried out on two GPU architectures: GeForce 8800 and Tesla C1060. We show that optimizing access to GPU memory considerably reduces the execution time of the SpMV operation and, consequently, yields a significant speedup over CPUs for the whole CG algorithm.
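
To make the abstract's central operation concrete, the sketch below shows a minimal CUDA SpMV kernel over a matrix in the common compressed sparse row (CSR) format, with a "scalar" mapping of one thread per matrix row. The kernel name, signature, and storage format are illustrative assumptions, not the authors' implementation, which additionally tunes access to the GPU memory hierarchy.

    // Minimal CUDA SpMV over a CSR matrix: one thread per row ("scalar" kernel).
    // CSR layout (assumed): row_ptr[0..n] delimits rows; col_idx and values
    // store the nonzeros row by row.
    __global__ void spmv_csr_scalar(int num_rows,
                                    const int *row_ptr,
                                    const int *col_idx,
                                    const float *values,
                                    const float *x,
                                    float *y)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < num_rows) {
            float dot = 0.0f;
            for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
                dot += values[j] * x[col_idx[j]];  // irregular, indirect reads of x
            y[row] = dot;
        }
    }

In this naive form, neighbouring threads read values and col_idx at offsets that may lie far apart, so loads are not coalesced; this is precisely the kind of memory-access problem the abstract alludes to. Typical remedies include assigning a warp per row with a partial-sum reduction, or padded storage formats in which consecutive threads read consecutive addresses. Within CG, such a kernel is invoked once per iteration, alongside dot products and AXPY vector updates.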

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wozniak, M., Olas, T., Wyrzykowski, R. (2010). Parallel Implementation of Conjugate Gradient Method on Graphics Processors. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2009. Lecture Notes in Computer Science, vol 6067. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14390-8_14

  • DOI: https://doi.org/10.1007/978-3-642-14390-8_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14389-2

  • Online ISBN: 978-3-642-14390-8

  • eBook Packages: Computer Science (R0)
