Abstract
GPUs have become extremely promising multi- and many-core architectures for a wide range of demanding applications. Their basic features include a large number of relatively simple processing units operating in SIMD fashion, together with hardware-supported, advanced multithreading. However, the use of GPUs in everyday practice is still limited, mainly by the need to deeply adapt implemented algorithms to the target architecture. In this work, we show how to perform such an adaptation to achieve an efficient parallel implementation of the conjugate gradient (CG) algorithm, which is widely used for solving large sparse linear systems of equations arising, e.g., in FEM problems. Aiming at an efficient implementation of the main operation of the CG algorithm, sparse matrix-vector multiplication (SpMV), we propose and study different techniques for optimizing access to the hierarchical memory of GPUs. The experimental investigation of the proposed CUDA-based implementation of the CG algorithm is carried out on two GPU architectures: GeForce 8800 and Tesla C1060. It is shown that optimizing access to GPU memory considerably reduces the execution time of the SpMV operation, and consequently yields a significant speedup over CPUs for the whole CG algorithm.
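The paper's own kernels are not reproduced on this page. As a point of reference for why SpMV memory access dominates CG performance, the sketch below shows a minimal CUDA SpMV kernel for the common CSR storage format, with one thread per matrix row; all identifiers are hypothetical and this is not the authors' implementation.

```cuda
// Minimal sketch (not the paper's code): CSR SpMV, y = A * x,
// one thread per row. CSR stores the matrix as three arrays:
//   row_ptr[num_rows+1] - offsets of each row's nonzeros
//   col_idx[nnz]        - column index of each nonzero
//   values[nnz]         - value of each nonzero
__global__ void spmv_csr_scalar(int num_rows,
                                const int   *row_ptr,
                                const int   *col_idx,
                                const float *values,
                                const float *x,
                                float       *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        float dot = 0.0f;
        // Accumulate the dot product of row 'row' with x.
        // Note: neighboring threads read non-contiguous segments of
        // col_idx/values, so these loads are not coalesced - exactly
        // the kind of memory-access pattern GPU SpMV optimizations target.
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            dot += values[j] * x[col_idx[j]];
        y[row] = dot;
    }
}
```

In this one-thread-per-row form, each thread walks its own segment of the value arrays, so memory transactions are scattered; a common remedy is to assign a warp (or half-warp) per row so that the reads of col_idx and values become coalesced, at the cost of an intra-warp reduction. Variations of this trade-off are what memory-access optimizations for GPU SpMV, including those studied in the paper, revolve around.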
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wozniak, M., Olas, T., Wyrzykowski, R. (2010). Parallel Implementation of Conjugate Gradient Method on Graphics Processors. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2009. Lecture Notes in Computer Science, vol 6067. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14390-8_14
DOI: https://doi.org/10.1007/978-3-642-14390-8_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14389-2
Online ISBN: 978-3-642-14390-8