Parallel Implementation of Conjugate Gradient Method on Graphics Processors

  • Conference paper
Parallel Processing and Applied Mathematics (PPAM 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6067)

Abstract

Nowadays, GPUs have become extremely promising multi-/many-core architectures for a wide range of demanding applications. The basic features of these architectures include a large number of relatively simple processing units operating in SIMD fashion, together with hardware-supported, advanced multithreading. However, the use of GPUs in everyday practice is still limited, mainly because algorithms must be deeply adapted to the target architecture. In this work, we show how to perform such an adaptation to achieve an efficient parallel implementation of the conjugate gradient (CG) algorithm, which is widely used for solving large sparse linear systems of equations arising, e.g., in FEM problems. Aiming at an efficient implementation of the main operation of the CG algorithm, sparse matrix-vector multiplication (SpMV), we propose and study different techniques for optimizing access to the hierarchical memory of GPUs. An experimental investigation of the proposed CUDA-based implementation of the CG algorithm is carried out on two GPU architectures: GeForce 8800 and Tesla C1060. We show that optimizing access to GPU memory considerably reduces the execution time of the SpMV operation and, consequently, yields a significant speedup over CPUs for the whole CG algorithm.
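
To make the abstract's central operation concrete, the sketch below shows a minimal CUDA SpMV kernel over a matrix in the common compressed sparse row (CSR) format, with a "scalar" mapping of one thread per matrix row. The kernel name, signature, and storage format are illustrative assumptions, not the authors' implementation, which additionally tunes access to the GPU memory hierarchy.

    // Minimal CUDA SpMV over a CSR matrix: one thread per row ("scalar" kernel).
    // CSR layout (assumed): row_ptr[0..n] delimits rows; col_idx and values
    // store the nonzeros row by row.
    __global__ void spmv_csr_scalar(int num_rows,
                                    const int *row_ptr,
                                    const int *col_idx,
                                    const float *values,
                                    const float *x,
                                    float *y)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < num_rows) {
            float dot = 0.0f;
            for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
                dot += values[j] * x[col_idx[j]];  // irregular, indirect reads of x
            y[row] = dot;
        }
    }

In this naive form, neighbouring threads read values and col_idx at offsets that may lie far apart, so loads are not coalesced; this is precisely the kind of memory-access problem the abstract alludes to. Typical remedies include assigning a warp per row with a partial-sum reduction, or padded storage formats in which consecutive threads read consecutive addresses. Within CG, such a kernel is invoked once per iteration, alongside dot products and AXPY vector updates.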

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wozniak, M., Olas, T., Wyrzykowski, R. (2010). Parallel Implementation of Conjugate Gradient Method on Graphics Processors. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2009. Lecture Notes in Computer Science, vol 6067. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14390-8_14

  • DOI: https://doi.org/10.1007/978-3-642-14390-8_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14389-2

  • Online ISBN: 978-3-642-14390-8

  • eBook Packages: Computer Science (R0)
