CUDA 2D Stencil Computations for the Jacobi Method

José María Cecilia, José Manuel García & Manuel Ujaldón

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7133)

Included in the following conference series:

International Workshop on Applied Parallel Computing

Abstract

We are witnessing the consolidation of the GPU streaming paradigm in parallel computing. This paper explores stencil operations in CUDA to optimize the Jacobi method for solving Laplace’s differential equation on GPUs. The code keeps the access pattern constant across a large number of loop iterations, which makes it representative of a wide set of iterative linear algebra algorithms. Optimizations focus on data parallelism, thread deployment and the GPU memory hierarchy, which the CUDA programmer manages explicitly. Experimental results are shown on Nvidia Tesla C870 and C1060 GPUs and compared to a counterpart version optimized for a quad-core Intel CPU. The speed-up factor for our set of GPU optimizations reaches 3-4x, and the execution times beat those of the CPU by a wide margin, also showing great scalability when moving towards a more sophisticated GPU architecture and/or more demanding problem sizes.
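To make the access pattern mentioned in the abstract concrete, the following is a minimal CPU-side sketch of one Jacobi sweep for Laplace's equation on a 2D grid, using the standard 5-point stencil. It is not the authors' code: the function names (`jacobi_step`, `jacobi_solve`, `make_grid`) and the test problem are assumptions chosen for illustration. In a CUDA version, each interior cell would typically be assigned to its own thread, but that mapping is not shown here.

```python
def jacobi_step(u):
    """Return a new grid after one Jacobi sweep of the 5-point Laplace stencil.

    Boundary cells are copied unchanged (fixed Dirichlet values).
    """
    rows, cols = len(u), len(u[0])
    # Jacobi reads only the previous iterate, so results go to a separate grid.
    new = [row[:] for row in u]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Each interior cell becomes the average of its four neighbours.
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1])
    return new


def jacobi_solve(u, iterations):
    """Apply a fixed number of sweeps; the same stencil is reused every time."""
    for _ in range(iterations):
        u = jacobi_step(u)
    return u


def make_grid(n, top=1.0):
    """Hypothetical test problem: n x n grid, top edge held at `top`, rest at 0."""
    grid = [[0.0] * n for _ in range(n)]
    grid[0] = [top] * n
    return grid


if __name__ == "__main__":
    result = jacobi_solve(make_grid(6), iterations=100)
    for row in result:
        print(" ".join(f"{value:.3f}" for value in row))
```

Because every cell update depends only on the previous iterate, all interior cells can be computed independently within a sweep, which is the property that makes this kind of stencil a natural fit for data-parallel hardware.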

Author information

Authors and Affiliations

Computer Engineering and Technology Department, University of Murcia, Spain
José María Cecilia & José Manuel García
Computer Architecture Department, University of Malaga, Spain
Manuel Ujaldón

Editor information

Kristján Jónasson

About this paper

Cite this paper

Cecilia, J.M., García, J.M., Ujaldón, M. (2012). CUDA 2D Stencil Computations for the Jacobi Method. In: Jónasson, K. (eds) Applied Parallel and Scientific Computing. PARA 2010. Lecture Notes in Computer Science, vol 7133. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28151-8_17

DOI: https://doi.org/10.1007/978-3-642-28151-8_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28150-1
Online ISBN: 978-3-642-28151-8
eBook Packages: Computer Science, Computer Science (R0)
