Abstract
We are witnessing the consolidation of the GPU streaming paradigm in parallel computing. This paper explores stencil operations in CUDA to optimize, on GPUs, the Jacobi method for solving Laplace's differential equation. The code keeps its access pattern constant through a large number of loop iterations, which makes it representative of a wide set of iterative linear algebra algorithms. Optimizations focus on data parallelism, thread deployment, and the GPU memory hierarchy, which the CUDA programmer manages explicitly. Experimental results are shown on Nvidia Tesla C870 and C1060 GPUs and compared to a counterpart version optimized for a quad-core Intel CPU. Our set of GPU optimizations reaches a speed-up factor of 3-4x, and the execution times beat those of the CPU by a wide margin. The code also scales well when moving to a more sophisticated GPU architecture and/or to more demanding problem sizes.
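For orientation, the stencil that the abstract refers to can be sketched as a plain CPU reference. This is a minimal illustration in Python, assuming the standard 5-point Jacobi update for Laplace's equation on a square grid with fixed boundary values; it is not the authors' CUDA kernel, and the names `jacobi_step`, `solve_laplace`, and `make_grid` are hypothetical.

```python
def jacobi_step(grid):
    """Run one Jacobi sweep and return a new grid.

    Each interior cell becomes the mean of its four neighbours, read from
    the previous grid. Boundary cells are copied unchanged.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy keeps the boundary values fixed
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


def solve_laplace(grid, iterations):
    """Apply a fixed number of Jacobi sweeps (no convergence check)."""
    for _ in range(iterations):
        grid = jacobi_step(grid)
    return grid


def make_grid(n, top=1.0):
    """Build an n x n grid of zeros whose top row is held at `top`."""
    grid = [[0.0] * n for _ in range(n)]
    grid[0] = [top] * n
    return grid


if __name__ == "__main__":
    result = solve_laplace(make_grid(5), 200)
    for row in result:
        print(" ".join(f"{value:.3f}" for value in row))
```

Every sweep reads the same four neighbours of every interior cell, which is the constant access pattern the abstract mentions. Because each cell is computed only from the previous grid, all interior cells of a sweep are independent of one another, which is what makes the method a natural fit for data-parallel hardware.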
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cecilia, J.M., García, J.M., Ujaldón, M. (2012). CUDA 2D Stencil Computations for the Jacobi Method. In: Jónasson, K. (eds) Applied Parallel and Scientific Computing. PARA 2010. Lecture Notes in Computer Science, vol 7133. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28151-8_17
DOI: https://doi.org/10.1007/978-3-642-28151-8_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28150-1
Online ISBN: 978-3-642-28151-8
eBook Packages: Computer Science, Computer Science (R0)