Authors:
J. Jiménez and J. Ruiz de Miras
Affiliation:
University of Jaén, Spain
Keyword(s):
Curve-skeleton, 3D Thinning, CUDA, GPGPU, Optimizations, Fermi.
Related Ontology Subjects/Areas/Topics:
Computer Vision, Visualization and Computer Graphics; Fundamental Methods and Algorithms; Geometric Computing; Geometry and Modeling
Abstract:
The CUDA programming model lets programmers write algorithms that execute in parallel on NVIDIA GPU devices. However, achieving acceptable speedups with programs that scale to thousands of independent threads is not always easy, especially for algorithms with high data-sharing or data-dependence requirements. Algorithms of this type are very common in fields such as volume modelling and image analysis. In this paper we present a comprehensive collection of optimizations to be considered in any CUDA implementation, and show how we applied them in practice to a complex case study that is not trivially parallelizable: a 3D curve-skeleton calculation algorithm. Two different GPU architectures were used to test the implications of each optimization: the NVIDIA GT200 architecture and the new Fermi GF100. As a result, although the first direct CUDA implementation of our algorithm ran even slower than its CPU version, overall speedups of 19x (GT200) and 68x (Fermi GF100) were finally achieved.