OpenCL Programming Guide
July 2011
Publisher: Addison-Wesley Professional
ISBN: 978-0-321-74964-2
Published: 23 July 2011
Pages: 648
Abstract

Using the new OpenCL (Open Computing Language) standard, you can write applications that access all available programming resources: CPUs, GPUs, and other processors such as DSPs and the Cell/B.E. processor. Already implemented by Apple, AMD, Intel, IBM, NVIDIA, and other leaders, OpenCL has outstanding potential for PCs, servers, handheld/embedded devices, high performance computing, and even cloud systems. This is the first comprehensive, authoritative, and practical guide to OpenCL 1.1 specifically for working developers and software architects. Written by five leading OpenCL authorities, OpenCL Programming Guide covers the entire specification. It reviews key use cases, shows how OpenCL can express a wide range of parallel algorithms, and offers complete reference material on both the API and the OpenCL C programming language. Through complete case studies and downloadable code examples, the authors show how to write complex parallel programs that decompose workloads across many different devices. They also present all the essentials of OpenCL software performance optimization, including probing and adapting to hardware.

Coverage includes:
  • Understanding OpenCL's architecture, concepts, terminology, goals, and rationale
  • Programming with OpenCL C and the runtime API
  • Using buffers, sub-buffers, images, samplers, and events
  • Sharing and synchronizing data with OpenGL and Microsoft's Direct3D
  • Simplifying development with the C++ Wrapper API
  • Using OpenCL Embedded Profiles to support devices ranging from cellphones to supercomputer nodes
  • Case studies dealing with physics simulation; image and signal processing, such as image histograms, edge detection filters, Fast Fourier Transforms, and optical flow; math libraries, such as matrix multiplication and high-performance sparse matrix multiplication; and more

Cited By

  1. Lin X, Lai L and Li H (2024). Parallel Static Learning Toward Heterogeneous Computing Architectures, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43:3, (983-993), Online publication date: 1-Mar-2024.
  2. Faqir-Rhazoui Y and García C (2023). Exploring the performance and portability of the k-means algorithm on SYCL across CPU and GPU architectures, The Journal of Supercomputing, 79:16, (18480-18506), Online publication date: 1-Nov-2023.
  3. Liu F, Fredriksson A and Markidis S (2022). A survey of HPC algorithms and frameworks for large-scale gradient-based nonlinear optimization, The Journal of Supercomputing, 78:16, (17513-17542), Online publication date: 1-Nov-2022.
  4. Hogervorst T, Nane R, Marchiori G, Qiu T, Blatt M and Rustad A (2021). Hardware Acceleration of High-Performance Computational Flow Dynamics Using High-Bandwidth Memory-Enabled Field-Programmable Gate Arrays, ACM Transactions on Reconfigurable Technology and Systems, 15:2, (1-35), Online publication date: 30-Jun-2022.
  5. Li J, Cao W, Dong X, Li G, Wang X, Zhao P, Liu L and Feng X (2021). Compiler-assisted Operator Template Library for DNN Accelerators, International Journal of Parallel Programming, 49:5, (628-645), Online publication date: 1-Oct-2021.
  6. Alcaín E, Muñoz A, Schiavi E and Montemayor A (2021). A non-smooth non-local variational approach to saliency detection in real time, Journal of Real-Time Image Processing, 18:3, (739-750), Online publication date: 1-Jun-2021.
  7. Pankratz D, Nowicki T, Eltantawy A and Amaral J. Vulkan vision, Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization, (137-149)
  8. Stevens J and Klöckner A (2021). A mechanism for balancing accuracy and scope in cross-machine black-box GPU performance modeling, International Journal of High Performance Computing Applications, 34:6, (589-614), Online publication date: 1-Nov-2020.
  9. Li J, Cao W, Dong X, Li G, Wang X, Liu L and Feng X. Compiler-Assisted Operator Template Library for DNN Accelerators, Network and Parallel Computing, (3-16)
  10. You L, Yang E and Wang G (2020). A novel parallel image encryption algorithm based on hybrid chaotic maps with OpenCL implementation, Soft Computing - A Fusion of Foundations, Methodologies and Applications, 24:16, (12413-12427), Online publication date: 1-Aug-2020.
  11. Lai L, Tsai K and Li H (2019). GPGPU-Based ATPG System: Myth or Reality?, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 39:1, (239-247), Online publication date: 1-Jan-2020.
  12. Fichte J, Hecher M and Zisser M. An Improved GPU-Based SAT Model Counter, Principles and Practice of Constraint Programming, (491-509)
  13. Jin Z and Finkel H. Simulation of Random Network of Hodgkin and Huxley Neurons with Exponential Synaptic Conductances on an FPGA Platform, Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, (653-657)
  14. Fasogbon P, Aksu E and Heikkilä L. Demo: Accelerating Depth-Map on Mobile Device Using CPU-GPU Co-processing, Computer Analysis of Images and Patterns, (75-86)
  15. Chakrabarti S, Hoekstra M, Kuvaiskii D and Vij M. Scaling Intel® Software Guard Extensions Applications with Intel® SGX Card, Proceedings of the 8th International Workshop on Hardware and Architectural Support for Security and Privacy, (1-9)
  16. Jin Z and Finkel H. Exploring Integer Sum Reduction using Atomics on Intel CPU, Proceedings of the International Workshop on OpenCL, (1-6)
  17. Boratto M, Alonso P, Pinto C, Melo P, Barreto M and Denaxas S (2019). Exploring hybrid parallel systems for probabilistic record linkage, The Journal of Supercomputing, 75:3, (1137-1149), Online publication date: 1-Mar-2019.
  18. Şuşu A. Compiling Efficiently with Arithmetic Emulation for the Custom-Width Connex Vector Processor, Proceedings of the 5th Workshop on Programming Models for SIMD/Vector Processing, (1-8)
  19. Yamazaki T, Igarashi J, Makino J and Ebisuzaki T (2019). Real-time simulation of a cat-scale artificial cerebellum on PEZY-SC processors, International Journal of High Performance Computing Applications, 33:1, (155-168), Online publication date: 1-Jan-2019.
  20. Duarte R, Simões Á, Henriques R and Neto H. FPGA-based OpenCL Accelerator for Discovering Temporal Patterns in Gene Expression Data Using Biclustering, Proceedings of the 6th International Workshop on Parallelism in Bioinformatics, (53-62)
  21. Rees D, Roberts R, Laramee R, Brookes P, D'Cruze T and Smith G. GPU-assisted scatterplots for millions of call events, Proceedings of the Conference on Computer Graphics & Visual Computing, (71-79)
  22. Carabaş M, Drăghici A, Lupescu G, Samoilă C and Sluşanschi E. Integrating Parallel Computing in the Curriculum of the University Politehnica of Bucharest, Euro-Par 2018: Parallel Processing Workshops, (222-234)
  23. Burrows E, Friis H and Haveraaen M. An array API for finite difference methods, Proceedings of the 5th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, (59-66)
  24. Wang H, Yun J and Bourd A. OpenCL Optimization and Best Practices for Qualcomm Adreno GPUs, Proceedings of the International Workshop on OpenCL, (1-8)
  25. Kao C and Hsu W (2018). Exploring hidden coherency of Ray-Tracing for heterogeneous systems using online feedback methodology, The Visual Computer: International Journal of Computer Graphics, 34:5, (633-643), Online publication date: 1-May-2018.
  26. Guan H, Shen X and Krim H. Egeria, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, (1-14)
  27. Chang Y, Wang S, Yang C, Hwang Y and Lee J (2017). Enabling PoCL-based runtime frameworks on the HSA for OpenCL 2.0 support, Journal of Systems Architecture: the EUROMICRO Journal, 81:C, (71-82), Online publication date: 1-Nov-2017.
  28. Ortega G, Filatovas E, Garzón E and Casado L (2017). Non-dominated sorting procedure for Pareto dominance ranking on multicore CPU and/or GPU, Journal of Global Optimization, 69:3, (607-627), Online publication date: 1-Nov-2017.
  29. Moreira R, Collange C and Quintão Pereira F (2017). Function Call Re-Vectorization, ACM SIGPLAN Notices, 52:8, (313-326), Online publication date: 26-Oct-2017.
  30. Alharbi N, Chavent M and Laramee R. Real-time rendering of molecular dynamics simulation data, Proceedings of the Conference on Computer Graphics & Visual Computing, (43-51)
  31. Barford L, Bhattacharyya S and Liu Y (2017). Data Flow Algorithms for Processors with Vector Extensions, Journal of Signal Processing Systems, 87:1, (21-31), Online publication date: 1-Apr-2017.
  32. Moreira R, Collange C and Quintão Pereira F. Function Call Re-Vectorization, Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, (313-326)
  33. Gounalakis O, Lytos A and Dasygenis M. Leveraging Parallelization Opportunities by an Online CAD Tool, Proceedings of the SouthEast European Design Automation, Computer Engineering, Computer Networks and Social Media Conference, (25-31)
  34. De A, Zhang Y and Guo C (2016). A parallel adaptive segmentation method based on SOM and GPU with application to MRI image processing, Neurocomputing, 198:C, (180-189), Online publication date: 19-Jul-2016.
  35. Lin S, Liu Y, Plishker W and Bhattacharyya S. A Design Framework for Mapping Vectorized Synchronous Dataflow Graphs onto CPU-GPU Platforms, Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems, (20-29)
  36. Elsobky A, Farag A and Keshk A. Efficient Implementation of McEliece Cryptosystem on Graphic Processing Unit, Proceedings of the 10th International Conference on Informatics and Systems, (247-253)
  37. Strnad D and Nerat A (2016). Parallel construction of classification trees on a GPU, Concurrency and Computation: Practice & Experience, 28:5, (1417-1436), Online publication date: 10-Apr-2016.
  38. Martineau M, McIntosh-Smith S, Boulton M and Gaudin W. An Evaluation of Emerging Many-Core Parallel Programming Models, Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores, (1-10)
  39. Mansouri F, Huet S and Houzet D (2016). A domain-specific high-level programming model, Concurrency and Computation: Practice & Experience, 28:3, (750-767), Online publication date: 10-Mar-2016.
  40. Serrano E, Blas J and Carretero J (2015). A comparative study of an X-ray tomography reconstruction algorithm in accelerated and cloud computing systems, Concurrency and Computation: Practice & Experience, 27:18, (5538-5556), Online publication date: 25-Dec-2015.
  41. Bailey M. Fundamentals seminar, ACM SIGGRAPH 2015 Courses, (1-129)
  42. McIntosh-Smith S, Price J, Sessions R and Ibarra A (2015). High performance in silico virtual drug screening on many-core processors, International Journal of High Performance Computing Applications, 29:2, (119-134), Online publication date: 1-May-2015.
  43. Rojek K, Ciznicki M, Rosa B, Kopta P, Kulczewski M, Kurowski K, Piotrowski Z, Szustak L, Wojcik D and Wyrzykowski R (2015). Adaptation of fluid model EULAG to graphics processing unit architecture, Concurrency and Computation: Practice & Experience, 27:4, (937-957), Online publication date: 25-Mar-2015.
  44. Tarakji A, Börger L and Leupers R. A comparative investigation of device-specific mechanisms for exploiting HPC accelerators, Proceedings of the 8th Workshop on General Purpose Processing using GPUs, (1-12)
  45. Qawasmeh A, Chapman B, Hugues M and Calandra H. GPU technology applied to reverse time migration and seismic modeling via OpenACC, Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, (75-85)
  46. Lin C, Nagarajan V and Gupta R. Fence scoping, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, (105-116)
  47. Mansouri F, Huet S and Houzet D. A Visual Programming Model to Implement Coarse-Grained DSP Applications on Parallel and Heterogeneous Clusters, Revised Selected Papers, Part I, of the Euro-Par 2014 International Workshops on Parallel Processing - Volume 8805, (141-152)
  48. Silberstein M (2014). GPUs: High-performance Accelerators for Parallel Applications, Ubiquity, 2014:August, (1-13), Online publication date: 1-Aug-2014.
  49. Elangovan V, Badia R and Ayguadé E. Scalability and Parallel Execution of OmpSs-OpenCL Tasks on Heterogeneous CPU-GPU Environment, Proceedings of the 29th International Conference on Supercomputing - Volume 8488, (141-155)
  50. Tarakji A, Salscheider N, Alt S and Heiducoff J. Feature-based device selection in heterogeneous computing systems, Proceedings of the 11th ACM Conference on Computing Frontiers, (1-10)
  51. Vanderbruggen T and Cavazos J. Generating OpenCL C kernels from OpenACC, Proceedings of the International Workshop on OpenCL 2013 & 2014, (1-10)
  52. Hower D, Hechtman B, Beckmann B, Gaster B, Hill M, Reinhardt S and Wood D (2014). Heterogeneous-race-free memory models, ACM SIGARCH Computer Architecture News, 42:1, (427-440), Online publication date: 5-Apr-2014.
  53. Hower D, Hechtman B, Beckmann B, Gaster B, Hill M, Reinhardt S and Wood D (2014). Heterogeneous-race-free memory models, ACM SIGPLAN Notices, 49:4, (427-440), Online publication date: 5-Apr-2014.
  54. Ukidave Y, Gong X and Kaeli D. Performance Evaluation and Optimization Mechanisms for Inter-operable Graphics and Computation on GPUs, Proceedings of Workshop on General Purpose Processing Using GPUs, (37-45)
  55. Ukidave Y, Gong X and Kaeli D. Performance Evaluation and Optimization Mechanisms for Inter-operable Graphics and Computation on GPUs, Proceedings of Workshop on General Purpose Processing Using GPUs, (37-45)
  56. Hower D, Hechtman B, Beckmann B, Gaster B, Hill M, Reinhardt S and Wood D. Heterogeneous-race-free memory models, Proceedings of the 19th international conference on Architectural support for programming languages and operating systems, (427-440)
  57. Bailey M. Combining GPU data-parallel computing with OpenGL, ACM SIGGRAPH 2013 Courses, (1-65)
  58. Mistry P, Ukidave Y, Schaa D and Kaeli D. Valar, Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, (54-65)
  59. Ko Y, Burgstaller B and Scholz B. Parallel from the beginning, Proceeding of the 44th ACM technical symposium on Computer science education, (415-420)
  60. Capuzzo-Dolcetta R, Spera M and Punzo D (2013). A fully parallel, high precision, N-body code running on hybrid computing platforms, Journal of Computational Physics, 236, (580-593), Online publication date: 1-Mar-2013.
  61. McKean D and Sprinkle J. Heterogeneous multi-core systems, Proceedings of the 2012 workshop on Domain-specific modeling, (45-48)
  62. Joshi P, Bourges-Sévenier M, Russell K and Mo Z. Graphics programming for the web, ACM SIGGRAPH 2012 Courses, (1-75)
  63. Salavert Torres J, Blanquer Espert I, Tomas Dominguez A, Hernendez V, Medina I, Terraga J and Dopazo J (2012). Using GPUs for the Exact Alignment of Short-Read Genetic Sequences by Means of the Burrows-Wheeler Transform, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9:4, (1245-1256), Online publication date: 1-Jul-2012.
  64. Su B and Keutzer K. clSpMV, Proceedings of the 26th ACM international conference on Supercomputing, (353-364)
  65. Duchowski A, Price M, Meyer M and Orero P. Aggregate gaze visualization with real-time heatmaps, Proceedings of the Symposium on Eye Tracking Research and Applications, (13-20)
  66. Keenan M, Komarov I, D'Souza R and Riolo R. Novel graphics processing unit-based parallel algorithms for understanding species diversity in forests, Proceedings of the 2012 Symposium on High Performance Computing, (1-9)
Contributors
  • University of the West of England
  • Intel Corporation