Bhattacharjee, 2020 - Google Patents

Evaluation of GPU-specific device directives and multi-dimensional data structures in OpenMP

Bhattacharjee, 2020

Document ID: 13140543580033907387
Author: Bhattacharjee A
Publication year: 2020

External Links

Cited by

Snippet

OpenMP target offload has been in the inception phase for some time but has been gaining traction in the recent years with more compilers supporting the constructs and optimising it at the general level. Its ease of programming compared to other models makes it quite …

Continue reading at search.proquest.com (other versions)

238000011156 evaluation 0 title description 5

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
- G06F8/4441—Reducing the execution time required by the program code
- G06F8/4442—Reducing the number of cache misses; Data prefetching
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/451—Code distribution
- G06F8/452—Loops
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
- G06F11/3612—Software analysis for verifying properties of programs by runtime analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL

Similar Documents

Publication	Publication Date	Title
Che et al.	2011	Dymaxion: Optimizing memory access patterns for heterogeneous systems
Chang et al.	2012	A scalable, numerically stable, high-performance tridiagonal solver using GPUs
Thouti et al.	2012	Comparison of OpenMP & OpenCL parallel processing technologies
Murarasu et al.	2011	Compact data structure and scalable algorithms for the sparse grid technique
Heinecke et al.	2011	Multi-and many-core data mining with adaptive sparse grids
Majeti et al.	2016	Automatic data layout generation and kernel mapping for cpu+ gpu architectures
Heinecke et al.	2013	Emerging architectures enable to boost massively parallel data mining using adaptive sparse grids
Rubin et al.	2014	Maps: Optimizing massively parallel applications using device-level memory abstraction
Fauzia et al.	2013	Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential
Martineau et al.	2017	The productivity, portability and performance of OpenMP 4.5 for scientific applications targeting Intel CPUs, IBM CPUs, and NVIDIA GPUs
Strout et al.	2014	Generalizing run-time tiling with the loop chain abstraction
Jin et al.	2022	Performance portability study of epistasis detection using sycl on nvidia gpu
Bakunas-Milanowski et al.	2017	Efficient algorithms for stream compaction on GPUs
Mehta et al.	2021	Evaluating performance portability of OpenMP for SNAP on NVIDIA, Intel, and AMD GPUs using the roofline methodology
Cruz et al.	2011	How to obtain efficient GPU kernels: An illustration using FMM & FGT algorithms
Bhattacharjee	2020	Evaluation of GPU-specific device directives and multi-dimensional data structures in OpenMP
Goossens	2018	Dataflow management, dynamic load balancing, and concurrent processing for real‐time embedded vision applications using Quasar
Balogh et al.	2018	Comparison of parallelisation approaches, languages, and compilers for unstructured mesh algorithms on GPUs
Hansson et al.	2014	A quantitative comparison of PRAM based emulated shared memory architectures to current multicore CPUs and GPUs
Lotrič et al.	2009	Parallel implementations of recurrent neural network learning
Calvert	2010	Parallelisation of java for graphics processors
Marcus	2012	Mcmini: Monte carlo on gpgpu
Jin et al.	2018	Optimizing Parallel Reduction on OpenCL FPGA Platform–A Case Study of Frequent Pattern Compression
Blindell et al.	2014	Synthesizing code for GPGPUs from abstract formal models
Leist et al.	2011	Graph generation on GPUs using dynamic memory allocation