Bhattacharjee, 2020 - Google Patents
Evaluation of GPU-specific device directives and multi-dimensional data structures in OpenMPBhattacharjee, 2020
- Document ID
- 13140543580033907387
- Author
- Bhattacharjee A
- Publication year
External Links
Snippet
OpenMP target offload has been in the inception phase for some time but has been gaining traction in the recent years with more compilers supporting the constructs and optimising it at the general level. Its ease of programming compared to other models makes it quite …
- 238000011156 evaluation 0 title description 5
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
- G06F8/4441—Reducing the execution time required by the program code
- G06F8/4442—Reducing the number of cache misses; Data prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/451—Code distribution
- G06F8/452—Loops
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
- G06F11/3612—Software analysis for verifying properties of programs by runtime analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Che et al. | Dymaxion: Optimizing memory access patterns for heterogeneous systems | |
Chang et al. | A scalable, numerically stable, high-performance tridiagonal solver using GPUs | |
Thouti et al. | Comparison of OpenMP & OpenCL parallel processing technologies | |
Murarasu et al. | Compact data structure and scalable algorithms for the sparse grid technique | |
Heinecke et al. | Multi-and many-core data mining with adaptive sparse grids | |
Majeti et al. | Automatic data layout generation and kernel mapping for cpu+ gpu architectures | |
Heinecke et al. | Emerging architectures enable to boost massively parallel data mining using adaptive sparse grids | |
Rubin et al. | Maps: Optimizing massively parallel applications using device-level memory abstraction | |
Fauzia et al. | Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential | |
Martineau et al. | The productivity, portability and performance of OpenMP 4.5 for scientific applications targeting Intel CPUs, IBM CPUs, and NVIDIA GPUs | |
Strout et al. | Generalizing run-time tiling with the loop chain abstraction | |
Jin et al. | Performance portability study of epistasis detection using sycl on nvidia gpu | |
Bakunas-Milanowski et al. | Efficient algorithms for stream compaction on GPUs | |
Mehta et al. | Evaluating performance portability of OpenMP for SNAP on NVIDIA, Intel, and AMD GPUs using the roofline methodology | |
Cruz et al. | How to obtain efficient GPU kernels: An illustration using FMM & FGT algorithms | |
Bhattacharjee | Evaluation of GPU-specific device directives and multi-dimensional data structures in OpenMP | |
Goossens | Dataflow management, dynamic load balancing, and concurrent processing for real‐time embedded vision applications using Quasar | |
Balogh et al. | Comparison of parallelisation approaches, languages, and compilers for unstructured mesh algorithms on GPUs | |
Hansson et al. | A quantitative comparison of PRAM based emulated shared memory architectures to current multicore CPUs and GPUs | |
Lotrič et al. | Parallel implementations of recurrent neural network learning | |
Calvert | Parallelisation of java for graphics processors | |
Marcus | Mcmini: Monte carlo on gpgpu | |
Jin et al. | Optimizing Parallel Reduction on OpenCL FPGA Platform–A Case Study of Frequent Pattern Compression | |
Blindell et al. | Synthesizing code for GPGPUs from abstract formal models | |
Leist et al. | Graph generation on GPUs using dynamic memory allocation |