González et al., 2021 - Google Patents
Multi-GPU systems and Unified Virtual Memory for scientific applications: The case of the NAS multi-zone parallel benchmarksGonzález et al., 2021
View HTML- Document ID
- 9552595307304168626
- Author
- González M
- Morancho E
- Publication year
- Publication venue
- Journal of Parallel and Distributed Computing
External Links
Snippet
GPU-based computing systems have become a widely accepted solution for the high- performance-computing (HPC) domain. GPUs have shown highly competitive performance- per-watt ratios and can exploit an astonishing level of parallelism. However, exploiting the …
- 238000004891 communication 0 abstract description 70
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
- G06F8/4441—Reducing the execution time required by the program code
- G06F8/4442—Reducing the number of cache misses; Data prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/44—Arrangements for executing specific programmes
- G06F9/455—Emulation; Software simulation, i.e. virtualisation or emulation of application or operating system execution engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3442—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for planning or managing the needed capacity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/885—Monitoring specific for caches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/25—Using a specific main memory architecture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Allen et al. | In-depth analyses of unified virtual memory system for GPU accelerated computing | |
González et al. | Multi-GPU systems and Unified Virtual Memory for scientific applications: The case of the NAS multi-zone parallel benchmarks | |
Lee et al. | Automatically mapping code on an intelligent memory architecture | |
Van Lunteren et al. | Coherently attached programmable near-memory acceleration platform and its application to stencil processing | |
Do et al. | SNU-NPB 2019: parallelizing and optimizing NPB in OpenCL and CUDA for modern GPUs | |
Margiolas et al. | Portable and transparent host-device communication optimization for gpgpu environments | |
Cabezas et al. | GPU-SM: shared memory multi-GPU programming | |
Kumar et al. | Efficient automatic parallelization of a single GPU program for a multiple GPU system | |
Laso et al. | CIMAR, NIMAR, and LMMA: Novel algorithms for thread and memory migrations in user space on NUMA systems using hardware counters | |
Jung et al. | SnuRHAC: a runtime for heterogeneous accelerator clusters with CUDA unified memory | |
da Cruz et al. | Memory-aware thread and data mapping for hierarchical multi-core platforms | |
González et al. | Multi-GPU parallelization of the NAS multi-zone parallel benchmarks | |
Memarzia et al. | Toward Efficient In-memory Data Analytics on NUMA Systems | |
Flores et al. | Memory hierarchies for future hpc architectures | |
Allen | Holistic Performance Analysis and Optimization of Unified Virtual Memory | |
Mancuso | Next-generation safety-critical systems on multi-core platforms | |
Mao et al. | Designing New Memory Systems for Next-Generation Data Centers | |
Siddique et al. | Lmstr: exploring shared hardware controlled scratchpad memory for multicores | |
Pemberton | The Serverless Datacenter: Hardware and Software Techniques for Resource Disaggregation | |
Ribeiro et al. | Memory affinity management for numerical scientific applications over multi-core multiprocessors with hierarchical memory | |
Lorenzo-Castillo et al. | A flexible and dynamic page migration infrastructure based on hardware counters | |
Weidendorfer | Data transfer requirement analysis with bandwidth curves | |
Ravi | Toward Scalable Middleware for Shared HPC Resources | |
Nanda et al. | The design of compass: An execution driven simulator for commercial applications running on shared memory multiprocessors | |
Boku et al. | Oakforest-PACS: Advanced KNL Cluster System |