FALCON-X: Zero-copy MPI derived datatype processing on modern CPU and GPU architectures
Hashmi et al., 2020
- Document ID
- 5574837958005391382
- Author
- Hashmi J
- Chu C
- Chakraborty S
- Bayatpour M
- Subramoni H
- Panda D
- Publication year
- 2020
- Publication venue
- Journal of Parallel and Distributed Computing
Snippet
This paper addresses the challenges of MPI derived datatype processing and proposes FALCON-X—A Fast and Low-overhead Communication framework for optimized zero-copy intra-node derived datatype communication on emerging CPU/GPU architectures. We …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
- G06F12/0817—Cache consistency protocols using directory methods
- G06F12/0826—Limited pointers directories; State-only directories without pointers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogramme communication; Intertask communication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/25—Using a specific main memory architecture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Chen et al. | GFlink: An in-memory computing architecture on heterogeneous CPU-GPU clusters for big data | |
Pennycook et al. | Exploring SIMD for molecular dynamics, using Intel® Xeon® processors and Intel® Xeon Phi coprocessors | |
Hoefler et al. | MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory | |
Wang et al. | Optimized non-contiguous MPI datatype communication for GPU clusters: Design, implementation and evaluation with MVAPICH2 | |
Hashmi et al. | FALCON-X: Zero-copy MPI derived datatype processing on modern CPU and GPU architectures | |
Elteir et al. | StreamMR: an optimized MapReduce framework for AMD GPUs | |
Zhou et al. | Accelerating MPI all-to-all communication with online compression on modern GPU clusters | |
Ross et al. | Parallel programming model for the Epiphany many-core coprocessor using threaded MPI | |
Kristensen et al. | Bohrium: a virtual machine approach to portable parallelism | |
Hashmi et al. | FALCON: Efficient designs for zero-copy MPI datatype processing on emerging architectures | |
Nukada et al. | High performance 3-D FFT using multiple CUDA GPUs | |
Xue et al. | Multi‐GPU performance optimization of a computational fluid dynamics code using OpenACC | |
Agostini et al. | GPUDirect Async: Exploring GPU synchronous communication techniques for InfiniBand clusters | |
Yao et al. | Wukong+ G: Fast and concurrent RDF query processing using RDMA-assisted GPU graph exploration | |
Lu et al. | Towards efficient remote OpenMP offloading | |
Miki et al. | PACC: a directive-based programming framework for out-of-core stencil computation on accelerators | |
Kirtzic et al. | A parallel algorithm development model for the GPU architecture | |
Zhou et al. | Accelerating broadcast communication with GPU compression for deep learning workloads | |
Mal’kovskii et al. | Performance evaluation of a hybrid computer cluster built on IBM POWER8 microprocessors | |
Faber et al. | Platform agnostic streaming data application performance models | |
Eskikaya et al. | Distributed OpenCL distributing OpenCL platform on network scale | |
Chavarria-Miranda et al. | Early experience with out-of-core applications on the Cray XMT | |
Wong et al. | Efficient magnetohydrodynamic simulations on distributed multi-GPU systems using a novel GPU Direct–MPI hybrid approach | |
Buono et al. | Data analytics with NVLink: An SpMV case study | |
Li et al. | Dual buffer rotation four-stage pipeline for CPU–GPU cooperative computing |