Hashmi et al., 2020 - Google Patents

FALCON-X: Zero-copy MPI derived datatype processing on modern CPU and GPU architectures

Document ID: 5574837958005391382
Author: Hashmi J; Chu C; Chakraborty S; Bayatpour M; Subramoni H; Panda D
Publication year: 2020
Publication venue: Journal of Parallel and Distributed Computing

Snippet

This paper addresses the challenges of MPI derived datatype processing and proposes FALCON-X—A Fast and Low-overhead Communication framework for optimized zero-copy intra-node derived datatype communication on emerging CPU/GPU architectures. We …

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F12/00—Accessing, addressing or allocating within memory systems or architectures
        - G06F12/02—Addressing or allocation; Relocation
          - G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
            - G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
              - G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
                - G06F12/0815—Cache consistency protocols
                  - G06F12/0817—Cache consistency protocols using directory methods
                    - G06F12/0826—Limited pointers directories; State-only directories without pointers
      - G06F9/00—Arrangements for programme control, e.g. control unit
        - G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
          - G06F9/46—Multiprogramming arrangements
            - G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
              - G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
              - G06F9/5061—Partitioning or combining of resources
            - G06F9/54—Interprogramme communication; Intertask communication
          - G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
            - G06F9/30003—Arrangements for executing specific machine instructions
              - G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
      - G06F15/00—Digital computers in general; Data processing equipment in general
        - G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
          - G06F15/163—Interprocessor communication
            - G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
        - G06F15/76—Architectures of general purpose stored programme computers
          - G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/30861—Retrieval from the Internet, e.g. browsers
      - G06F8/00—Arrangements for software engineering
        - G06F8/40—Transformations of program code
          - G06F8/41—Compilation
      - G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
        - G06F2212/25—Using a specific main memory architecture
      - G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements

Similar Documents

Authors	Year	Title
Chen et al.	2018	GFlink: An in-memory computing architecture on heterogeneous CPU-GPU clusters for big data
Pennycook et al.	2013	Exploring SIMD for molecular dynamics, using Intel® Xeon® processors and Intel® Xeon Phi coprocessors
Hoefler et al.	2013	MPI+MPI: a new hybrid approach to parallel programming with MPI plus shared memory
Wang et al.	2011	Optimized non-contiguous MPI datatype communication for GPU clusters: Design, implementation and evaluation with MVAPICH2
Hashmi et al.	2020	FALCON-X: Zero-copy MPI derived datatype processing on modern CPU and GPU architectures
Elteir et al.	2011	StreamMR: an optimized MapReduce framework for AMD GPUs
Zhou et al.	2022	Accelerating MPI all-to-all communication with online compression on modern GPU clusters
Ross et al.	2015	Parallel programming model for the Epiphany many-core coprocessor using threaded MPI
Kristensen et al.	2014	Bohrium: a virtual machine approach to portable parallelism
Hashmi et al.	2019	FALCON: Efficient designs for zero-copy MPI datatype processing on emerging architectures
Nukada et al.	2012	High performance 3-D FFT using multiple CUDA GPUs
Xue et al.	2021	Multi‐GPU performance optimization of a computational fluid dynamics code using OpenACC
Agostini et al.	2018	GPUDirect Async: Exploring GPU synchronous communication techniques for InfiniBand clusters
Yao et al.	2021	Wukong+G: Fast and concurrent RDF query processing using RDMA-assisted GPU graph exploration
Lu et al.	2022	Towards efficient remote OpenMP offloading
Miki et al.	2019	PACC: a directive-based programming framework for out-of-core stencil computation on accelerators
Kirtzic et al.	2012	A parallel algorithm development model for the GPU architecture
Zhou et al.	2022	Accelerating broadcast communication with GPU compression for deep learning workloads
Mal’kovskii et al.	2019	Performance evaluation of a hybrid computer cluster built on IBM POWER8 microprocessors
Faber et al.	2021	Platform agnostic streaming data application performance models
Eskikaya et al.	2012	Distributed OpenCL: distributing OpenCL platform on network scale
Chavarria-Miranda et al.	2008	Early experience with out-of-core applications on the Cray XMT
Wong et al.	2014	Efficient magnetohydrodynamic simulations on distributed multi-GPU systems using a novel GPU Direct–MPI hybrid approach
Buono et al.	2017	Data analytics with NVLink: An SpMV case study
Li et al.	2019	Dual buffer rotation four-stage pipeline for CPU–GPU cooperative computing