Hashmi et al., 2020 - Google Patents

FALCON-X: Zero-copy MPI derived datatype processing on modern CPU and GPU architectures

Document ID
5574837958005391382
Author
Hashmi J
Chu C
Chakraborty S
Bayatpour M
Subramoni H
Panda D
Publication year
2020
Publication venue
Journal of Parallel and Distributed Computing

Snippet

This paper addresses the challenges of MPI derived datatype processing and proposes FALCON-X—A Fast and Low-overhead Communication framework for optimized zero-copy intra-node derived datatype communication on emerging CPU/GPU architectures. We …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 Cache consistency protocols
    • G06F12/0817 Cache consistency protocols using directory methods
    • G06F12/0826 Limited pointers directories; State-only directories without pointers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for programme control, e.g. control unit
    • G06F9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for programme control, e.g. control unit
    • G06F9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for programme control, e.g. control unit
    • G06F9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/30 Arrangements for executing machine-instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for programme control, e.g. control unit
    • G06F9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogramme communication; Intertask communication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored programme computers
    • G06F15/78 Architectures of general purpose stored programme computers comprising a single central processing unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30861 Retrieval from the Internet, e.g. browsers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformations of program code
    • G06F8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/25 Using a specific main memory architecture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements

Similar Documents

Publication Publication Date Title
Chen et al. GFlink: An in-memory computing architecture on heterogeneous CPU-GPU clusters for big data
Pennycook et al. Exploring SIMD for molecular dynamics, using Intel® Xeon® processors and Intel® Xeon Phi coprocessors
Hoefler et al. MPI+MPI: a new hybrid approach to parallel programming with MPI plus shared memory
Wang et al. Optimized non-contiguous MPI datatype communication for GPU clusters: Design, implementation and evaluation with MVAPICH2
Hashmi et al. FALCON-X: Zero-copy MPI derived datatype processing on modern CPU and GPU architectures
Elteir et al. StreamMR: an optimized MapReduce framework for AMD GPUs
Zhou et al. Accelerating MPI all-to-all communication with online compression on modern GPU clusters
Ross et al. Parallel programming model for the Epiphany many-core coprocessor using threaded MPI
Kristensen et al. Bohrium: a virtual machine approach to portable parallelism
Hashmi et al. FALCON: Efficient designs for zero-copy MPI datatype processing on emerging architectures
Nukada et al. High performance 3-D FFT using multiple CUDA GPUs
Xue et al. Multi‐GPU performance optimization of a computational fluid dynamics code using OpenACC
Agostini et al. GPUDirect Async: Exploring GPU synchronous communication techniques for InfiniBand clusters
Yao et al. Wukong+ G: Fast and concurrent RDF query processing using RDMA-assisted GPU graph exploration
Lu et al. Towards efficient remote OpenMP offloading
Miki et al. PACC: a directive-based programming framework for out-of-core stencil computation on accelerators
Kirtzic et al. A parallel algorithm development model for the GPU architecture
Zhou et al. Accelerating broadcast communication with GPU compression for deep learning workloads
Mal’kovskii et al. Performance evaluation of a hybrid computer cluster built on IBM POWER8 microprocessors
Faber et al. Platform agnostic streaming data application performance models
Eskikaya et al. Distributed OpenCL distributing OpenCL platform on network scale
Chavarria-Miranda et al. Early experience with out-of-core applications on the Cray XMT
Wong et al. Efficient magnetohydrodynamic simulations on distributed multi-GPU systems using a novel GPU Direct–MPI hybrid approach
Buono et al. Data analytics with NVLink: An SpMV case study
Li et al. Dual buffer rotation four-stage pipeline for CPU–GPU cooperative computing