Jocksch et al., 2019

Optimized all‐to‐all communication on multicore architectures applied to FFTs with pencil decomposition

Document ID: 17204755068321753449
Author: Jocksch A; Kraushaar M; Daverio D
Publication year: 2019
Publication venue: Concurrency and Computation: Practice and Experience

Snippet

All‐to‐all communication is a basic functionality of parallel communication libraries such as the Message Passing Interface (MPI). Typically, there are multiple different underlying algorithms, which are chosen according to message size. We propose a communication …
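As a rough illustration of the operation the snippet refers to, the sketch below simulates the data movement of an all-to-all exchange in plain Python, without an MPI library: every rank holds one block per destination rank, and after the exchange every rank holds one block per source rank. The second function shows what size-based algorithm selection can look like. The function names, the 4096-byte threshold, and the two algorithm labels are assumptions made for illustration only; they are not taken from the paper and do not describe the scheme the authors propose.

```python
def alltoall(send_buffers):
    """Simulate all-to-all semantics on a list of per-rank send buffers.

    send_buffers[i][j] is the block that rank i sends to rank j.
    The result recv satisfies recv[j][i] == send_buffers[i][j].
    """
    n = len(send_buffers)
    for row in send_buffers:
        if len(row) != n:
            raise ValueError("each rank must provide exactly one block per rank")
    # The exchange is a transpose of the block matrix.
    return [[send_buffers[i][j] for i in range(n)] for j in range(n)]


def choose_algorithm(message_bytes, threshold_bytes=4096):
    """Pick an algorithm label by message size (hypothetical threshold).

    Libraries commonly use a latency-oriented algorithm for small messages
    and a bandwidth-oriented one for large messages; the cut-off here is
    an arbitrary placeholder, not a measured value.
    """
    return "bruck" if message_bytes < threshold_bytes else "pairwise"


if __name__ == "__main__":
    send = [["a0", "a1"], ["b0", "b1"]]
    print(alltoall(send))          # rank 0 receives a0, b0; rank 1 receives a1, b1
    print(choose_algorithm(64))
    print(choose_algorithm(1 << 20))
```

In a real program the exchange would be a single collective call issued by every rank, and the library, not the caller, would normally make the algorithm choice.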

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogramme communication; Intertask communication
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3889—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Programme synchronisation; Mutual exclusion, e.g. by means of semaphores; Contention for resources among tasks
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/44—Arrangements for executing specific programmes
- G06F9/455—Emulation; Software simulation, i.e. virtualisation or emulation of application or operating system execution engines
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements

Similar Documents

Publication	Publication Date	Title
Samardzic et al.	2021	F1: A fast and programmable accelerator for fully homomorphic encryption
Chen et al.	2018	GFlink: An in-memory computing architecture on heterogeneous CPU-GPU clusters for big data
US10037267B2 (en)	2018-07-31	Instruction set architecture and software support for register state migration
US10367741B1 (en)	2019-07-30	High performance, scalable multi chip interconnect
Hoefler et al.	2011	The scalable process topology interface of MPI 2.2
Zeldovich et al.	2003	Multiprocessor support for event-driven programs
Pereira et al.	2015	PSkel: A stencil programming framework for CPU‐GPU systems
Bayatpour et al.	2018	Salar: Scalable and adaptive designs for large message reduction collectives
Ying et al.	2021	Bluefog: Make decentralized algorithms practical for optimization and deep learning
Tan et al.	2021	Arena: Asynchronous reconfigurable accelerator ring to enable data-centric parallel computing
US8792786B2 (en)	2014-07-29	Photonically-enabled in-flight data reorganization
Zhou et al.	2022	Accelerating MPI all-to-all communication with online compression on modern GPU clusters
Jocksch et al.	2019	Optimized all‐to‐all communication on multicore architectures applied to FFTs with pencil decomposition
Papadopoulou et al.	2017	A performance study of UCX over InfiniBand
Silberstein	2017	OmniX: an accelerator-centric OS for omni-programmable systems
Si et al.	2017	Process-based asynchronous progress model for MPI point-to-point communication
Haghi et al.	2022	Reconfigurable switches for high performance and flexible MPI collectives
Chrisochoides et al.	2000	Mobile object layer: A runtime substrate for parallel adaptive and irregular computations
Wan et al.	2021	TESLAC: accelerating lattice-based cryptography with AI accelerator
Damania et al.	2023	PyTorch RPC: Distributed deep learning built on tensor-optimized remote procedure calls
Lin et al.	2014	Master–worker model for MapReduce paradigm on the TILE64 many-core platform
Igual et al.	2013	Scheduling algorithms‐by‐blocks on small clusters
Dhanraj	2012	Enhancement of LiMIC-Based Collectives for Multi-core Clusters
Malakar et al.	2017	Hierarchical read–write optimizations for scientific applications with multi-variable structured datasets
Bai et al.	2019	A hybrid ARM‐FPGA cluster for cryptographic algorithm acceleration