Jocksch et al., 2019 - Google Patents
Optimized all‐to‐all communication on multicore architectures applied to FFTs with pencil decompositionJocksch et al., 2019
View PDF- Document ID
- 17204755068321753449
- Author
- Jocksch A
- Kraushaar M
- Daverio D
- Publication year
- Publication venue
- Concurrency and Computation: Practice and Experience
External Links
Snippet
All‐to‐all communication is a basic functionality of parallel communication libraries such as the Message Passing Interface (MPI). Typically, there are multiple different underlying algorithms, which are chosen according to message size. We propose a communication …
- 238000004891 communication 0 title abstract description 98
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogramme communication; Intertask communication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3889—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Programme synchronisation; Mutual exclusion, e.g. by means of semaphores; Contention for resources among tasks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/44—Arrangements for executing specific programmes
- G06F9/455—Emulation; Software simulation, i.e. virtualisation or emulation of application or operating system execution engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Samardzic et al. | F1: A fast and programmable accelerator for fully homomorphic encryption | |
Chen et al. | GFlink: An in-memory computing architecture on heterogeneous CPU-GPU clusters for big data | |
US10037267B2 (en) | Instruction set architecture and software support for register state migration | |
US10367741B1 (en) | High performance, scalable multi chip interconnect | |
Hoefler et al. | The scalable process topology interface of MPI 2.2 | |
Zeldovich et al. | Multiprocessor support for event-driven programs. | |
Pereira et al. | PSkel: A stencil programming framework for CPU‐GPU systems | |
Bayatpour et al. | Salar: Scalable and adaptive designs for large message reduction collectives | |
Ying et al. | Bluefog: Make decentralized algorithms practical for optimization and deep learning | |
Tan et al. | Arena: Asynchronous reconfigurable accelerator ring to enable data-centric parallel computing | |
US8792786B2 (en) | Photonically-enabled in-flight data reorganization | |
Zhou et al. | Accelerating mpi all-to-all communication with online compression on modern gpu clusters | |
Jocksch et al. | Optimized all‐to‐all communication on multicore architectures applied to FFTs with pencil decomposition | |
Papadopoulou et al. | A performance study of UCX over InfiniBand | |
Silberstein | OmniX: an accelerator-centric OS for omni-programmable systems | |
Si et al. | Process-based asynchronous progress model for MPI point-to-point communication | |
Haghi et al. | Reconfigurable switches for high performance and flexible MPI collectives | |
Chrisochoides et al. | Mobile object layer: A runtime substrate for parallel adaptive and irregular computations | |
Wan et al. | TESLAC: accelerating lattice-based cryptography with AI accelerator | |
Damania et al. | Pytorch rpc: Distributed deep learning built on tensor-optimized remote procedure calls | |
Lin et al. | Master–worker model for mapreduce paradigm on the tile64 many-core platform | |
Igual et al. | Scheduling algorithms‐by‐blocks on small clusters | |
Dhanraj | Enhancement of LiMIC-Based Collectives for Multi-core Clusters | |
Malakar et al. | Hierarchical read–write optimizations for scientific applications with multi-variable structured datasets | |
Bai et al. | A hybrid ARM‐FPGA cluster for cryptographic algorithm acceleration |