Small et al., 2010 - Google Patents
Near-optimal Rendezvous protocols for RDMA-enabled clustersSmall et al., 2010
View PDF- Document ID
- 10334287003007040013
- Author
- Small M
- Gu Z
- Yuan X
- Publication year
- Publication venue
- 2010 39th International Conference on Parallel Processing
External Links
Snippet
Optimizing Message Passing Interface (MPI) point-to-point communication for large messages is of paramount importance since most communications in MPI applications are performed by such operations. Remote Direct Memory Access (RDMA) allows one-sided …
- 238000004891 communication 0 abstract description 98
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17356—Indirect interconnection networks
- G06F15/17368—Indirect interconnection networks non hierarchical topologies
- G06F15/17381—Two dimensional, e.g. mesh, torus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17337—Direct connection machines, e.g. completely connected computers, point to point communication networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogramme communication; Intertask communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2207/00—Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F2207/38—Indexing scheme relating to groups G06F7/38 - G06F7/575
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kumar et al. | The deep computing messaging framework: generalized scalable message passing on the Blue Gene/P supercomputer | |
Shah et al. | Performance and experience with LAPI-a new high-performance communication library for the IBM RS/6000 SP | |
Ajima et al. | Tofu interconnect 2: System-on-chip integration of high-performance interconnect | |
Pakin et al. | Fast Messages: Efficient, portable communication for workstation clusters and MPPs | |
US8032892B2 (en) | Message passing with a limited number of DMA byte counters | |
US8082424B2 (en) | Determining when a set of compute nodes participating in a barrier operation on a parallel computer are ready to exit the barrier operation | |
US7788334B2 (en) | Multiple node remote messaging | |
US8325633B2 (en) | Remote direct memory access | |
Araki et al. | User-space communication: A quantitative study | |
US7802025B2 (en) | DMA engine for repeating communication patterns | |
US20130067206A1 (en) | Endpoint-Based Parallel Data Processing In A Parallel Active Messaging Interface Of A Parallel Computer | |
Suresh et al. | A novel framework for efficient offloading of communication operations to bluefield smartnics | |
Small et al. | Near-optimal Rendezvous protocols for RDMA-enabled clusters | |
Shoemaker et al. | Numesh: An architecture optimized for scheduled communication | |
Suresh et al. | Network assisted non-contiguous transfers for GPU-aware MPI libraries | |
US8782164B2 (en) | Implementing asyncronous collective operations in a multi-node processing system | |
Rashti et al. | A speculative and adaptive MPI rendezvous protocol over RDMA-enabled interconnects | |
Wong et al. | Push-Pull Messaging: a high-performance communication mechanism for commodity SMP clusters | |
Schneider et al. | Kernel-based offload of collective operations–implementation, evaluation and lessons learned | |
Kee et al. | An efficient implementation of the BSP programming library for VIA | |
Nunes et al. | A profiler for a heterogeneous multi-core multi-FPGA system | |
Gu et al. | Protocol customization for improving MPI performance on RDMA-enabled clusters | |
Roweth et al. | Optimised global reduction on QsNet/sup II | |
Mohamed et al. | High-performance message striping over reliable transport protocols | |
Peryshkova et al. | Analysis of All-to-all Collective Operations on Hierarchical Computer Clusters |