Small et al., 2010 - Google Patents

Near-optimal Rendezvous protocols for RDMA-enabled clusters

Document ID: 10334287003007040013
Author: Small M; Gu Z; Yuan X
Publication year: 2010
Publication venue: 2010 39th International Conference on Parallel Processing

Snippet

Optimizing Message Passing Interface (MPI) point-to-point communication for large messages is of paramount importance since most communications in MPI applications are performed by such operations. Remote Direct Memory Access (RDMA) allows one-sided …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17356—Indirect interconnection networks
- G06F15/17368—Indirect interconnection networks non hierarchical topologies
- G06F15/17381—Two dimensional, e.g. mesh, torus
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17337—Direct connection machines, e.g. completely connected computers, point to point communication networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogramme communication; Intertask communication
- G06F9/546—Message passing systems or structures, e.g. queues
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurrence of a fault, e.g. fault tolerance
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2207/00—Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F2207/38—Indexing scheme relating to groups G06F7/38 - G06F7/575
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements

Similar Documents

Publication	Publication Date	Title
Kumar et al.	2008	The deep computing messaging framework: generalized scalable message passing on the Blue Gene/P supercomputer
Shah et al.	1998	Performance and experience with LAPI-a new high-performance communication library for the IBM RS/6000 SP
Ajima et al.	2014	Tofu interconnect 2: System-on-chip integration of high-performance interconnect
Pakin et al.	1997	Fast Messages: Efficient, portable communication for workstation clusters and MPPs
US8032892B2 (en)	2011-10-04	Message passing with a limited number of DMA byte counters
US8082424B2 (en)	2011-12-20	Determining when a set of compute nodes participating in a barrier operation on a parallel computer are ready to exit the barrier operation
US7788334B2 (en)	2010-08-31	Multiple node remote messaging
US8325633B2 (en)	2012-12-04	Remote direct memory access
Araki et al.	1998	User-space communication: A quantitative study
US7802025B2 (en)	2010-09-21	DMA engine for repeating communication patterns
US20130067206A1 (en)	2013-03-14	Endpoint-Based Parallel Data Processing In A Parallel Active Messaging Interface Of A Parallel Computer
Suresh et al.	2023	A novel framework for efficient offloading of communication operations to bluefield smartnics
Shoemaker et al.	1996	Numesh: An architecture optimized for scheduled communication
Suresh et al.	2022	Network assisted non-contiguous transfers for GPU-aware MPI libraries
US8782164B2 (en)	2014-07-15	Implementing asyncronous collective operations in a multi-node processing system
Rashti et al.	2009	A speculative and adaptive MPI rendezvous protocol over RDMA-enabled interconnects
Wong et al.	1999	Push-Pull Messaging: a high-performance communication mechanism for commodity SMP clusters
Schneider et al.	2011	Kernel-based offload of collective operations–implementation, evaluation and lessons learned
Kee et al.	2002	An efficient implementation of the BSP programming library for VIA
Nunes et al.	2008	A profiler for a heterogeneous multi-core multi-FPGA system
Gu et al.	2013	Protocol customization for improving MPI performance on RDMA-enabled clusters
Roweth et al.	2005	Optimised global reduction on QsNet II
Mohamed et al.	2006	High-performance message striping over reliable transport protocols
Peryshkova et al.	2020	Analysis of All-to-all Collective Operations on Hierarchical Computer Clusters