- research-article, October 2021
MPI collective communication through a single set of interfaces: A case for orthogonality
Abstract: We present and discuss a unified view of and interface for collective communication in the MPI (Message-Passing Interface) standard that in a natural way exploits MPI’s orthogonality of concepts. We observe that the currently separate ...
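The orthogonality theme can be illustrated outside the paper itself: in standard MPI, each regular collective is a special case of its vector variant. A minimal C sketch (standard MPI-3 calls only, not the paper's proposed interface) expressing a uniform MPI_Gather through MPI_Gatherv:

```c
#include <mpi.h>
#include <stdlib.h>

/* Express a uniform MPI_Gather as MPI_Gatherv: the regular collective is
 * a special case of the vector variant (constant counts, regular strides). */
int gather_via_gatherv(const int *sendbuf, int count, int *recvbuf,
                       int root, MPI_Comm comm)
{
    int size, rank;
    MPI_Comm_size(comm, &size);
    MPI_Comm_rank(comm, &rank);

    int *counts = NULL, *displs = NULL;
    if (rank == root) {               /* counts/displs significant at root only */
        counts = malloc(size * sizeof(int));
        displs = malloc(size * sizeof(int));
        for (int i = 0; i < size; i++) {
            counts[i] = count;        /* every rank contributes equally */
            displs[i] = i * count;    /* contiguous, regularly strided  */
        }
    }
    int err = MPI_Gatherv(sendbuf, count, MPI_INT,
                          recvbuf, counts, displs, MPI_INT, root, comm);
    free(counts);
    free(displs);
    return err;
}
```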
- review-article, July 2019
Efficient design for MPI asynchronous progress without dedicated resources
- Amit Ruhela,
- Hari Subramoni,
- Sourav Chakraborty,
- Mohammadreza Bayatpour,
- Pouya Kousha,
- Dhabaleswar K. (DK) Panda
Highlights:
- Presents a scalable asynchronous progress design that requires:
  - No additional software or hardware resources.
  - No interrupts from the network adapter.
  - No ...
The overlap of computation and communication is critical for good performance of many HPC applications. State-of-the-art designs for asynchronous progress require specially designed hardware resources (advanced switches or network ...
- research-article, September 2018
Efficient Asynchronous Communication Progress for MPI without Dedicated Resources
- Amit Ruhela,
- Hari Subramoni,
- Sourav Chakraborty,
- Mohammadreza Bayatpour,
- Pouya Kousha,
- Dhabaleswar K. Panda
EuroMPI '18: Proceedings of the 25th European MPI Users' Group Meeting, Article No.: 14, Pages 1–11, https://doi.org/10.1145/3236367.3236376
The overlap of computation and communication is critical for good performance of many HPC applications. State-of-the-art designs for asynchronous progress require specially designed hardware resources (advanced switches or network interface cards), ...
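For context on what the two entries above aim to replace: without asynchronous progress, an application typically has to drive the MPI progress engine itself, e.g. by interleaving MPI_Test calls with computation. A minimal C sketch of that manual-progress pattern (illustrative only, not the papers' design; compute_chunk and the chunking scheme are hypothetical placeholders):

```c
#include <mpi.h>

/* Hypothetical compute kernel standing in for real application work. */
void compute_chunk(double *data, int n);

/* Overlap a large nonblocking send with computation by manually driving
 * the MPI progress engine via MPI_Test. Asynchronous-progress designs
 * aim to make these intermediate Test calls unnecessary. */
void send_with_manual_progress(const double *msg, int count, int dest,
                               double *work, int nchunks, int chunk,
                               MPI_Comm comm)
{
    MPI_Request req;
    int done = 0;

    MPI_Isend(msg, count, MPI_DOUBLE, dest, 0, comm, &req);
    for (int i = 0; i < nchunks; i++) {
        compute_chunk(&work[i * chunk], chunk);
        if (!done)
            MPI_Test(&req, &done, MPI_STATUS_IGNORE); /* drive progress */
    }
    if (!done)
        MPI_Wait(&req, MPI_STATUS_IGNORE);
}
```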
- article, May 2016
A novel MPI reduction algorithm resilient to imbalances in process arrival times
The Journal of Supercomputing (JSCO), Volume 72, Issue 5, Pages 1973–2013, https://doi.org/10.1007/s11227-016-1707-x
Reduction algorithms are optimized only under the assumption that all processes commence the reduction simultaneously. Research on process arrival times has shown that this is rarely the case. Thus, all benchmarking methodologies that take into account ...
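The arrival-time effect the abstract raises is easy to reproduce: a micro-benchmark can stagger when ranks enter the reduction and time the call per rank. A hedged C sketch of such a measurement (the skew schedule and message size are arbitrary choices, not the paper's methodology):

```c
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

/* Time MPI_Reduce under an imposed process-arrival-time imbalance:
 * each rank sleeps proportionally to its rank before entering the call,
 * so ranks arrive staggered rather than simultaneously. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double in = (double)rank, out = 0.0;

    MPI_Barrier(MPI_COMM_WORLD);       /* common starting point      */
    usleep(1000 * rank);               /* arbitrary skew: 1 ms/rank  */

    double t0 = MPI_Wtime();
    MPI_Reduce(&in, &out, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    printf("rank %d: reduce took %.6f s\n", rank, t1 - t0);
    MPI_Finalize();
    return 0;
}
```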
- article, December 2014
Scalable PGAS collective operations in NUMA clusters
- Damián A. Mallón,
- Guillermo L. Taboada,
- Carlos Teijeiro,
- Jorge González-Domínguez,
- Andrés Gómez,
- Brian Wibecan
Cluster Computing (KLU-CLUS), Volume 17, Issue 4, Pages 1473–1495, https://doi.org/10.1007/s10586-014-0377-9
The increasing number of cores per processor is making manycore-based systems pervasive. This involves dealing with multiple levels of memory in non-uniform memory access (NUMA) systems and processor core hierarchies, accessible via complex ...
- research-article, September 2014
Exploring the effect of noise on the performance benefit of nonblocking allreduce
EuroMPI/ASIA '14: Proceedings of the 21st European MPI Users' Group Meeting, Pages 77–82, https://doi.org/10.1145/2642769.2642786
Relaxed synchronization offers the potential of maintaining application scalability by allowing many processes to make independent progress when some processes suffer delays. Yet, the benefits of this approach in important parallel workloads have not ...
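MPI-3's MPI_Iallreduce is the standard mechanism for letting ranks keep working while a reduction is in flight, which is the pattern whose noise sensitivity this paper studies. A minimal sketch (independent_work is a hypothetical placeholder for work that does not depend on the reduction result):

```c
#include <mpi.h>

/* Hypothetical placeholder for work independent of the reduction result. */
void independent_work(void);

/* Post a nonblocking allreduce, keep computing, then complete it.
 * Delayed ("noisy") ranks stall only the final MPI_Wait, not the
 * independent work performed by the others in the meantime. */
void noise_tolerant_sum(const double *local, double *global, MPI_Comm comm)
{
    MPI_Request req;
    MPI_Iallreduce(local, global, 1, MPI_DOUBLE, MPI_SUM, comm, &req);
    independent_work();               /* overlap window */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}
```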
- article, February 2013
KNEM: A generic and scalable kernel-assisted intra-node MPI communication framework
Journal of Parallel and Distributed Computing (JPDC), Volume 73, Issue 2, Pages 176–188, https://doi.org/10.1016/j.jpdc.2012.09.016
The multiplication of cores in today's architectures raises the importance of intra-node communication in modern clusters and its impact on overall parallel application performance. Although several proposals focused on this issue in the past, ...
- article, September 2012
Low-Latency Collectives for the Intel SCC
CLUSTER '12: Proceedings of the 2012 IEEE International Conference on Cluster Computing, Pages 346–354, https://doi.org/10.1109/CLUSTER.2012.58
Message passing has been adopted as the main programming paradigm for many-core processors with on-chip networks for inter-core communication. To this end, message-passing libraries such as MPI can be used, as they provide well-known interfaces to ...
- article, October 2007
A calculus for parallel computations over multidimensional dense arrays
Computer Languages, Systems and Structures (CLSS), Volume 33, Issue 3-4, Pages 82–110, https://doi.org/10.1016/j.cl.2006.07.005
We present a calculus to formalize and give costs to parallel computations over multidimensional dense arrays. The calculus extends a simple distribution calculus (proposed in some previous work) with computation and data collection. We consider an SPMD ...
- article, September 2007
Optimizing a conjugate gradient solver with non-blocking collective operations
Parallel Computing (PACO), Volume 33, Issue 9, Pages 624–633, https://doi.org/10.1016/j.parco.2007.06.006
This paper presents a case study that analyzes the suitability and usage of non-blocking collective operations in parallel applications. As with their point-to-point counterparts, non-blocking collective operations provide the ability to overlap ...
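This work predates MPI-3's standardized nonblocking collectives, so it relied on library-based implementations; in modern MPI the core pattern can be sketched with MPI_Iallreduce: start the global reduction of a local dot product, perform a vector update that does not depend on its result, then wait. A hedged C sketch (the loop structure and helper names are illustrative, not the paper's solver):

```c
#include <mpi.h>

/* Hypothetical helpers for the local vector pieces of a CG iteration. */
double local_dot(const double *x, const double *y, int n);
void   local_axpy(double a, const double *x, double *y, int n);

/* Overlap the global dot-product reduction with a vector update that
 * does not depend on its result, as in a pipelined CG iteration. */
void cg_step_overlap(const double *p, const double *q, double *x,
                     double alpha_prev, int n, MPI_Comm comm)
{
    double local = local_dot(p, q, n);
    double global;
    MPI_Request req;

    MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, comm, &req);
    local_axpy(alpha_prev, p, x, n);  /* independent update overlaps */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    /* ... use 'global' to form the next step size ... */
}
```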