The Effects of Latency, Occupancy, and Bandwidth in Distributed Shared Memory Multiprocessors

January 1995

1995 Technical Report

Publisher: Stanford University, 408 Panama Mall, Suite 217, Stanford, CA, United States

Published: 01 January 1995

Abstract

Distributed shared memory (DSM) machines can be characterized by four parameters, based on a slightly modified version of the LogP model. The l (latency) and o (occupancy of the communication controller) parameters are the keys to performance in these machines, and are largely determined by major architectural decisions about the aggressiveness and customization of the node and network. For recent and upcoming machines, the g (gap) parameter that measures node-to-network bandwidth does not appear to be a bottleneck. Conventional wisdom is that latency is the dominant factor in determining the performance of a DSM machine. We show, however, that controller occupancy, which causes contention even in highly optimized applications, plays a major role, especially at low latencies. When latency hiding is used, occupancy becomes more critical, even in machines with high-latency networks. Scaling the problem size is often used as a technique to overcome limitations in communication latency and bandwidth. We show that in many structured computations, occupancy-induced contention is not alleviated by increasing problem size, and that there are important classes of applications for which the performance lost by using higher-latency networks or higher-occupancy controllers cannot be regained easily, if at all, by scaling the problem size.
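
To make the interplay between the l and o parameters concrete, the following is a minimal illustrative sketch and not the report's simulation methodology. It assumes a toy model in which a remote miss pays the network latency twice (request and reply) and visits a single communication controller, whose contention is approximated with an M/M/1 queue. The function names, the queueing assumption, and all numeric values are hypothetical and chosen only for illustration.

```python
def remote_miss_time(latency, occupancy, request_rate):
    """Toy estimate of remote miss time, in cycles.

    ASSUMPTIONS (not from the report): one controller visit per miss,
    and contention at the controller modeled as an M/M/1 queue.

    latency      -- one-way network latency l, in cycles
    occupancy    -- controller busy time per request o, in cycles
    request_rate -- requests arriving at the controller per cycle
    """
    utilization = request_rate * occupancy
    if not 0 <= utilization < 1:
        raise ValueError("controller is saturated (utilization >= 1)")
    # M/M/1 mean waiting time in queue: rho * S / (1 - rho), with S = occupancy.
    queueing_delay = utilization * occupancy / (1.0 - utilization)
    return 2 * latency + occupancy + queueing_delay


def occupancy_share(latency, occupancy, request_rate):
    """Fraction of the toy miss time spent at the controller (service + queueing)."""
    total = remote_miss_time(latency, occupancy, request_rate)
    return (total - 2 * latency) / total


if __name__ == "__main__":
    # Hypothetical parameter values; same controller, two different networks.
    for l in (10, 100):
        share = occupancy_share(latency=l, occupancy=20, request_rate=0.02)
        print(f"latency={l:>3} cycles: controller share of miss time = {share:.2f}")
```

Under these assumptions, the controller's share of the miss time is larger on the low-latency network than on the high-latency one, which is consistent with the abstract's claim that occupancy matters most at low latencies. The sketch does not model latency hiding or problem-size scaling.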

Contributors

Christopher Martin Holt
Stanford University
Mark Andrew Heinrich
University of Central Florida
Jaswinder Pal Singh
Princeton University
Edward Eric Rothberg
Hewlett Packard Enterprise
John L. Hennessy
Stanford University

Recommendations

Latency, Occupancy, and Bandwidth in DSM Multiprocessors: A Performance Evaluation

While the desire to use commodity parts in the communication architecture of a DSM multiprocessor offers advantages in cost and design time, the impact on application performance is unclear. We study this performance impact through detailed simulation, ...

Scalable directory architecture for distributed shared memory chip multiprocessors

Traditional Directory-based cache coherence protocol is far from optimal for large-scale cache coherent shared memory multiprocessors due to the increasing latency to access directories stored in DRAM memory. Instead of keeping directories in main ...

Using destination-set prediction to improve the latency/bandwidth tradeoff in shared-memory multiprocessors
ISCA '03: Proceedings of the 30th annual international symposium on Computer architecture

Destination-set prediction can improve the latency/bandwidth tradeoff in shared-memory multiprocessors. The destination set is the collection of processors that receive a particular coherence request. Snooping protocols send requests to the maximal ...
