research-article

Limits on Interconnection Network Performance

Author:

A. Agarwal

IEEE Transactions on Parallel and Distributed Systems, Volume 2, Issue 4

Pages 398 - 412

https://doi.org/10.1109/71.97897

Published: 01 October 1991

Abstract

The latency of direct networks is modeled, taking into account both switch and wire delays. A simple closed-form expression for contention in buffered, direct networks is derived and found to agree closely with simulations. The model includes the effects of packet size and communication locality. Network analysis under various constraints and under different workload parameters reveals that performance is highly sensitive to these constraints and workloads. A two-dimensional network is shown to have the lowest latency only when switch delays and network contention are ignored; three- or four-dimensional networks are favored otherwise. If communication locality exists, two-dimensional networks regain their advantage. Communication locality decreases both the base network latency and the network bandwidth requirements of applications. It is shown that a much larger fraction of the resulting performance improvement arises from the reduction in bandwidth requirements than from the decrease in latency.
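
As a rough illustration of why network dimension matters, the sketch below counts the average number of hops between nodes in a k-ary n-cube for a fixed node count. This is not the paper's model, which also accounts for wire delay, contention, packet size, and communication locality; it only shows the hop-count side of the tradeoff. The function names, the assumption of bidirectional wraparound links, and the 4096-node example are all chosen here for illustration and do not come from the abstract.

```python
from itertools import product


def ring_distance(a, b, k):
    """Minimum hops between positions a and b on a bidirectional ring of k nodes."""
    d = abs(a - b)
    return min(d, k - d)


def average_hops(k, n):
    """Average hop count from node 0 to every node (itself included) in a
    k-ary n-cube, assuming bidirectional wraparound links (an assumption of
    this sketch). The network is symmetric, so node 0 is representative."""
    total = 0
    for coords in product(range(k), repeat=n):
        # Hops add up independently along each dimension.
        total += sum(ring_distance(0, c, k) for c in coords)
    return total / k**n


if __name__ == "__main__":
    # Hypothetical example: 4096 nodes arranged with different dimensions.
    for k, n in [(64, 2), (16, 3), (8, 4), (4, 6), (2, 12)]:
        print(f"k={k:2d} n={n:2d} nodes={k**n} average hops={average_hops(k, n)}")
```

Fewer dimensions mean more hops per message for the same node count, so the per-hop switch delay weighs more heavily on low-dimensional networks. Hop count alone does not settle the comparison, because the abstract's conclusions also depend on wire delays, contention, and locality.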

Published In

IEEE Transactions on Parallel and Distributed Systems Volume 2, Issue 4

October 1991

123 pages

ISSN:1045-9219

Copyright © 1991 IEEE. All Rights Reserved.

Publisher

IEEE Press
