Limits on Interconnection Network Performance

Published: 01 October 1991

Abstract

The latency of direct networks is modeled, taking into account both switch and wire delays. A simple closed-form expression for contention in buffered, direct networks is derived and found to agree closely with simulations. The model includes the effects of packet size and communication locality. Network analysis under various constraints and under different workload parameters reveals that performance is highly sensitive to these constraints and workloads. A two-dimensional network is shown to have the lowest latency only when switch delays and network contention are ignored; three- or four-dimensional networks are favored otherwise. If communication locality exists, two-dimensional networks regain their advantage. Communication locality decreases both the base network latency and the network bandwidth requirements of applications. It is shown that a much larger fraction of the resulting performance improvement arises from the reduction in bandwidth requirements than from the decrease in latency.
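
To make the dimensionality trade-off concrete, here is a minimal sketch of a zero-load latency estimate for a k-ary n-cube. It is not the paper's model. It assumes unidirectional torus links, uniformly random destinations, a fixed per-hop delay, and one cycle per flit, and it ignores contention, wire-length effects, and communication locality. The function names and the `hop_delay` and `packet_flits` parameters are hypothetical and chosen only for illustration.

```python
"""Illustrative zero-load latency sketch for k-ary n-cube direct networks.

Assumptions (not taken from the abstract): unidirectional torus links,
uniformly random destinations, a fixed per-hop delay, and one cycle per flit.
Contention, wire-length effects, and communication locality are ignored.
"""


def radix_for(num_nodes: int, dims: int) -> int:
    """Return the radix k with k**dims == num_nodes, or raise ValueError."""
    guess = round(num_nodes ** (1.0 / dims))
    # Check neighbours of the rounded root to guard against float error.
    for k in (guess - 1, guess, guess + 1):
        if k >= 1 and k ** dims == num_nodes:
            return k
    raise ValueError(f"{num_nodes} nodes do not form a {dims}-dimensional cube")


def average_hops(k: int, dims: int) -> float:
    """Mean hop count: (k - 1) / 2 per dimension under the stated assumptions."""
    return dims * (k - 1) / 2


def base_latency(k: int, dims: int, packet_flits: int, hop_delay: float = 1.0) -> float:
    """Zero-load latency in cycles: hop delays plus packet serialization."""
    return average_hops(k, dims) * hop_delay + packet_flits


if __name__ == "__main__":
    nodes = 4096
    for dims in (2, 3, 4, 6, 12):
        k = radix_for(nodes, dims)
        print(dims, k, average_hops(k, dims), base_latency(k, dims, packet_flits=4))
```

For 4096 nodes, the average hop count under these assumptions falls from 63 in two dimensions (k = 64) to 14 in four dimensions (k = 8). The sketch alone does not reproduce the abstract's conclusions, because it leaves out the constraints, wire delays, and contention on which they depend.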

Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 2, Issue 4
October 1991
123 pages

Publisher

IEEE Press

Author Tags

  1. buffered networks
  2. closed-form expression
  3. communication locality
  4. direct networks
  5. four-dimensional networks
  6. interconnection network performance
  7. latency
  8. multiprocessor interconnection networks
  9. network contention
  10. network bandwidth requirements
  11. packet size
  12. performance evaluation
  13. switch delays
  14. two-dimensional network
  15. wire delays

Qualifiers

  • Research-article
