
Working sets, cache sizes, and node granularity issues for large-scale multiprocessors

Published: 01 May 1993

Abstract

The distribution of resources among processors, memory and caches is a crucial question faced by designers of large-scale parallel machines. If a machine is to solve problems with a certain data set size, should it be built with a large number of processors each with a small amount of memory, or a smaller number of processors each with a large amount of memory? How much cache memory should be provided per processor for cost-effectiveness? And how do these decisions change as larger problems are run on larger machines?
In this paper, we explore the above questions based on the characteristics of five important classes of large-scale parallel scientific applications. We first show that all the applications have a hierarchy of well-defined per-processor working sets, whose size, performance impact and scaling characteristics can help determine how large different levels of a multiprocessor's cache hierarchy should be. Then, we use these working sets together with certain other important characteristics of the applications—such as communication to computation ratios, concurrency, and load balancing behavior—to reflect upon the broader question of the granularity of processing nodes in high-performance multiprocessors.
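A working-set hierarchy of this kind is usually read off a curve of miss rate versus cache size. The curve drops sharply (a "knee") once the cache is large enough to hold a working set. The sketch below computes such a curve in one pass over an address trace, using Mattson-style LRU stack distances. It assumes a fully associative LRU cache and a block-granularity trace, which is a common modeling choice and is not claimed to be the paper's simulator. The generator `toy_trace` and its parameters are hypothetical. It models a kernel that repeatedly sweeps a small chunk of data inside a larger sweep, so it has two working sets: 4 blocks and 32 blocks.

```python
def lru_miss_rates(trace, cache_sizes):
    """Miss rate of a fully associative LRU cache for each size (in blocks),
    computed in one pass from LRU stack distances."""
    stack = []       # LRU stack: most recently used block at the end
    distances = []   # stack distance of each access (None = cold miss)
    for block in trace:
        if block in stack:
            idx = stack.index(block)
            # Depth from the top of the stack: 1 means the most recent block.
            distances.append(len(stack) - idx)
            stack.pop(idx)
        else:
            distances.append(None)
        stack.append(block)
    n = len(trace)
    # An access hits in a cache of size c exactly when its stack distance <= c.
    return {c: sum(1 for d in distances if d is None or d > c) / n
            for c in cache_sizes}


def toy_trace(inner=4, outer=32, inner_reps=8, outer_reps=4):
    """Hypothetical trace: sweep each `inner`-block chunk `inner_reps` times,
    walk across all `outer` blocks, and repeat the walk `outer_reps` times."""
    trace = []
    for _ in range(outer_reps):
        for base in range(0, outer, inner):
            chunk = list(range(base, base + inner))
            for _ in range(inner_reps):
                trace.extend(chunk)
    return trace


if __name__ == "__main__":
    print(lru_miss_rates(toy_trace(), [2, 4, 16, 32, 64]))
    # {2: 1.0, 4: 0.125, 16: 0.125, 32: 0.03125, 64: 0.03125}
```

The miss rate falls in two steps. It drops at 4 blocks, when the inner chunk fits, and again at 32 blocks, when the whole sweep fits. It stays flat between and beyond these knees, so a cache just past a knee captures most of the benefit.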
We find that very small caches whose sizes do not increase with the problem or machine size are adequate for all but two of the application classes. Even in the two exceptions, the working sets scale quite slowly with problem size, and the cache sizes needed for problems that will be run in the foreseeable future are small. We also find that relatively fine-grained machines, with large numbers of processors and quite small amounts of memory per processor, are appropriate for all the applications.
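The granularity trade-off can be made concrete with a textbook communication-to-computation model. This model is an illustrative assumption and is not taken from the paper. Assume an n x n grid, a nearest-neighbor (5-point stencil) update, and p processors that each own one square block. Each processor computes n²/p points per iteration and exchanges about 4(n/√p) boundary points, so the ratio is 4(n/√p)/(n²/p) = 4√p/n. Equivalently, the ratio is 4 divided by the square root of the points held per processor. The function name and model below are hypothetical.

```python
import math


def comm_to_comp_ratio(n, p):
    """Boundary points communicated per point computed, per iteration, for an
    n x n grid split into p square blocks (hypothetical 5-point stencil model)."""
    side = n / math.sqrt(p)   # edge length of each processor's block
    points = side * side      # points computed per processor (= n*n/p)
    boundary = 4 * side       # points exchanged with the four neighbors
    return boundary / points  # equals 4*sqrt(p)/n


if __name__ == "__main__":
    print(comm_to_comp_ratio(1024, 64))    # 128 x 128 block per processor
    print(comm_to_comp_ratio(4096, 1024))  # larger problem, same block size
    print(comm_to_comp_ratio(1024, 1024))  # same problem, 32 x 32 blocks
```

In this model, the ratio depends only on the data held per processor. The first two calls both give 0.03125 because each processor holds a 128 x 128 block. Shrinking the block to 32 x 32 (the third call) raises the ratio to 0.125. That increase is the communication cost a finer-grained machine has to absorb.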



Published In

ACM SIGARCH Computer Architecture News, Volume 21, Issue 2
Special Issue: Proceedings of the 20th annual international symposium on Computer architecture (ISCA '93)
May 1993
348 pages
ISSN: 0163-5964
DOI: 10.1145/173682
  • ACM Conferences
    ISCA '93: Proceedings of the 20th annual international symposium on computer architecture
    June 1993
    361 pages
    ISBN: 0818638109
    DOI: 10.1145/165123

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in SIGARCH Volume 21, Issue 2

Qualifiers

  • Article

Article Metrics

  • Downloads (last 12 months): 116
  • Downloads (last 6 weeks): 16
Reflects downloads up to 03 Jan 2025

Cited By

  • (2005) A performance tuning approach for shared-memory multiprocessors. Euro-Par'97 Parallel Processing, pp. 72-83. DOI: 10.1007/BFb0002718. Online publication date: 26-Sep-2005.
  • (2003) References. Interconnection Networks, pp. 569-592. DOI: 10.1016/B978-155860852-8/50015-8. Online publication date: 2003.
  • (1996) A quantitative study of parallel scientific applications with explicit communication. The Journal of Supercomputing 10(1), pp. 5-24. DOI: 10.1007/BF00128097. Online publication date: 1996.
  • (1995) I/O limitations in parallel molecular dynamics. Proceedings of the 1995 ACM/IEEE conference on Supercomputing, pp. 23-es. DOI: 10.1145/224170.224220. Online publication date: 8-Dec-1995.
  • (2019) Deadline-Aware Fair Scheduling for Multi-Tenant Crowd-Powered Systems. ACM Transactions on Social Computing 2(1), pp. 1-29. DOI: 10.1145/3301003. Online publication date: 21-Feb-2019.
  • (2019) Mi Casa es su Casa? Examining Airbnb Hospitality Exchange Practices in a Developing Economy. ACM Transactions on Social Computing 2(1), pp. 1-24. DOI: 10.1145/3299817. Online publication date: 6-Feb-2019.
  • (2017) Adaptive Runtime-Assisted Block Prefetching on Chip-Multiprocessors. International Journal of Parallel Programming 45(3), pp. 530-550. DOI: 10.1007/s10766-016-0431-8. Online publication date: 1-Jun-2017.
  • (2014) Tight Bounds for Asynchronous Renaming. Journal of the ACM 61(3), pp. 1-51. DOI: 10.1145/2597630. Online publication date: 2-Jun-2014.
  • (2014) Computing All Maps into a Sphere. Journal of the ACM 61(3), pp. 1-44. DOI: 10.1145/2597629. Online publication date: 2-Jun-2014.
  • (2011) Estimating Application Cache Requirement for Provisioning Caches in Virtualized Systems. Proceedings of the 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems, pp. 55-62. DOI: 10.1109/MASCOTS.2011.67. Online publication date: 25-Jul-2011.
