DOI: 10.1145/165123.165126

Working sets, cache sizes, and node granularity issues for large-scale multiprocessors

Published: 01 May 1993

Abstract

The distribution of resources among processors, memory and caches is a crucial question faced by designers of large-scale parallel machines. If a machine is to solve problems with a certain data set size, should it be built with a large number of processors each with a small amount of memory, or a smaller number of processors each with a large amount of memory? How much cache memory should be provided per processor for cost-effectiveness? And how do these decisions change as larger problems are run on larger machines?
In this paper, we explore the above questions based on the characteristics of five important classes of large-scale parallel scientific applications. We first show that all the applications have a hierarchy of well-defined per-processor working sets, whose size, performance impact and scaling characteristics can help determine how large different levels of a multiprocessor's cache hierarchy should be. Then, we use these working sets together with certain other important characteristics of the applications—such as communication to computation ratios, concurrency, and load balancing behavior—to reflect upon the broader question of the granularity of processing nodes in high-performance multiprocessors.
We find that very small caches whose sizes do not increase with the problem or machine size are adequate for all but two of the application classes. Even in the two exceptions, the working sets scale quite slowly with problem size, and the cache sizes needed for problems that will be run in the foreseeable future are small. We also find that relatively fine-grained machines, with large numbers of processors and quite small amounts of memory per processor, are appropriate for all the applications.
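
The notion of a hierarchy of working sets can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a fully associative LRU cache and a hypothetical synthetic reference stream (`synthetic_trace`, with made-up parameters `inner`, `outer`, `sweeps`, and `reuse`) that has one small, heavily reused set of blocks and one larger set touched once per sweep. Plotting miss rate against cache size for such a stream shows a "knee" at each working set size, which is the kind of curve the abstract refers to.

```python
from collections import OrderedDict


def lru_miss_rate(trace, capacity):
    """Miss rate of a fully associative LRU cache holding `capacity` blocks."""
    cache = OrderedDict()  # keys are block ids, ordered from least to most recently used
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return misses / len(trace)


def synthetic_trace(inner=8, outer=64, sweeps=50, reuse=20):
    """Hypothetical reference stream with two nested working sets (an assumption,
    not one of the paper's applications)."""
    trace = []
    for _ in range(sweeps):
        for _ in range(reuse):
            trace.extend(range(inner))             # small set, reused many times
        trace.extend(range(inner, inner + outer))  # larger set, touched once per sweep
    return trace


if __name__ == "__main__":
    trace = synthetic_trace()
    for capacity in (4, 8, 16, 32, 64, 72, 128):
        print(f"{capacity:4d} blocks: miss rate {lru_miss_rate(trace, capacity):.4f}")
```

With the default parameters, the miss rate drops sharply once the cache holds the 8-block inner set and again once it holds all 72 blocks, and it is flat in between. Under these assumptions, growing the cache between the two knees buys nothing, which is why knowing where the knees fall matters for sizing each level of a cache hierarchy.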

Published In

ISCA '93: Proceedings of the 20th annual international symposium on computer architecture
June 1993
361 pages
ISBN: 0818638109
DOI: 10.1145/165123
  • ACM SIGARCH Computer Architecture News, Volume 21, Issue 2
    Special Issue: Proceedings of the 20th annual international symposium on Computer architecture (ISCA '93)
    May 1993
    348 pages
    ISSN: 0163-5964
    DOI: 10.1145/173682

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ISCA '93: 20th International Symposium on Computer Architecture
May 16 - 19, 1993
San Diego, California, USA

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
