DOI: 10.1145/195473.195583

The effectiveness of multiple hardware contexts

Published: 01 November 1994

Abstract

Multithreaded processors are used to tolerate long memory latencies. By executing threads loaded in multiple hardware contexts, an otherwise idle processor can keep busy, thus increasing its utilization. However, the larger working set of multiple threads can increase cache conflict misses. In this paper we evaluate the two phenomena together, examining their combined effect on execution time.
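The latency-tolerance side of this trade-off can be illustrated with a simple analytical utilization model. This is a minimal sketch, not the evaluation method of the paper: the function name and the parameters `run_length`, `latency`, and `switch_cost` are illustrative assumptions, and the model ignores cache interference entirely.

```python
def utilization(contexts, run_length, latency, switch_cost=0):
    """Estimate processor utilization under a simplified multithreading model.

    Assumptions (hypothetical, for illustration only):
      - each thread runs for `run_length` cycles, then misses in the cache
      - a miss takes `latency` cycles to service
      - switching to another context costs `switch_cost` cycles
      - misses are deterministic and the cache is unaffected by extra threads
    """
    if contexts < 1:
        raise ValueError("at least one hardware context is required")
    # Linear region: too few contexts to cover the whole miss latency.
    linear = contexts * run_length / (run_length + switch_cost + latency)
    # Saturated region: latency is fully hidden, only switch overhead remains.
    saturated = run_length / (run_length + switch_cost)
    return min(linear, saturated)


if __name__ == "__main__":
    for n in (1, 2, 4, 5, 8):
        print(n, round(utilization(n, run_length=10, latency=40), 3))
```

Under these assumed values, utilization grows linearly with the number of contexts until the miss latency is fully covered, then stays flat. The model leaves out the growth in cache conflict misses, which is the effect the paper weighs against this gain.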
The usefulness of multiple hardware contexts depends on program data locality, cache organization, and degree of multiprocessing. Multiple hardware contexts are most effective on programs that have been optimized for data locality. For these programs, execution time dropped with increasing contexts, over widely varying architectures. With unoptimized applications, multiple contexts had limited value. The best performance was seen with only two contexts, and only on uniprocessors and small multiprocessors. Unlike the optimized programs, the unoptimized applications changed behavior more noticeably with variations in cache associativity and cache hierarchy.
As a mechanism for exploiting program parallelism, an additional processor is clearly better than another context. However, there were many configurations for which adding a few hardware contexts delivered performance equal to or greater than that of a larger multiprocessor with fewer than the optimal number of contexts.
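The cache-interference concern raised in the abstract can be seen in a toy simulation. The sketch below is an illustration under stated assumptions, not the simulator used in the paper: it models a shared LRU set-associative cache, switches context on every access, and places each thread's data at a hypothetical worst-case alignment so that all threads compete for the same sets.

```python
from collections import OrderedDict


def miss_rate(contexts, lines_per_thread, num_sets, ways, passes=10):
    """Miss rate of a shared LRU set-associative cache (toy model).

    Each thread repeatedly sweeps its own `lines_per_thread` cache lines.
    Thread t's data starts at line t * num_sets (an assumed worst-case
    alignment), so line i of every thread maps to the same set.
    """
    if lines_per_thread > num_sets:
        raise ValueError("this sketch assumes lines_per_thread <= num_sets")
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = accesses = 0
    for _ in range(passes):
        for i in range(lines_per_thread):
            for t in range(contexts):          # round-robin between contexts
                line = t * num_sets + i
                cache_set = sets[line % num_sets]
                accesses += 1
                if line in cache_set:
                    cache_set.move_to_end(line)        # refresh LRU order on a hit
                else:
                    misses += 1
                    if len(cache_set) >= ways:
                        cache_set.popitem(last=False)  # evict least recently used
                    cache_set[line] = True
    return misses / accesses


if __name__ == "__main__":
    print(miss_rate(1, 8, 8, ways=1))  # one context, direct-mapped
    print(miss_rate(2, 8, 8, ways=1))  # two contexts, direct-mapped
    print(miss_rate(2, 8, 8, ways=2))  # two contexts, 2-way associative
```

In this deliberately adversarial layout, a second context turns a direct-mapped cache from cold misses only (0.1 over ten passes) into a miss on every access (1.0), while 2-way associativity restores the 0.1 rate. Real programs fall between these extremes, which is why the paper evaluates interference and latency tolerance together.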

Published In

ASPLOS VI: Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
November 1994
341 pages
ISBN:0897916603
DOI:10.1145/195473
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

ASPLOS94
Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Cited By

  • (2023) Re-Cache: Mitigating cache contention by exploiting locality characteristics with reconfigurable memory hierarchy for GPGPUs. Microelectronics Journal 138 (105825). DOI: 10.1016/j.mejo.2023.105825. Online publication date: Aug-2023
  • (2013) OWL. ACM SIGPLAN Notices 48:4 (395-406). DOI: 10.1145/2499368.2451158. Online publication date: 16-Mar-2013
  • (2013) OWL. ACM SIGARCH Computer Architecture News 41:1 (395-406). DOI: 10.1145/2490301.2451158. Online publication date: 16-Mar-2013
  • (2013) OWL. Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems (395-406). DOI: 10.1145/2451116.2451158. Online publication date: 16-Mar-2013
  • (2013) Multi-criteria checkpointing strategies. Proceedings of the 19th international conference on Parallel Processing (420-431). DOI: 10.1007/978-3-642-40047-6_43. Online publication date: 26-Aug-2013
  • (2009) Memory-level parallelism aware fetch policies for simultaneous multithreading processors. ACM Transactions on Architecture and Code Optimization 6:1 (1-33). DOI: 10.1145/1509864.1509867. Online publication date: 2-Apr-2009
  • (2009) Efficient Scheduling of Nested Parallel Loops on Multi-Core Systems. Proceedings of the 2009 International Conference on Parallel Processing (74-83). DOI: 10.1109/ICPP.2009.19. Online publication date: 22-Sep-2009
  • (2006) Exploiting multilevel parallelism using OpenMP on a massive multithreaded architecture. Journal of Embedded Computing 2:2 (141-155). DOI: 10.5555/1370998.1371007. Online publication date: 1-Apr-2006
  • (2006) A processor extension for cycle-accurate real-time software. Proceedings of the 2006 international conference on Embedded and Ubiquitous Computing (449-458). DOI: 10.1007/11802167_46. Online publication date: 1-Aug-2006
  • (2005) Optimizing NANOS OpenMP for the IBM Cyclops Multithreaded Architecture. Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01. DOI: 10.1109/IPDPS.2005.317. Online publication date: 4-Apr-2005