The effectiveness of multiple hardware contexts

Authors:

Radhika Thekkath,

Susan J. Eggers

ASPLOS VI: Proceedings of the sixth international conference on Architectural support for programming languages and operating systems

Pages 328 - 337

https://doi.org/10.1145/195473.195583

Published: 01 November 1994

Abstract

Multithreaded processors are used to tolerate long memory latencies. By executing threads loaded in multiple hardware contexts, an otherwise idle processor can keep busy, thus increasing its utilization. However, the larger size of a multi-thread working set can have a negative effect on cache conflict misses. In this paper we evaluate the two phenomena together, examining their combined effect on execution time.
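
The latency-tolerance half of this trade-off can be illustrated with a small analytical sketch. It is not the simulation methodology of the paper; it is a commonly used deterministic model of multithreaded processor utilization, and the parameter names (`run_length`, `latency`, `switch_cost`) and the numbers below are assumptions chosen only for illustration. The model ignores cache interference, so the negative effect described above would show up only as a shorter `run_length` supplied by the caller.

```python
def utilization(contexts: int, run_length: float, latency: float,
                switch_cost: float) -> float:
    """Fraction of cycles spent on useful work in a simple deterministic model.

    contexts    -- number of hardware contexts on the processor
    run_length  -- cycles a thread runs between long-latency misses
    latency     -- cycles needed to service one miss
    switch_cost -- cycles lost on each context switch
    """
    if contexts < 1:
        raise ValueError("contexts must be at least 1")
    # Linear region: too few contexts to cover the whole miss latency.
    linear = contexts * run_length / (run_length + switch_cost + latency)
    # Saturated region: latency is fully hidden, only switch cost is lost.
    saturated = run_length / (run_length + switch_cost)
    return min(linear, saturated)


if __name__ == "__main__":
    # Hypothetical parameters in cycles; not taken from the paper.
    for n in range(1, 7):
        u = utilization(n, run_length=20, latency=80, switch_cost=5)
        print(f"{n} contexts: utilization {u:.3f}")
```

In this sketch, utilization grows linearly with the number of contexts until the miss latency is fully overlapped, then flattens. Passing a smaller `run_length` for larger context counts is one crude way to mimic additional conflict misses from a larger multi-thread working set.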

The usefulness of multiple hardware contexts depends on: program data locality, cache organization and degree of multiprocessing. Multiple hardware contexts are most effective on programs that have been optimized for data locality. For these programs, execution time dropped with increasing contexts, over widely varying architectures. With unoptimized applications, multiple contexts had limited value. The best performance was seen with only two contexts, and only on uniprocessors and small multiprocessors. The behavior of the unoptimized applications changed more noticeably with variations in cache associativity and cache hierarchy, unlike the optimized programs.

As a mechanism for exploiting program parallelism, an additional processor is clearly better than another context. However, there were many configurations for which the addition of a few hardware contexts brought as much or greater performance than a larger multiprocessor with fewer than the optimal number of contexts.

Published In

ASPLOS VI: Proceedings of the sixth international conference on Architectural support for programming languages and operating systems

November 1994

341 pages

ISBN:0897916603

DOI:10.1145/195473

Chairmen: Forest Baskett (Silicon Graphics), Douglas Clark (Princeton Univ.)

ACM SIGOPS Operating Systems Review, Volume 28, Issue 5
Dec. 1994
323 pages
ISSN:0163-5980
DOI:10.1145/381792
Chairman: Henry M. Levy (Univ. of Washington, Seattle)

ACM SIGPLAN Notices, Volume 29, Issue 11
Nov. 1994
323 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/195470
Editor: Richard L. Wexelblat (Washington D.C.)

Copyright © 1994 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ASPLOS94

ASPLOS94: 6th Conference on Architectural Support of Programming Languages & Operating Systems

October 5 - 7, 1994

San Jose, California, USA

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Bibliometrics & Citations

Article Metrics

Total Citations: 55
Total Downloads: 729
Downloads (Last 12 months): 162
Downloads (Last 6 weeks): 15

Reflects downloads up to 03 Mar 2025

Cited By

Zhang Y, Wang M, Wang W, Yu Z (2023). Re-Cache: Mitigating cache contention by exploiting locality characteristics with reconfigurable memory hierarchy for GPGPUs. Microelectronics Journal, 138 (105825). Online publication date: Aug-2023.
https://doi.org/10.1016/j.mejo.2023.105825
Jog A, Kayiran O, Chidambaram Nachiappan N, Mishra A, Kandemir M, Mutlu O, Iyer R, Das C (2013). OWL. ACM SIGPLAN Notices, 48:4 (395-406). Online publication date: 16-Mar-2013.
https://dl.acm.org/doi/10.1145/2499368.2451158
Jog A, Kayiran O, Chidambaram Nachiappan N, Mishra A, Kandemir M, Mutlu O, Iyer R, Das C (2013). OWL. ACM SIGARCH Computer Architecture News, 41:1 (395-406). Online publication date: 16-Mar-2013.
https://dl.acm.org/doi/10.1145/2490301.2451158
Jog A, Kayiran O, Chidambaram Nachiappan N, Mishra A, Kandemir M, Mutlu O, Iyer R, Das C, Sarkar V, Bodik R (2013). OWL. Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems (395-406). Online publication date: 16-Mar-2013.
https://dl.acm.org/doi/10.1145/2451116.2451158
Bouteiller A, Cappello F, Dongarra J, Guermouche A, Hérault T, Robert Y (2013). Multi-criteria checkpointing strategies. Proceedings of the 19th international conference on Parallel Processing (420-431). Online publication date: 26-Aug-2013.
https://dl.acm.org/doi/10.1007/978-3-642-40047-6_43
Eyerman S, Eeckhout L (2009). Memory-level parallelism aware fetch policies for simultaneous multithreading processors. ACM Transactions on Architecture and Code Optimization, 6:1 (1-33). Online publication date: 2-Apr-2009.
https://dl.acm.org/doi/10.1145/1509864.1509867
Kejariwal A, Nicolau A, Veidenbaum A, Banerjee U, Polychronopoulos C (2009). Efficient Scheduling of Nested Parallel Loops on Multi-Core Systems. Proceedings of the 2009 International Conference on Parallel Processing (74-83). Online publication date: 22-Sep-2009.
https://dl.acm.org/doi/10.1109/ICPP.2009.19
Ródenas D, Martorell X, Ayguadé E, Labarta J, Almási G, Caşcaval C, Castaños J, Moreira J (2006). Exploiting multilevel parallelism using OpenMP on a massive multithreaded architecture. Journal of Embedded Computing, 2:2 (141-155). Online publication date: 1-Apr-2006.
https://dl.acm.org/doi/10.5555/1370998.1371007
Ip N, Edwards S (2006). A processor extension for cycle-accurate real-time software. Proceedings of the 2006 international conference on Embedded and Ubiquitous Computing (449-458). Online publication date: 1-Aug-2006.
https://dl.acm.org/doi/10.1007/11802167_46
Rodenas D, Martorell X, Ayguade E, Labarta J, Almasi G, Cascaval C, Castanos J, Moreira J (2005). Optimizing NANOS OpenMP for the IBM Cyclops Multithreaded Architecture. Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01. Online publication date: 4-Apr-2005.
https://dl.acm.org/doi/10.1109/IPDPS.2005.317
