Tolerating Late Memory Traps in Dynamically Scheduled Processors

Published: 01 June 2004

Abstract

In the past few years, various research papers have advocated exception support for memory functions such as virtual memory, informing memory operations, software assist for shared-memory protocols, and interactions with processors in memory. These memory traps may occur on a miss in the cache hierarchy or on a local or remote memory access. However, contemporary dynamically scheduled processors support only memory exceptions detected in the TLB associated with the first-level cache; they do not support memory exceptions taken deeper in the memory hierarchy. Such memory traps may be late, in the sense that the exception condition may still be undecided when a long-latency memory instruction reaches the retirement stage. In this paper, we evaluate through simulation the overhead of memory traps in dynamically scheduled processors, focusing on the added overhead incurred when a memory trap is late. We also propose some simple mechanisms to reduce this added overhead while preserving the memory consistency model. With more aggressive memory access mechanisms in the processor, we observe that the overhead of all memory traps, whether early or late, increases, while the lateness of a trap becomes largely tolerated, so that the performance gap between early and late memory traps is greatly reduced. Additionally, because of caching effects in the memory hierarchy, the frequency of memory traps usually decreases as they are taken deeper in the memory hierarchy, and their overall impact on execution time becomes negligible. We conclude that support for memory traps taken throughout the memory hierarchy could be added to dynamically scheduled processors at low hardware cost and with little performance degradation.
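
The early/late distinction drawn in the abstract can be made concrete with a small toy model. This is a minimal sketch and not the simulator used in the paper: the `MemoryOp` class, its field names, the cycle counts, and the "hold retirement until the condition is known" policy are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class MemoryOp:
    """A memory instruction in a toy retirement model (hypothetical names)."""
    reaches_retirement: int  # cycle at which the op is ready to retire
    trap_resolved: int       # cycle at which trap / no-trap becomes known


def is_late(op: MemoryOp) -> bool:
    """A trap is 'late' if its condition is still undecided at retirement."""
    return op.trap_resolved > op.reaches_retirement


def retirement_stall(op: MemoryOp) -> int:
    """Cycles retirement waits under an assumed conservative policy that
    holds the op until its exception condition is decided."""
    return max(0, op.trap_resolved - op.reaches_retirement)


# Assumed example values: a check done in the first-level TLB resolves
# quickly; a check done at a remote memory resolves long after the op
# would otherwise be ready to retire.
tlb_check = MemoryOp(reaches_retirement=20, trap_resolved=3)
remote_access = MemoryOp(reaches_retirement=20, trap_resolved=120)

print(is_late(tlb_check), retirement_stall(tlb_check))
print(is_late(remote_access), retirement_stall(remote_access))
```

In this model an early trap adds no retirement stall, while a late one stalls retirement for the difference between the two cycle counts. That difference is the added overhead that mechanisms for tolerating late traps aim to reduce.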

Cited By

  • (2006) A low-cost memory remapping scheme for address bus protection. Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques, pp. 74-83. DOI: 10.1145/1152154.1152169. Online publication date: 16-Sep-2006
  • (2004) TCP Onloading for Data Center Servers. Computer 37:11, pp. 48-58. DOI: 10.1109/MC.2004.223. Online publication date: 1-Nov-2004

Published In

IEEE Transactions on Computers, Volume 53, Issue 6
June 2004
144 pages

Publisher

IEEE Computer Society

United States

Author Tags

  1. 65
  2. Microarchitecture
  3. exception
  4. instruction-level parallelism
  5. memory consistency model
  6. memory system
  7. prefetching
  8. simulations
  9. trap

Qualifiers

  • Research-article
