Access region locality for high-bandwidth processor memory system design

Authors:

Gyungho Lee

MICRO 32: Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture

Pages 136 - 146

Published: 16 November 1999

Abstract

This paper studies an interesting yet less explored behavior of memory access instructions, called access region locality. Unlike traditional temporal and spatial data locality, which focuses on individual memory locations and how accesses to those locations are inter-related, access region locality concerns each static memory instruction and the range of locations it accesses at run time. We consider a program's data, heap, and stack regions in this paper. Our experimental study using a set of SPEC95 benchmark programs shows that most memory reference instructions access a single region at run time. It also shows that the access region of a memory instruction can be predicted accurately at run time by examining the instruction's addressing mode and its past access region history. We develop a simple run-time access region predictor that is similar in structure to a branch predictor. We describe and evaluate a superscalar processor with two distinct sets of memory pipelines, driven by the access region predictor. Experimental results indicate that the proposed mechanism is very effective in providing high memory bandwidth to the processor, yielding performance comparable to or better than that of a conventional memory design with a heavily multi-ported data cache, which can lead to much higher hardware complexity.
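
The two ideas in the abstract, measuring how many static instructions stay within one region and predicting a region from addressing mode plus past history, can be illustrated with a small sketch. This is not the paper's design: the address boundaries (HEAP_BASE, STACK_BASE), the trace format (pc, base register name, address), the last-region table, the "sp"/"fp" hint, and the default guess of DATA are all assumptions made for this example.

```python
from collections import defaultdict
from enum import Enum


class Region(Enum):
    DATA = 0
    HEAP = 1
    STACK = 2


# Hypothetical address-space boundaries, chosen only for this example.
HEAP_BASE = 0x1000_0000
STACK_BASE = 0x7000_0000


def classify(addr: int) -> Region:
    """Map an address to a region using the assumed boundaries above."""
    if addr >= STACK_BASE:
        return Region.STACK
    if addr >= HEAP_BASE:
        return Region.HEAP
    return Region.DATA


def single_region_fraction(trace) -> float:
    """Fraction of static instructions (PCs) that touch exactly one region."""
    regions_by_pc = defaultdict(set)
    for pc, _base_reg, addr in trace:
        regions_by_pc[pc].add(classify(addr))
    if not regions_by_pc:
        return 0.0
    single = sum(1 for regions in regions_by_pc.values() if len(regions) == 1)
    return single / len(regions_by_pc)


class RegionPredictor:
    """PC-indexed table of last-seen regions, plus an addressing-mode hint."""

    def __init__(self, entries: int = 1024):
        self.entries = entries
        self.table = [None] * entries  # None means no history yet

    def _index(self, pc: int) -> int:
        return (pc >> 2) % self.entries  # assumes 4-byte instructions

    def predict(self, pc: int, base_reg: str) -> Region:
        # Assumed hint: stack- or frame-pointer-relative accesses go to
        # the stack, regardless of table contents.
        if base_reg in ("sp", "fp"):
            return Region.STACK
        last = self.table[self._index(pc)]
        # Assumed default when there is no history: guess DATA.
        return last if last is not None else Region.DATA

    def update(self, pc: int, addr: int) -> None:
        """Record the region this instruction actually accessed."""
        self.table[self._index(pc)] = classify(addr)


def accuracy(trace, predictor: RegionPredictor) -> float:
    """Predict, check against the real region, then train, per access."""
    correct = 0
    for pc, base_reg, addr in trace:
        if predictor.predict(pc, base_reg) == classify(addr):
            correct += 1
        predictor.update(pc, addr)
    return correct / len(trace) if trace else 0.0
```

Like a simple branch predictor, the table is indexed by instruction address and trained after each access resolves. A real design would also have to handle aliasing between instructions that share a table entry and recovery when a misprediction sends an access to the wrong memory pipeline, neither of which is modeled here.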

Index Terms

Access region locality for high-bandwidth processor memory system design
1. Hardware

Published In

MICRO 32: Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture

November 1999

299 pages

ISBN:076950437X

Chairmen: Ronny Ronen (Intel Israel), Matthew Farrens (Univ. of California, Davis), Ilan Spillinger (Intel Israel)

Copyright © 1998 Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Sponsors

IEEE TC - MICRO
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

IEEE Computer Society

United States

Publication History

Published: 16 November 1999

Qualifiers

Article

Conference

MICRO99

MICRO99: 32nd Annual ACM/IEEE International Symposium on Microarchitecture

November 16 - 18, 1999

Haifa, Israel

Acceptance Rates

MICRO 32 Paper Acceptance Rate 27 of 131 submissions, 21%;

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Bibliometrics & Citations

Article Metrics

Total Citations: 7
Total Downloads: 301
Downloads (Last 12 months): 57
Downloads (Last 6 weeks): 13

Reflects downloads up to 20 Dec 2024

Citations

Cited By

Kang S, Nicopoulos C, Gavrilovska A, Kim J (2015). Subtleties of Run-Time Virtual Address Stacks. IEEE Computer Architecture Letters 14:2 (152-155). doi:10.1109/LCA.2014.2337299. Online publication date: 1-Jul-2015
https://dl.acm.org/doi/10.1109/LCA.2014.2337299
Le G, Shi Y (2009). Access region cache with register guided memory reference partitioning. Journal of Systems Architecture: the EUROMICRO Journal 55:10-12 (434-445). doi:10.1016/j.sysarc.2009.09.002. Online publication date: 1-Oct-2009
https://dl.acm.org/doi/10.1016/j.sysarc.2009.09.002
Shi Y, Lee G (2005). Dynamic partition of memory reference instructions – a register guided approach. Proceedings of the 11th international Euro-Par conference on Parallel Processing (508-518). doi:10.1007/11549468_58. Online publication date: 30-Aug-2005
https://dl.acm.org/doi/10.1007/11549468_58
Balasubramonian R, Dwarkadas S, Albonesi D (2003). Dynamically managing the communication-parallelism trade-off in future clustered processors. ACM SIGARCH Computer Architecture News 31:2 (275-287). doi:10.1145/871656.859650. Online publication date: 1-May-2003
https://dl.acm.org/doi/10.1145/871656.859650
Balasubramonian R, Dwarkadas S, Albonesi D, Gottlieb A, Li K (2003). Dynamically managing the communication-parallelism trade-off in future clustered processors. Proceedings of the 30th annual international symposium on Computer architecture (275-287). doi:10.1145/859618.859650. Online publication date: 9-Jun-2003
https://dl.acm.org/doi/10.1145/859618.859650
Racunas P, Patt Y, Banerjee U, Gallivan K, Gonzalez A (2003). Partitioned first-level cache design for clustered microarchitectures. Proceedings of the 17th annual international conference on Supercomputing (22-31). doi:10.1145/782814.782820. Online publication date: 23-Jun-2003
https://dl.acm.org/doi/10.1145/782814.782820
Cho S, Yew P, Lee G (2001). A High-Bandwidth Memory Pipeline for Wide Issue Processors. IEEE Transactions on Computers 50:7 (709-723). doi:10.1109/12.936237. Online publication date: 1-Jul-2001
https://dl.acm.org/doi/10.1109/12.936237
