DOI: 10.5555/320080.320101

Access region locality for high-bandwidth processor memory system design

Published: 16 November 1999

Abstract

This paper studies an interesting yet less explored behavior of memory access instructions, called access region locality. Unlike traditional temporal and spatial data locality, which focuses on individual memory locations and how accesses to those locations are interrelated, access region locality concerns each static memory instruction and the range of locations it accesses at run time. We consider a program's data, heap, and stack regions in this paper. Our experimental study using a set of SPEC95 benchmark programs shows that most memory reference instructions access a single region at run time. We also show that the access region of a memory instruction can be predicted accurately at run time by examining the instruction's addressing mode and its past access region history. We develop a simple run-time access region predictor that is similar in structure to a branch predictor. We describe and evaluate a superscalar processor with two distinct sets of memory pipelines, driven by the access region predictor. Experimental results indicate that the proposed mechanism is very effective in providing high memory bandwidth to the processor, resulting in performance comparable to or better than that of a conventional memory design with a heavily multi-ported data cache, which can lead to much higher hardware complexity.
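The abstract describes the predictor only in outline: it combines the instruction's addressing mode with its past access region history, in a structure similar to a branch predictor. The sketch below is one possible reading of that outline, not the design evaluated in the paper. The table size, the PC-based index, the one-entry "last region" history, the register names (`sp`, `fp`, `gp`), the default prediction, and the address boundaries in `classify_address` are all assumptions made for illustration.

```python
from enum import Enum


class Region(Enum):
    DATA = 0
    HEAP = 1
    STACK = 2


# Hypothetical base-register names; the abstract does not name any.
STACK_BASE_REGS = {"sp", "fp"}
GLOBAL_BASE_REGS = {"gp"}


def classify_address(addr, heap_start, stack_start):
    """Return the region of an address under an assumed flat layout:
    data < heap_start <= heap < stack_start <= stack."""
    if addr >= stack_start:
        return Region.STACK
    if addr >= heap_start:
        return Region.HEAP
    return Region.DATA


class AccessRegionPredictor:
    """PC-indexed table of last-seen regions, loosely analogous to a
    one-level branch history table (an assumption, not the paper's design)."""

    def __init__(self, entries=1024):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.mask = entries - 1
        self.table = [None] * entries  # None means no history yet

    def _index(self, pc):
        # Drop the low two bits, assuming 4-byte aligned instructions.
        return (pc >> 2) & self.mask

    def predict(self, pc, base_reg):
        # Step 1: addressing-mode hint from the base register.
        if base_reg in STACK_BASE_REGS:
            return Region.STACK
        if base_reg in GLOBAL_BASE_REGS:
            return Region.DATA
        # Step 2: fall back to this instruction's last observed region.
        last = self.table[self._index(pc)]
        # Assumed default when there is no history.
        return last if last is not None else Region.HEAP

    def update(self, pc, actual_region):
        # Record the resolved region once the effective address is known.
        self.table[self._index(pc)] = actual_region


predictor = AccessRegionPredictor(entries=16)
first_guess = predictor.predict(0x400, "t0")
predictor.update(0x400, classify_address(0x1000, 0x2000, 0x7000))
second_guess = predictor.predict(0x400, "t0")
```

In a processor with two sets of memory pipelines, a prediction of this kind would select the pipeline an instruction is steered to before its effective address is computed; how mispredictions are detected and recovered from is outside the scope of this sketch.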


Published In

MICRO 32: Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
November 1999
299 pages
ISBN: 076950437X

Publisher

IEEE Computer Society

United States


Qualifiers

  • Article

Conference

MICRO99
Acceptance Rates

MICRO 32 Paper Acceptance Rate 27 of 131 submissions, 21%;
Overall Acceptance Rate 484 of 2,242 submissions, 22%


Article Metrics

  • Downloads (Last 12 months): 57
  • Downloads (Last 6 weeks): 13
Reflects downloads up to 20 Dec 2024

Cited By

  • (2015) Subtleties of Run-Time Virtual Address Stacks. IEEE Computer Architecture Letters 14:2 (152-155). DOI: 10.1109/LCA.2014.2337299. Online publication date: 1-Jul-2015
  • (2009) Access region cache with register guided memory reference partitioning. Journal of Systems Architecture: the EUROMICRO Journal 55:10-12 (434-445). DOI: 10.1016/j.sysarc.2009.09.002. Online publication date: 1-Oct-2009
  • (2005) Dynamic partition of memory reference instructions – a register guided approach. Proceedings of the 11th international Euro-Par conference on Parallel Processing (508-518). DOI: 10.1007/11549468_58. Online publication date: 30-Aug-2005
  • (2003) Dynamically managing the communication-parallelism trade-off in future clustered processors. ACM SIGARCH Computer Architecture News 31:2 (275-287). DOI: 10.1145/871656.859650. Online publication date: 1-May-2003
  • (2003) Dynamically managing the communication-parallelism trade-off in future clustered processors. Proceedings of the 30th annual international symposium on Computer architecture (275-287). DOI: 10.1145/859618.859650. Online publication date: 9-Jun-2003
  • (2003) Partitioned first-level cache design for clustered microarchitectures. Proceedings of the 17th annual international conference on Supercomputing (22-31). DOI: 10.1145/782814.782820. Online publication date: 23-Jun-2003
  • (2001) A High-Bandwidth Memory Pipeline for Wide Issue Processors. IEEE Transactions on Computers 50:7 (709-723). DOI: 10.1109/12.936237. Online publication date: 1-Jul-2001
