
Data prefetching by dependence graph precomputation

Published: 01 May 2001

Abstract

Data cache misses reduce the performance of wide-issue processors by stalling the supply of data to the processor. Prefetching data by predicting miss addresses is one way to tolerate cache-miss latencies. However, for current applications with irregular access patterns, it is difficult to predict addresses accurately and early enough to mask long cache-miss latencies. This paper explores an alternative to predicting prefetch addresses: precomputing them. The Dependence Graph Precomputation (DGP) scheme introduced in this paper is a novel approach for dynamically identifying and precomputing the instructions that determine the addresses accessed by the load/store instructions marked as responsible for most data cache misses. DGP's dependence graph generator efficiently generates the required dependence graphs at run time. A separate precomputation engine executes these graphs to generate the data addresses of the marked load/store instructions early enough for accurate prefetching. Our results show that 94% of the prefetches issued by DGP are useful, reducing D-cache miss stall time by 47%. DGP thus takes us about halfway from an already highly tuned baseline system toward perfect D-cache performance. DGP improves the overall performance of a wide range of applications by 7% over tagged next-line prefetching and by 13% over a baseline processor with no prefetching, and it comes within 15% of perfect D-cache performance.
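The dependence-graph idea can be illustrated with a minimal sketch. It assumes a hypothetical three-operation toy ISA (`add`, `addi`, `load`), a window of recently retired instructions covering exactly one loop iteration, and a single marked load whose address is a base register plus an immediate; `Instr`, `build_dependence_graph`, `execute`, and `precompute_addresses` are illustrative names, not the paper's hardware design. The generator walks backward from the marked load and keeps only the instructions that transitively produce its address operand (a backward slice); the precomputation step replays that slice on a private register copy to produce upcoming addresses for an `a[b[i]]` access pattern.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Instr:
    """One instruction of a hypothetical toy ISA: dst <- op(srcs, imm)."""
    pc: int
    op: str                # "add", "addi", or "load"
    dst: str
    srcs: Tuple[str, ...]
    imm: int = 0


def build_dependence_graph(window: List[Instr], marked: int) -> List[Instr]:
    """Backward slice of window[marked]'s address operands, in program order."""
    needed = set(window[marked].srcs)
    graph: List[Instr] = []
    for ins in reversed(window[:marked]):
        if ins.dst in needed:
            graph.append(ins)
            needed.discard(ins.dst)  # this instruction produces the value...
            needed.update(ins.srcs)  # ...so its own sources are now needed
    graph.reverse()
    return graph


def execute(ins: Instr, regs: Dict[str, int], memory: Dict[int, int]) -> None:
    """Execute one toy instruction, updating regs in place."""
    if ins.op == "add":
        regs[ins.dst] = regs[ins.srcs[0]] + regs[ins.srcs[1]]
    elif ins.op == "addi":
        regs[ins.dst] = regs[ins.srcs[0]] + ins.imm
    elif ins.op == "load":
        regs[ins.dst] = memory[regs[ins.srcs[0]] + ins.imm]
    else:
        raise ValueError(f"unknown op {ins.op!r}")


def precompute_addresses(graph: List[Instr], marked: Instr,
                         live_ins: Dict[str, int], memory: Dict[int, int],
                         iterations: int) -> List[int]:
    """Replay the slice on a private register copy to get future load addresses.

    Assumption: the slice covers one loop iteration and includes the
    induction-variable update, so each replay yields the next instance.
    """
    regs = dict(live_ins)  # private copy: never touch the main thread's registers
    addresses = []
    for _ in range(iterations):
        for ins in graph:
            execute(ins, regs, memory)
        addresses.append(regs[marked.srcs[0]] + marked.imm)
    return addresses


# Toy loop body for x = a[b[i]]; r0 = &b[0], r1 = byte offset into b,
# r2 = &a[0], and b[] holds byte offsets into a[].
window = [
    Instr(0x10, "addi", "r1", ("r1",), 4),      # advance to next b element
    Instr(0x14, "add", "r4", ("r0", "r1")),     # r4 = &b[i]
    Instr(0x18, "load", "r5", ("r4",)),         # r5 = b[i]
    Instr(0x1C, "addi", "r6", ("r6",), 1),      # unrelated counter
    Instr(0x20, "add", "r7", ("r2", "r5")),     # r7 = &a[b[i]]
    Instr(0x24, "load", "r8", ("r7",)),         # marked delinquent load
]
graph = build_dependence_graph(window, marked=5)
print([hex(ins.pc) for ins in graph])  # ['0x10', '0x14', '0x18', '0x20']

memory = {1004: 16, 1008: 8, 1012: 40}  # contents of b[] (toy values)
live_ins = {"r0": 1000, "r1": 0, "r2": 5000}
print(precompute_addresses(graph, window[5], live_ins, memory, iterations=3))
# [5016, 5008, 5040]
```

The unrelated counter at `0x1C` is excluded from the graph because nothing on the address path reads `r6`. Replaying the slice several times to run ahead works here only because the slice contains the loop's induction update; how far ahead DGP's engine actually runs, and how it obtains live-in values, is not described in the abstract.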
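For comparison, the tagged next-line baseline mentioned in the abstract can be sketched as below. This follows the classic tagged scheme as it is commonly described: prefetch line L+1 on a demand miss to line L, or on the first demand reference to a line that was brought in by a prefetch. The sketch assumes an unbounded cache with no evictions and 64-byte lines, and `simulate` is a hypothetical helper; the paper's actual baseline parameters are not given here.

```python
from typing import Iterable, Tuple


def simulate(addresses: Iterable[int], line_size: int = 64,
             tagged: bool = True) -> Tuple[int, int, int]:
    """Return (demand misses, prefetches issued, useful prefetches)."""
    # Maps line number -> tag bit; True means "prefetched, not yet referenced".
    cache = {}
    misses = prefetches = useful = 0
    for addr in addresses:
        line = addr // line_size
        if line not in cache:
            misses += 1
            cache[line] = False
            trigger = True
        elif cache[line]:
            # First demand reference to a prefetched line: the prefetch was useful.
            cache[line] = False
            useful += 1
            trigger = True
        else:
            trigger = False
        if tagged and trigger and line + 1 not in cache:
            cache[line + 1] = True
            prefetches += 1
    return misses, prefetches, useful


sequential = range(0, 64 * 8, 8)                      # 8 lines, 8 accesses each
irregular = [line * 64 for line in (0, 10, 3, 17)]    # no spatial pattern
print(simulate(sequential, tagged=False))  # (8, 0, 0)
print(simulate(sequential))                # (1, 8, 7)
print(simulate(irregular))                 # (4, 4, 0)
```

On a sequential stream the scheme hides all but the first miss, while on the irregular line sequence every prefetch is wasted; that irregular behavior is what motivates precomputing addresses rather than predicting them.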




Information

Published In

ACM SIGARCH Computer Architecture News, Volume 29, Issue 2
Special Issue: Proceedings of the 28th annual international symposium on Computer architecture (ISCA '01)
May 2001
262 pages
ISSN: 0163-5964
DOI: 10.1145/384285

Also published in:
  • ISCA '01: Proceedings of the 28th annual international symposium on Computer architecture
    June 2001
    289 pages
    ISBN: 0769511627
    DOI: 10.1145/379240

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in SIGARCH Volume 29, Issue 2


Qualifiers

  • Article


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 29
  • Downloads (last 6 weeks): 2
Reflects downloads up to 01 Jan 2025

Citations

Cited By

  • (2024) A dependence graph pattern mining method for processor performance analysis. Performance Evaluation, 164 (102409). DOI: 10.1016/j.peva.2024.102409. Online publication date: May-2024.
  • (2021) Vector runahead. Proceedings of the 48th Annual International Symposium on Computer Architecture, pages 195-208. DOI: 10.1109/ISCA52012.2021.00024. Online publication date: 14-Jun-2021.
  • (2019) Stream-based memory access specialization for general purpose processors. Proceedings of the 46th International Symposium on Computer Architecture, pages 736-749. DOI: 10.1145/3307650.3322229. Online publication date: 22-Jun-2019.
  • (2017) Energy-efficient data prefetch buffering for low-end embedded processors. Microelectronics Journal, 62, pages 57-64. DOI: 10.1016/j.mejo.2017.01.014. Online publication date: Apr-2017.
  • (2016) Continuous runahead. The 49th Annual IEEE/ACM International Symposium on Microarchitecture, pages 1-12. DOI: 10.5555/3195638.3195712. Online publication date: 15-Oct-2016.
  • (2015) Accelerating asynchronous programs through event sneak peek. ACM SIGARCH Computer Architecture News, 43(3S), pages 642-654. DOI: 10.1145/2872887.2750373. Online publication date: 13-Jun-2015.
  • (2015) Accelerating asynchronous programs through event sneak peek. Proceedings of the 42nd Annual International Symposium on Computer Architecture, pages 642-654. DOI: 10.1145/2749469.2750373. Online publication date: 13-Jun-2015.
  • (2015) Software-Controlled Instruction Prefetch Buffering for Low-End Processors. Journal of Circuits, Systems and Computers, 24(10), 1550161. DOI: 10.1142/S0218126615501613. Online publication date: Dec-2015.
  • (2010) Cashing in on hints for better prefetching and caching in PVFS and MPI-IO. Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, pages 191-202. DOI: 10.1145/1851476.1851499. Online publication date: 21-Jun-2010.
  • (2009) Taxonomy of Data Prefetching for Multicore Processors. Journal of Computer Science and Technology, 24(3), pages 405-417. DOI: 10.1007/s11390-009-9233-4. Online publication date: 26-May-2009.
