
Spatio-temporal memory streaming

Published: 20 June 2009

Abstract

Recent research advocates memory streaming techniques to alleviate the performance bottleneck caused by the high latencies of off-chip memory accesses. Temporal memory streaming replays previously observed miss sequences to eliminate long chains of dependent misses. Spatial memory streaming predicts repetitive data layout patterns within fixed-size memory regions. Because each technique targets a different subset of misses, their effectiveness varies across workloads and each leaves a significant fraction of misses unpredicted.
In this paper, we propose Spatio-Temporal Memory Streaming (STeMS) to exploit the synergy between spatial and temporal streaming. We observe that the order of spatial accesses repeats both within and across regions. STeMS records and replays the temporal sequence of region accesses and uses spatial relationships within each region to dynamically reconstruct a predicted total miss order. Using trace-driven and cycle-accurate simulation across a suite of commercial workloads, we demonstrate that, with implementation complexity similar to that of temporal streaming, STeMS achieves equal or higher coverage than spatial or temporal memory streaming alone, and improves performance by 31%, 3%, and 18% over stride, spatial, and temporal prediction, respectively.
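
The record-and-reconstruct idea in the abstract can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not the hardware design evaluated in the paper: the region size `REGION_BLOCKS`, the function names `train` and `reconstruct`, and the choice to index spatial patterns by the trigger's offset within its region are all hypothetical simplifications. The sketch logs the first miss to each region (the "trigger") in a temporal sequence, learns the ordered layout of later misses relative to each trigger, and then merges the two into a single predicted miss order.

```python
REGION_BLOCKS = 8  # hypothetical region size, in cache blocks


def train(miss_trace):
    """Learn a temporal log of region triggers and per-offset spatial patterns.

    miss_trace is a list of block addresses in miss order.
    """
    temporal_log = []  # [(trigger block, misses since the previous trigger)]
    generations = {}   # region -> (trigger block, trace position, pattern)
    last_trigger_pos = 0
    for pos, block in enumerate(miss_trace):
        region = block // REGION_BLOCKS
        if region not in generations:
            # First miss to this region: it becomes the region's trigger.
            generations[region] = (block, pos, [])
            temporal_log.append((block, pos - last_trigger_pos))
            last_trigger_pos = pos
        else:
            # Later miss in a known region: store where it falls relative
            # to the trigger, in both address and miss order.
            trig, trig_pos, pattern = generations[region]
            pattern.append((block - trig, pos - trig_pos))
    # Assumption: patterns are indexed by the trigger's offset in its region;
    # a later region with the same offset overwrites an earlier one.
    patterns = {trig % REGION_BLOCKS: pattern
                for trig, _, pattern in generations.values()}
    return temporal_log, patterns


def reconstruct(temporal_log, patterns):
    """Merge replayed triggers and spatial patterns into one miss order."""
    predicted = []  # (predicted position, block)
    pos = 0
    for trigger, gap in temporal_log:
        pos += gap
        predicted.append((pos, trigger))
        for delta, delay in patterns.get(trigger % REGION_BLOCKS, []):
            predicted.append((pos + delay, trigger + delta))
    predicted.sort(key=lambda item: item[0])  # stable sort by position
    return [block for _, block in predicted]


# Two regions (blocks 0-7 and 16-23) with the same layout, interleaved.
trace = [0, 16, 2, 18, 5, 21]
log, pats = train(trace)
replayed = reconstruct(log, pats)
# The learned layout also applies to regions that were never trained on.
unseen = reconstruct([(40, 0), (64, 1)], pats)
```

In this toy example `replayed` reproduces the training trace, and `unseen` applies the same layout to two regions absent from training. Storing a miss-order delay alongside each spatial delta is what lets the sketch interleave misses from different regions instead of emitting one region at a time.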


    Published In

    ACM SIGARCH Computer Architecture News  Volume 37, Issue 3
    June 2009
    495 pages
    ISSN:0163-5964
    DOI:10.1145/1555815

    ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture
    June 2009
    510 pages
    ISBN:9781605585260
    DOI:10.1145/1555754

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. prefetching
    2. spatial correlation
    3. temporal correlation

    Qualifiers

    • Research-article

    Article Metrics

    • Downloads (last 12 months): 120
    • Downloads (last 6 weeks): 14
    Reflects downloads up to 11 Dec 2024

    Cited By

    • (2024) Exploiting Vector Code Semantics for Efficient Data Cache Prefetching. Proceedings of the 38th ACM International Conference on Supercomputing, pp. 98-109. DOI: 10.1145/3650200.3656635. Online publication date: 30-May-2024
    • (2024) PATHFINDER: Practical Real-Time Learning for Data Prefetching. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, pp. 785-800. DOI: 10.1145/3620666.3651332. Online publication date: 27-Apr-2024
    • (2024) PARS: A Pattern-Aware Spatial Data Prefetcher Supporting Multiple Region Sizes. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43:11, pp. 3638-3649. DOI: 10.1109/TCAD.2024.3442981. Online publication date: Nov-2024
    • (2024) A New Formulation of Neural Data Prefetching. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 1173-1187. DOI: 10.1109/ISCA59077.2024.00088. Online publication date: 29-Jun-2024
    • (2024) Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Instruction Execution. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 88-102. DOI: 10.1109/ISCA59077.2024.00017. Online publication date: 29-Jun-2024
    • (2024) Circuit Design of Multi-Level Adaptive Load Prefetcher. 2024 4th International Conference on Electronics, Circuits and Information Engineering (ECIE), pp. 145-154. DOI: 10.1109/ECIE61885.2024.10627725. Online publication date: 24-May-2024
    • (2024) RL-CoPref: a reinforcement learning-based coordinated prefetching controller for multiple prefetchers. The Journal of Supercomputing, 80:9, pp. 13001-13026. DOI: 10.1007/s11227-024-05938-9. Online publication date: 1-Jun-2024
    • (2023) Building Efficient Neural Prefetcher. Proceedings of the International Symposium on Memory Systems, pp. 1-12. DOI: 10.1145/3631882.3631903. Online publication date: 2-Oct-2023
    • (2023) Cache in Hand: Expander-Driven CXL Prefetcher for Next Generation CXL-SSD. Proceedings of the 15th ACM Workshop on Hot Topics in Storage and File Systems, pp. 24-30. DOI: 10.1145/3599691.3603406. Online publication date: 9-Jul-2023
    • (2023) Object Fingerprint Cache for Heterogeneous Memory System. IEEE Transactions on Computers, 72:9, pp. 2496-2507. DOI: 10.1109/TC.2023.3251852. Online publication date: 1-Sep-2023
