Research article · DOI: 10.5555/1413370.1413430

Prefetch throttling and data pinning for improving performance of shared caches

Published: 15 November 2008

Abstract

In this paper, we (i) quantify the impact of compiler-directed I/O prefetching on shared caches at I/O nodes. The experimental data we collected shows that while I/O prefetching brings some benefits, its effectiveness drops significantly as the number of clients (compute nodes) increases; (ii) identify inter-client misses caused by harmful I/O prefetches as one of the main sources of this performance loss as the client count grows; and (iii) propose and experimentally evaluate prefetch throttling and data pinning schemes that improve the performance of I/O prefetching. Prefetch throttling prevents one or more clients from issuing further prefetches when those prefetches are predicted to be harmful, i.e., when they would evict from the memory cache useful data accessed by other clients. Data pinning, on the other hand, makes selected data blocks immune to harmful prefetches by pinning them in the memory cache. We show that these two schemes can be applied in isolation or in combination, and at either a coarse or a fine granularity. Our experiments with these two optimizations on four disk-intensive applications reveal that they improve performance by 9.7% and 15.1% on average over standard compiler-directed I/O prefetching and the no-prefetch case, respectively, when 8 clients are used.
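The two schemes in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation; it is a simplified, hypothetical LRU-style shared cache in which a block can be pinned (making it immune to prefetch-driven eviction) and a prefetch is throttled when its victim would be another client's cached data, the situation the paper identifies as a source of inter-client misses. All class and method names here are invented for illustration.

```python
from collections import OrderedDict

class SharedPrefetchCache:
    """Toy model of an I/O-node shared cache with data pinning
    and prefetch throttling (simplified LRU replacement)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # block id -> owning client, in LRU order
        self.pinned = set()           # blocks immune to prefetch-driven eviction

    def pin(self, block):
        # Data pinning: mark a resident block so harmful prefetches cannot evict it.
        if block in self.blocks:
            self.pinned.add(block)

    def _victim(self):
        # Oldest unpinned block, if any.
        for b in self.blocks:
            if b not in self.pinned:
                return b
        return None

    def demand_access(self, client, block):
        # Demand accesses always proceed; returns True on a hit.
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        if len(self.blocks) >= self.capacity:
            v = self._victim()
            if v is not None:
                self.blocks.pop(v)
            else:
                # Everything is pinned: fall back to evicting the oldest block.
                old, _ = self.blocks.popitem(last=False)
                self.pinned.discard(old)
        self.blocks[block] = client
        return False

    def try_prefetch(self, client, block):
        # Prefetch throttling: drop the prefetch if it would replace
        # another client's data (a likely inter-client miss later).
        if block in self.blocks:
            return False
        if len(self.blocks) >= self.capacity:
            v = self._victim()
            if v is None or self.blocks[v] != client:
                return False  # throttled
            self.blocks.pop(v)
        self.blocks[block] = client
        return True
```

With a capacity-2 cache, if client `c1` holds a pinned block `b1` and an unpinned block `b2`, a prefetch of `b3` by client `c2` is throttled (its victim `b2` belongs to `c1`), whereas the same prefetch by `c1` is allowed and replaces `c1`'s own block.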


Published In

SC '08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing
November 2008, 739 pages
ISBN: 9781424428359
Publisher: IEEE Press

Acceptance Rates

SC '08 paper acceptance rate: 59 of 277 submissions (21%)
Overall acceptance rate: 1,516 of 6,373 submissions (24%)
