Research article · DOI: 10.5555/1413370.1413430

Prefetch throttling and data pinning for improving performance of shared caches

Published: 15 November 2008

Abstract

In this paper, we (i) quantify the impact of compiler-directed I/O prefetching on shared caches at I/O nodes. The experimental data we collected shows that while I/O prefetching brings some benefits, its effectiveness drops significantly as the number of clients (compute nodes) increases; (ii) identify inter-client misses caused by harmful I/O prefetches as one of the main sources of this performance loss as the client count grows; and (iii) propose and experimentally evaluate prefetch throttling and data pinning schemes that improve the performance of I/O prefetching. Prefetch throttling prevents one or more clients from issuing further prefetches when those prefetches are predicted to be harmful, i.e., when they would evict from the memory cache useful data accessed by other clients. Data pinning, on the other hand, makes selected data blocks immune to harmful prefetches by pinning them in the memory cache. We show that these two schemes can be applied in isolation or in combination, and at either a coarse or a fine granularity. Our experiments with these two optimizations on four disk-intensive applications reveal that they improve performance by 9.7% and 15.1% on average over standard compiler-directed I/O prefetching and the no-prefetch case, respectively, when 8 clients are used.
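The two schemes in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation; it is a simplified, hypothetical LRU-style shared cache in which a block can be pinned (making it immune to prefetch-driven eviction) and a prefetch is throttled when its victim would be another client's cached data, the situation the paper identifies as a source of inter-client misses. All class and method names here are invented for illustration.

```python
from collections import OrderedDict

class SharedPrefetchCache:
    """Toy model of an I/O-node shared cache with data pinning
    and prefetch throttling (simplified LRU replacement)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # block id -> owning client, in LRU order
        self.pinned = set()           # blocks immune to prefetch-driven eviction

    def pin(self, block):
        # Data pinning: mark a resident block so harmful prefetches cannot evict it.
        if block in self.blocks:
            self.pinned.add(block)

    def _victim(self):
        # Oldest unpinned block, if any.
        for b in self.blocks:
            if b not in self.pinned:
                return b
        return None

    def demand_access(self, client, block):
        # Demand accesses always proceed; returns True on a hit.
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        if len(self.blocks) >= self.capacity:
            v = self._victim()
            if v is not None:
                self.blocks.pop(v)
            else:
                # Everything is pinned: fall back to evicting the oldest block.
                old, _ = self.blocks.popitem(last=False)
                self.pinned.discard(old)
        self.blocks[block] = client
        return False

    def try_prefetch(self, client, block):
        # Prefetch throttling: drop the prefetch if it would replace
        # another client's data (a likely inter-client miss later).
        if block in self.blocks:
            return False
        if len(self.blocks) >= self.capacity:
            v = self._victim()
            if v is None or self.blocks[v] != client:
                return False  # throttled
            self.blocks.pop(v)
        self.blocks[block] = client
        return True
```

With a capacity-2 cache, if client `c1` holds a pinned block `b1` and an unpinned block `b2`, a prefetch of `b3` by client `c2` is throttled (its victim `b2` belongs to `c1`), whereas the same prefetch by `c1` is allowed and replaces `c1`'s own block.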


Published In

SC '08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing
November 2008, 739 pages
ISBN: 9781424428359
Publisher: IEEE Press

Acceptance Rates

SC '08 paper acceptance rate: 59 of 277 submissions (21%)
Overall acceptance rate: 1,516 of 6,373 submissions (24%)
