Understanding the behavior and implications of context switch misses

Published: 30 December 2010

Abstract

One of the essential features in modern computer systems is context switching, which allows multiple threads of execution to time-share a limited number of processors. While very useful, context switching can introduce high performance overheads, with one of the primary reasons being the cache perturbation effect. Between the time a thread is switched out and when it resumes execution, parts of its working set in the cache may be perturbed by other interfering threads, leading to (context switch) cache misses to recover from the perturbation.
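The perturbation effect described above can be illustrated with a small simulation. This is a minimal sketch, not the article's methodology: it assumes a fully associative LRU cache, a single context switch, and an interfering thread that touches `perturb` fresh blocks. The names `LRUCache`, `count_misses`, and `context_switch_misses` are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache holding `capacity` blocks (simplifying assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # ordered from least to most recently used

    def access(self, block):
        """Access one block; return True on a hit, False on a miss."""
        if block in self.blocks:
            self.blocks.move_to_end(block)  # promote to most recently used
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        self.blocks[block] = None
        return False


def count_misses(trace, capacity, switch_at=None, perturb=0):
    """Count misses of `trace`; optionally inject a context switch before index `switch_at`."""
    cache = LRUCache(capacity)
    misses = 0
    for i, block in enumerate(trace):
        if i == switch_at:
            # Assumption: the interfering thread touches `perturb` blocks
            # that never collide with the blocks of the measured thread.
            for k in range(perturb):
                cache.access(("other", k))
        if not cache.access(block):
            misses += 1
    return misses


def context_switch_misses(trace, capacity, switch_at, perturb):
    """Extra misses caused by the perturbation, relative to an undisturbed run."""
    undisturbed = count_misses(trace, capacity)
    disturbed = count_misses(trace, capacity, switch_at, perturb)
    return disturbed - undisturbed
```

For the cyclic trace `[0, 1, 2, 3] * 3` with a 4-block cache, a switch before index 8, and 2 interfering blocks, `context_switch_misses` returns 4: the thread loses more hits than the number of blocks the interfering thread displaced. With the reuse order reversed after the switch (`[0, 1, 2, 3, 0, 1, 2, 3, 3, 2, 1, 0]`), the same perturbation costs only 2 misses.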
The goal of this article is to understand how cache parameters and application behavior influence the number of context switch misses an application suffers. We characterize a previously unreported type of context switch miss that occurs as an artifact of the interaction between the cache replacement policy and an application's temporal reuse behavior. We characterize the behavior of these “reordered misses” for various applications, cache sizes, and amounts of cache perturbation. As a second contribution, we develop an analytical model that reveals the mathematical relationship between cache design parameters, an application's temporal reuse pattern, and the number of context switch misses the application suffers. We validate the model against simulation studies and find that it is sufficiently accurate in predicting the trends of context switch misses across varying amounts of cache perturbation.
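One way to picture the distinction is to separate the extra misses by what happened to the block during the switch. The sketch below rests on an assumption, since the abstract does not define the categories precisely: a miss is counted as "replaced" when the interfering thread evicted the block, and as "reordered" when the block survived the switch but had been pushed toward the LRU end and was evicted before its next reuse. The function name and the fully associative LRU model are hypothetical simplifications.

```python
def classify_context_switch_misses(trace, capacity, switch_at, perturb):
    """Return (replaced, reordered) extra-miss counts for a fully associative LRU cache."""

    def run(with_switch):
        stack = []         # LRU stack, index 0 is the most recently used block
        displaced = set()  # blocks evicted while the interfering thread ran
        hits = []
        for i, block in enumerate(trace):
            if with_switch and i == switch_at:
                # Assumption: the interfering thread brings in `perturb` fresh
                # blocks at the MRU end, shifting existing blocks toward LRU.
                for k in range(perturb):
                    stack.insert(0, ("other", k))
                    if len(stack) > capacity:
                        displaced.add(stack.pop())
            hit = block in stack
            if hit:
                stack.remove(block)
            stack.insert(0, block)  # accessed block becomes most recently used
            if len(stack) > capacity:
                stack.pop()         # evict the least recently used block
            hits.append(hit)
        return hits, displaced

    base_hits, _ = run(False)
    pert_hits, displaced = run(True)

    replaced = reordered = 0
    for block, base_hit, pert_hit in zip(trace, base_hits, pert_hits):
        if base_hit and not pert_hit:  # a miss that exists only because of the switch
            if block in displaced:
                replaced += 1
            else:
                reordered += 1
    return replaced, reordered
```

With the cyclic trace `[0, 1, 2, 3] * 3`, a 4-block cache, a switch before index 8, and 2 interfering blocks, this returns `(2, 2)`: blocks 0 and 1 were evicted during the switch, while blocks 2 and 3 survived it but were evicted by the refetches of 0 and 1 before being reused.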
The mathematical relationship provided by the model allows us to derive insights into precisely why some applications are more vulnerable to context switch misses than others. Through a case study on prefetching, we find that prefetching tends to increase the number of context switch misses, and that a less aggressive prefetching technique can reduce the number of context switch misses the application suffers. We also investigate how cache size affects context switch misses. Our study shows that under relatively heavy system workloads, the worst-case number of context switch misses an application suffers tends to increase proportionally with cache size, to an extent that may completely negate the reduction in other types of cache misses.

Information & Contributors

Information

Published In

ACM Transactions on Architecture and Code Optimization, Volume 7, Issue 4
December 2010
167 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/1880043

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 December 2010
Accepted: 01 August 2010
Received: 01 March 2010
Published in TACO Volume 7, Issue 4

Author Tags

  1. Context switch misses
  2. analytical model
  3. prefetching
  4. stack distance profiling

Qualifiers

  • Research-article
  • Research
  • Refereed
