Techniques are developed in this dissertation to efficiently evaluate direct-mapped and set-associative caches. These techniques are used to study associativity in CPU caches and examine instruction caches for single-chip RISC microprocessors. This research is motivated in general by the importance of cache memories to computer performance, and more specifically by work done to design the caches in SPUR, a multiprocessor workstation designed at U.C. Berkeley. The studies focus not only on abstract measures of performance such as miss ratios, but also include, when appropriate, detailed implementation factors, such as access times and gate delays.
The simulation algorithms developed compute miss ratios for numerous alternative caches with one pass through an address trace, provided all caches have the same block size, and use demand fetching and LRU replacement. One algorithm (forest simulation) simulates direct-mapped caches by relying on inclusion, a property that all larger caches contain a superset of the data in smaller caches. The other algorithm (all associativity simulation) simulates a broader class of direct-mapped and set-associative caches than could previously be studied with a one-pass algorithm, although somewhat less efficiently than forest simulation, since inclusion does not hold.
The analysis of set-associative caches yields two major results. First, constant factors are obtained which relate to the miss ratios for set-associative caches to miss ratios for other set-associative caches. Then those results are combined with sample cache implementations to show that above certain cache sizes, direct-mapped caches have lower effective access times than set-associative caches, despite having higher miss ratios.
Finally, instruction buffers and target instruction buffers are examined as organizations for instruction memory on single-chip microprocessors. The analysis focuses closely on implementation considerations, including the interaction between instruction fetches, instruction prefetches and data references, and uses the SPUR RISC design as the case study. Results show the effects of varying numerous design parameters, suggest some superior designs, and demonstrate that instruction buffers will be preferred to target instruction buffers in future RISC microprocessors implemented on single CMOS chips.
Cited By
- Li P, Pronovost C, Wilson W, Tait B, Zhou J, Ding C and Criswell J Beating OPT with Statistical Clairvoyance and Variable Size Caching Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, (243-256)
- Xiang X, Ding C, Luo H and Bao B (2013). HOTL, ACM SIGPLAN Notices, 48:4, (343-356), Online publication date: 23-Apr-2013.
- Xiang X, Ding C, Luo H and Bao B (2013). HOTL, ACM SIGARCH Computer Architecture News, 41:1, (343-356), Online publication date: 29-Mar-2013.
- Xiang X, Ding C, Luo H and Bao B HOTL Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems, (343-356)
- Gu X and Ding C (2012). A generalized theory of collaborative caching, ACM SIGPLAN Notices, 47:11, (109-120), Online publication date: 8-Jan-2013.
- Cain H and Lipasti M Edge chasing delayed consistency Proceedings of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability, (15-24)
- Gu X and Ding C A generalized theory of collaborative caching Proceedings of the 2012 international symposium on Memory Management, (109-120)
- Svärd P, Hudzia B, Tordsson J and Elmroth E (2011). Evaluation of delta compression techniques for efficient live migration of large virtual machines, ACM SIGPLAN Notices, 46:7, (111-120), Online publication date: 15-Jul-2011.
- Svärd P, Hudzia B, Tordsson J and Elmroth E Evaluation of delta compression techniques for efficient live migration of large virtual machines Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments, (111-120)
- Geelen B, Ferentinos V, Catthoor F, Toulatos S, Lafruit G, Stouraitis T, Lauwereins R and Verkest D (2009). Exploiting Varying Resource Requirements in Wavelet-based Applications in Dynamic Execution Environments, Journal of Signal Processing Systems, 56:2-3, (125-139), Online publication date: 1-Sep-2009.
- Srikantaiah S, Kandemir M and Irwin M (2008). Adaptive set pinning, ACM SIGPLAN Notices, 43:3, (135-144), Online publication date: 25-Mar-2008.
- Srikantaiah S, Kandemir M and Irwin M (2008). Adaptive set pinning, ACM SIGOPS Operating Systems Review, 42:2, (135-144), Online publication date: 25-Mar-2008.
- Srikantaiah S, Kandemir M and Irwin M (2008). Adaptive set pinning, ACM SIGARCH Computer Architecture News, 36:1, (135-144), Online publication date: 25-Mar-2008.
- Srikantaiah S, Kandemir M and Irwin M Adaptive set pinning Proceedings of the 13th international conference on Architectural support for programming languages and operating systems, (135-144)
- Zhong Y, Dropsho S, Shen X, Studer A and Ding C (2007). Miss Rate Prediction Across Program Inputs and Cache Configurations, IEEE Transactions on Computers, 56:3, (328-343), Online publication date: 1-Mar-2007.
- Chen H, Li K and Wei B (2005). Memory Performance Optimizations For Real-Time Software HDTV Decoding, Journal of VLSI Signal Processing Systems, 41:2, (193-207), Online publication date: 1-Sep-2005.
- Spjuth M, Karlsson M and Hagersten E Skewed caches from a low-power perspective Proceedings of the 2nd conference on Computing frontiers, (152-160)
- Zhu Q and Zhou Y (2005). Power-Aware Storage Cache Management, IEEE Transactions on Computers, 54:5, (587-602), Online publication date: 1-May-2005.
- Ding C and Orlovich M The Potential of Computation Regrouping for Improving Locality Proceedings of the 2004 ACM/IEEE conference on Supercomputing
- Shen X, Zhong Y and Ding C Phase-Based miss rate prediction across program inputs Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing, (42-55)
- Zhu Q, Shankar A and Zhou Y PB-LRU Proceedings of the 18th annual international conference on Supercomputing, (79-88)
- Kerns D and Eggers S (2004). Balanced scheduling, ACM SIGPLAN Notices, 39:4, (515-527), Online publication date: 1-Apr-2004.
- Ding C and Zhong Y Predicting whole-program locality through reuse distance analysis Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation, (245-257)
- Ding C and Zhong Y (2003). Predicting whole-program locality through reuse distance analysis, ACM SIGPLAN Notices, 38:5, (245-257), Online publication date: 9-May-2003.
- Braun G, Wieferink A, Schliebusch O, Leupers R, Meyr H and Nohl A Processor/Memory Co-Exploration on Multiple Abstraction Levels Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
- Hu Z, Kaxiras S and Martonosi M Timekeeping in the memory system Proceedings of the 29th annual international symposium on Computer architecture, (209-220)
- Hu Z, Kaxiras S and Martonosi M (2002). Timekeeping in the memory system, ACM SIGARCH Computer Architecture News, 30:2, (209-220), Online publication date: 1-May-2002.
- Collins J, Wang H, Tullsen D, Hughes C, Lee Y, Lavery D and Shen J Speculative precomputation Proceedings of the 28th annual international symposium on Computer architecture, (14-25)
- Collins J, Wang H, Tullsen D, Hughes C, Lee Y, Lavery D and Shen J (2001). Speculative precomputation, ACM SIGARCH Computer Architecture News, 29:2, (14-25), Online publication date: 1-May-2001.
- Pereira P, Heutte L and Lecourtier Y (2019). Source-to-Source Instrumentation for the Optimization of an Automatic Reading System, The Journal of Supercomputing, 18:1, (89-104), Online publication date: 1-Jan-2001.
- Sánchez J and González A (2000). Analyzing Data Locality in Numeric Applications, IEEE Micro, 20:4, (58-66), Online publication date: 1-Jul-2000.
- Collins J and Tullsen D Hardware identification of cache conflict misses Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture, (126-135)
- Harper J, Kerbyson D and Nudd G (1999). Analytical Modeling of Set-Associative Cache Behavior, IEEE Transactions on Computers, 48:10, (1009-1024), Online publication date: 1-Oct-1999.
- Ghosh S, Martonosi M and Malik S (1999). Cache miss equations, ACM Transactions on Programming Languages and Systems (TOPLAS), 21:4, (703-746), Online publication date: 1-Jul-1999.
- Chen Y, Bilas A, Damianakis S, Dubnicki C and Li K (1998). UTLB, ACM SIGOPS Operating Systems Review, 32:5, (193-204), Online publication date: 1-Dec-1998.
- Seidl M and Zorn B (1998). Segregating heap objects by reference behavior and lifetime, ACM SIGOPS Operating Systems Review, 32:5, (12-23), Online publication date: 1-Dec-1998.
- Chen Y, Bilas A, Damianakis S, Dubnicki C and Li K (2019). UTLB, ACM SIGPLAN Notices, 33:11, (193-204), Online publication date: 1-Nov-1998.
- Seidl M and Zorn B (2019). Segregating heap objects by reference behavior and lifetime, ACM SIGPLAN Notices, 33:11, (12-23), Online publication date: 1-Nov-1998.
- Chen Y, Bilas A, Damianakis S, Dubnicki C and Li K UTLB Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, (193-204)
- Seidl M and Zorn B Segregating heap objects by reference behavior and lifetime Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, (12-23)
- Jouppi N Improving direct-mapped cache performance by the addition of a small fully-associative cache prefetch buffers 25 years of the international symposia on Computer architecture (selected papers), (388-397)
- Jouppi N Retrospective: improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers 25 years of the international symposia on Computer architecture (selected papers), (71-73)
- Olukotun K, Mudge T and Brown R (1997). Multilevel Optimization of Pipelined Caches, IEEE Transactions on Computers, 46:10, (1093-1102), Online publication date: 1-Oct-1997.
- Lee M, Min S, Shin H, Kim C and Park C (2019). Threaded Prefetching, Real-Time Systems, 13:1, (47-65), Online publication date: 1-Jul-1997.
- Burger D, Kaxiras S and Goodman J DataScalar architectures Proceedings of the 24th annual international symposium on Computer architecture, (338-349)
- Michaud P, Seznec A and Uhlig R Trading conflict and capacity aliasing in conditional branch predictors Proceedings of the 24th annual international symposium on Computer architecture, (292-303)
- Burger D, Kaxiras S and Goodman J (1997). DataScalar architectures, ACM SIGARCH Computer Architecture News, 25:2, (338-349), Online publication date: 1-May-1997.
- Michaud P, Seznec A and Uhlig R (1997). Trading conflict and capacity aliasing in conditional branch predictors, ACM SIGARCH Computer Architecture News, 25:2, (292-303), Online publication date: 1-May-1997.
- Pierce J and Mudge T Wrong-path instruction prefetching Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture, (165-175)
- Xu Z Simulation of Heterogeneous Networks of Workstations Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
- Talluri M, Hill M and Khalidi Y (1995). A new page table for 64-bit address spaces, ACM SIGOPS Operating Systems Review, 29:5, (184-200), Online publication date: 3-Dec-1995.
- Talluri M, Hill M and Khalidi Y A new page table for 64-bit address spaces Proceedings of the fifteenth ACM symposium on Operating systems principles, (184-200)
- Lipasti M, Schmidt W, Kunkel S and Roediger R SPAID Proceedings of the 28th annual international symposium on Microarchitecture, (231-236)
- Lim S, Bae Y, Jang G, Rhee B, Min S, Park C, Shin H, Park K, Moon S and Kim C (1995). An Accurate Worst Case Timing Analysis for RISC Processors, IEEE Transactions on Software Engineering, 21:7, (593-604), Online publication date: 1-Jul-1995.
- Drach N, Seznec A and Windheiser D Direct-mapped versus set-associative pipelined caches Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, (79-88)
- Reinhold M Cache performance of garbage-collected programs Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation, (206-217)
- Gee J and Smith A The effectiveness of caches for vector processors Proceedings of the 8th international conference on Supercomputing, (333-343)
- Reinhold M (2019). Cache performance of garbage-collected programs, ACM SIGPLAN Notices, 29:6, (206-217), Online publication date: 1-Jun-1994.
- Temam O, Fricker C and Jalby W (2019). Cache interference phenomena, ACM SIGMETRICS Performance Evaluation Review, 22:1, (261-271), Online publication date: 1-May-1994.
- Temam O, Fricker C and Jalby W Cache interference phenomena Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems, (261-271)
- Farrens M, Tyson G and Pleszkun A A study of single-chip processor/cache organizations for large numbers of transistors Proceedings of the 21st annual international symposium on Computer architecture, (338-347)
- Farrens M, Tyson G and Pleszkun A (1994). A study of single-chip processor/cache organizations for large numbers of transistors, ACM SIGARCH Computer Architecture News, 22:2, (338-347), Online publication date: 1-Apr-1994.
- Chen J and Bershad B The impact of operating system structure on memory system performance Proceedings of the fourteenth ACM symposium on Operating systems principles, (120-133)
- Chen J and Bershad B (1993). The impact of operating system structure on memory system performance, ACM SIGOPS Operating Systems Review, 27:5, (120-133), Online publication date: 1-Dec-1993.
- Jouppi N Cache write policies and performance Proceedings of the 20th annual international symposium on computer architecture, (191-201)
- Seznec A A case for two-way skewed-associative caches Proceedings of the 20th annual international symposium on computer architecture, (169-178)
- Jouppi N (1993). Cache write policies and performance, ACM SIGARCH Computer Architecture News, 21:2, (191-201), Online publication date: 1-May-1993.
- Seznec A (1993). A case for two-way skewed-associative caches, ACM SIGARCH Computer Architecture News, 21:2, (169-178), Online publication date: 1-May-1993.
- Taylor V Sparse matrix computations Proceedings of the 1992 ACM/IEEE conference on Supercomputing, (598-607)
- McFarling S Cache replacement with dynamic exclusion Proceedings of the 19th annual international symposium on Computer architecture, (191-200)
- Olukotun K, Mudge T and Brown R Performance optimization of pipelined primary cache Proceedings of the 19th annual international symposium on Computer architecture, (181-190)
- McFarling S (2019). Cache replacement with dynamic exclusion, ACM SIGARCH Computer Architecture News, 20:2, (191-200), Online publication date: 1-May-1992.
- Olukotun K, Mudge T and Brown R (2019). Performance optimization of pipelined primary cache, ACM SIGARCH Computer Architecture News, 20:2, (181-190), Online publication date: 1-May-1992.
- Farrens M and Park A Workload and implementation considerations for dynamic base register caching Proceedings of the 24th annual international symposium on Microarchitecture, (62-68)
- Becker J, Park A and Farrens M An analysis of the information content of address reference streams Proceedings of the 24th annual international symposium on Microarchitecture, (19-24)
- Miller E and Katz R Input/output behavior of supercomputing applications Proceedings of the 1991 ACM/IEEE conference on Supercomputing, (567-576)
- Baer J and Chen T An effective on-chip preloading scheme to reduce data access penalty Proceedings of the 1991 ACM/IEEE conference on Supercomputing, (176-186)
- Koldinger E, Eggers S and Levy H (1991). On the validity of trace-driven simulation for multiprocessors, ACM SIGARCH Computer Architecture News, 19:3, (244-253), Online publication date: 1-May-1991.
- Olukotun O, Mudge T and Brown R (1991). Implementing a cache for a high-performance GaAs microprocessor, ACM SIGARCH Computer Architecture News, 19:3, (138-147), Online publication date: 1-May-1991.
- Farrens M and Park A (1991). Dynamic base register caching, ACM SIGARCH Computer Architecture News, 19:3, (128-137), Online publication date: 1-May-1991.
- Klaiber A and Levy H (1991). An architecture for software-controlled data prefetching, ACM SIGARCH Computer Architecture News, 19:3, (43-53), Online publication date: 1-May-1991.
- Koldinger E, Eggers S and Levy H On the validity of trace-driven simulation for multiprocessors Proceedings of the 18th annual international symposium on Computer architecture, (244-253)
- Olukotun O, Mudge T and Brown R Implementing a cache for a high-performance GaAs microprocessor Proceedings of the 18th annual international symposium on Computer architecture, (138-147)
- Farrens M and Park A Dynamic base register caching Proceedings of the 18th annual international symposium on Computer architecture, (128-137)
- Klaiber A and Levy H An architecture for software-controlled data prefetching Proceedings of the 18th annual international symposium on Computer architecture, (43-53)
- Park A and Farrens M Address compression through base register caching Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture, (193-199)
- Jouppi N (2019). Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, ACM SIGARCH Computer Architecture News, 18:2SI, (364-373), Online publication date: 1-Jun-1990.
- Jouppi N Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers Proceedings of the 17th annual international symposium on Computer Architecture, (364-373)
- Eggers S, Keppel D, Koldinger E and Levy H (2019). Techniques for efficient inline tracing on a shared-memory multiprocessor, ACM SIGMETRICS Performance Evaluation Review, 18:1, (37-47), Online publication date: 1-Apr-1990.
- Wang W and Baer J (2019). Efficient trace-driven simulation method for cache performance analysis, ACM SIGMETRICS Performance Evaluation Review, 18:1, (27-36), Online publication date: 1-Apr-1990.
- Eggers S, Keppel D, Koldinger E and Levy H Techniques for efficient inline tracing on a shared-memory multiprocessor Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems, (37-47)
- Wang W and Baer J Efficient trace-driven simulation method for cache performance analysis Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems, (27-36)
- Kessler R, Jooss R, Lebeck A and Hill M (1989). Inexpensive implementations of set-associativity, ACM SIGARCH Computer Architecture News, 17:3, (131-139), Online publication date: 1-Jun-1989.
- Przybylski S, Horowitz M and Hennessy J (1989). Characteristics of performance-optimal multi-level cache hierarchies, ACM SIGARCH Computer Architecture News, 17:3, (114-121), Online publication date: 1-Jun-1989.
- Kessler R, Jooss R, Lebeck A and Hill M Inexpensive implementations of set-associativity Proceedings of the 16th annual international symposium on Computer architecture, (131-139)
- Przybylski S, Horowitz M and Hennessy J Characteristics of performance-optimal multi-level cache hierarchies Proceedings of the 16th annual international symposium on Computer architecture, (114-121)
- McFarling S Program optimization for instruction caches Proceedings of the third international conference on Architectural support for programming languages and operating systems, (183-191)
- McFarling S (1989). Program optimization for instruction caches, ACM SIGARCH Computer Architecture News, 17:2, (183-191), Online publication date: 1-Apr-1989.
- Prybylski S, Horowitz M and Hennessy J Performance tradeoffs in cache design Proceedings of the 15th Annual International Symposium on Computer architecture, (290-298)
- Nguyen T, Srini V and Despain A A two-tier memory architecture for high-performance multiprocessor systems Proceedings of the 2nd international conference on Supercomputing, (326-336)
- Patterson D (1987). A progress report on SPUR: February 1, 1987, ACM SIGARCH Computer Architecture News, 15:1, (15-21), Online publication date: 1-Mar-1987.
Recommendations
A Performance Study of Instruction Cache Prefetching Methods
Prefetching methods for instruction caches are studied via trace-driven simulation. The two primary methods are "fall-through" prefetch (sometimes referred to as "one block lookahead") and "target" prefetch. Fall-through prefetches are for sequential ...