Most computers that support virtual memory translate virtual addresses to physical addresses using a translation lookaside buffer (TLB) and a page table. Time spent in TLB miss handling--number of TLB misses times average TLB miss penalty--is increasing due to workload, architectural, and technological trends. This thesis studies TLB architectures that reduce the number of TLB misses by increasing TLB reach--the maximum address space mapped by a TLB--and page table designs that decrease TLB miss penalty or support new TLB architectures without increasing TLB miss penalty.
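The two quantities defined above can be written out directly. The sketch below is illustrative only: the function names and the example figures (64 entries, 4 KB and 64 KB mappings, a 40-cycle penalty) are assumptions chosen for the arithmetic, not values taken from the thesis.

```python
KB = 1024

def tlb_reach(entries: int, bytes_mapped_per_entry: int) -> int:
    """Maximum address space (in bytes) that the TLB can map at once."""
    return entries * bytes_mapped_per_entry

def miss_handling_time(misses: int, avg_miss_penalty: float) -> float:
    """Time spent in TLB miss handling: miss count times average penalty."""
    return misses * avg_miss_penalty

# Hypothetical 64-entry TLB: same entry count, different mapping sizes.
base_reach = tlb_reach(64, 4 * KB)    # every entry maps one 4 KB page
large_reach = tlb_reach(64, 64 * KB)  # every entry maps a 64 KB region

# Hypothetical workload: 1,000 misses at 40 cycles each.
cycles = miss_handling_time(1_000, 40)
```

With these assumed figures, the reach grows by the same factor as the amount of memory each entry maps, which is the lever that superpages and subblocking both pull.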
First, this thesis evaluates two TLB architectures in commercial use--superpages and complete subblocking. This thesis studies the benefits of superpages and the issues involved in modifying operating systems and page tables to support superpages. Complete subblocking allows processor designers to use larger chip areas to build large TLBs within cycle time constraints. Simulation results show that for comparable chip area, complete-subblock TLBs have faster access times and incur fewer TLB misses than single-page-size TLBs without requiring operating system changes.
Second, this thesis proposes a new TLB architecture, partial subblocking, that combines the best features of complete subblocking and superpages. Simulation results show that superpage and subblock TLBs, for comparable chip area, incur fewer TLB misses than single-page-size TLBs. Further, partial-subblock TLBs require simpler operating systems and incur fewer misses than superpage TLBs.
Third, superpage and partial-subblock TLBs are ineffective without operating system support. This thesis identifies the policies and mechanisms required to support these TLBs. In particular, this thesis proposes a physical memory allocation algorithm, page reservation, that makes partial-subblock TLBs effective and eliminates page copying in superpage creation.
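One way to picture page reservation is as an allocator that sets aside an aligned block of physical frames the first time any page in an aligned virtual block is touched, so that later pages in that block land at the matching offset. The sketch below is a simplified reading of the idea, not the algorithm as specified in the thesis: the class name, the block size of 16 pages, and the absence of any reclamation policy under memory pressure are all assumptions.

```python
class ReservingAllocator:
    """Toy frame allocator that reserves aligned physical blocks (hypothetical)."""

    def __init__(self, total_frames: int, block_pages: int = 16):
        self.block_pages = block_pages
        usable = total_frames - total_frames % block_pages
        # Base frame numbers of the aligned physical blocks still unreserved.
        self.free_blocks = list(range(0, usable, block_pages))
        # Virtual block number -> base frame of the block reserved for it.
        self.reserved = {}

    def allocate(self, vpn: int) -> int:
        """Return a frame for virtual page `vpn`, placed at the same block offset."""
        vblock, offset = divmod(vpn, self.block_pages)
        if vblock not in self.reserved:
            if not self.free_blocks:
                # A real system would reclaim reservations; omitted here.
                raise MemoryError("no aligned block left to reserve")
            self.reserved[vblock] = self.free_blocks.pop(0)
        return self.reserved[vblock] + offset

alloc = ReservingAllocator(total_frames=64)
first = alloc.allocate(35)   # virtual block 2, offset 3
second = alloc.allocate(32)  # same virtual block, offset 0
```

Because both pages fall in the same reserved physical block at their own offsets, a single subblock entry could cover them, and promoting the block to a superpage later would need no copying.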
Fourth, this thesis suggests modifications to conventional page tables to support superpage and subblock TLBs and proposes a new page table structure, clustered page table, that augments hashed page tables with subblocking. Simulation results show that clustered page tables are smaller and have a faster access time than conventional page tables when using single-page-size TLBs. A clustered page table improves on these advantages when storing superpage and subblock PTEs.
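The idea of augmenting a hashed page table with subblocking can be sketched as a hash table keyed by virtual block number, where each node holds the mappings for several consecutive pages under one tag. This is a minimal illustration under stated assumptions: a Python dict stands in for the hash buckets and chains, the subblock factor of 16 is arbitrary, and the class and method names are hypothetical.

```python
from typing import Dict, List, Optional

class ClusteredPageTable:
    """Toy clustered page table: one hash node per block of consecutive pages."""

    def __init__(self, subblock_factor: int = 16):
        self.factor = subblock_factor
        # Virtual block number (the tag) -> per-page physical page numbers.
        self.nodes: Dict[int, List[Optional[int]]] = {}

    def map(self, vpn: int, ppn: int) -> None:
        """Record a translation, creating the block's node on first use."""
        block, offset = divmod(vpn, self.factor)
        node = self.nodes.setdefault(block, [None] * self.factor)
        node[offset] = ppn

    def lookup(self, vpn: int) -> Optional[int]:
        """Return the physical page number, or None if the page is unmapped."""
        block, offset = divmod(vpn, self.factor)
        node = self.nodes.get(block)
        return None if node is None else node[offset]

table = ClusteredPageTable()
table.map(0x100, 7)
table.map(0x101, 8)  # same block as 0x100, so no new node is created
```

Neighbouring pages share one tag and one node, which is where the space saving over a one-entry-per-page hashed table would come from when address spaces are populated in clusters.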
Cited By
- Bhattacharjee A and Martonosi M (2010). Inter-core cooperative TLB for chip multiprocessors, ACM SIGARCH Computer Architecture News, 38:1, (359-370), Online publication date: 5-Mar-2010.
- Bhattacharjee A and Martonosi M (2010). Inter-core cooperative TLB for chip multiprocessors, ACM SIGPLAN Notices, 45:3, (359-370), Online publication date: 5-Mar-2010.
- Bhattacharjee A and Martonosi M Inter-core cooperative TLB for chip multiprocessors Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems, (359-370)
- Kandiraju G and Sivasubramaniam A Characterizing the d-TLB behavior of SPEC CPU2000 benchmarks Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, (129-139)
- Kandiraju G and Sivasubramaniam A (2002). Characterizing the d-TLB behavior of SPEC CPU2000 benchmarks, ACM SIGMETRICS Performance Evaluation Review, 30:1, (129-139), Online publication date: 1-Jun-2002.
- Kandiraju G and Sivasubramaniam A (2002). Going the distance for TLB prefetching, ACM SIGARCH Computer Architecture News, 30:2, (195-206), Online publication date: 1-May-2002.
- Kandiraju G and Sivasubramaniam A Going the distance for TLB prefetching Proceedings of the 29th annual international symposium on Computer architecture, (195-206)
- Subramanian I, Mather C, Peterson K and Raghunath B Implementation of multiple pagesize support in HP-UX Proceedings of the annual conference on USENIX Annual Technical Conference, (9-9)
- Talluri M, Hill M and Khalidi Y (1995). A new page table for 64-bit address spaces, ACM SIGOPS Operating Systems Review, 29:5, (184-200), Online publication date: 3-Dec-1995.
- Talluri M, Hill M and Khalidi Y A new page table for 64-bit address spaces Proceedings of the fifteenth ACM symposium on Operating systems principles, (184-200)