DOI: 10.1145/3631882.3631902
research-article

An Empirical Evaluation of PTE Coalescing

Published: 08 April 2024

Abstract

Superpages (also known as huge pages) are an effective technique for reducing the latency of virtual-to-physical address translation on modern processors. However, the large size of the 2 MB and 1 GB superpages supported by x86-64 processors continues to present a challenge to the operating system’s ability to form superpages, given the mandatory contiguity, alignment, and attribute requirements of a superpage. Recent work proposes medium-sized superpages as a potential solution, by allowing the creation of smaller superpages where 2 MB and larger superpages have not formed or will not be possible to form. Notably, AMD processors starting with the Zen microarchitecture have offered a “PTE Coalescing” feature where the hardware opportunistically and transparently creates, from underlying consecutive and aligned 4 KB mappings in the page table, 16 KB or 32 KB mappings to be cached in the TLB. On the surface, this feature requires no modifications to the operating system or the compiler toolchain, exploiting only coincidental contiguity and alignment. Nonetheless, there are ways that either the operating system or the toolchain can be made coalescing-aware and hence make better use of PTE Coalescing. This paper first investigates undocumented aspects of PTE Coalescing, and then evaluates some operating system and toolchain optimizations which explicitly take advantage of it. We find that an operating system that is coalescing-friendly reduces L1 ITLB misses by 50%-80% compared to an operating system that is coalescing-unaware. For a Clang compilation workload, a coalescing-friendly operating system coupled with PTE Coalescing all but eliminates L2 ITLB misses. Last but not least, we evaluate the impact of granularity (16 KB vs 32 KB) on the effectiveness of PTE Coalescing. We find that reducing the coalescing granularity from 32 KB to 16 KB leads to a 1.3x-20.5x reduction in 4 KB L2 DTLB misses in a wide variety of workloads.
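The contiguity, alignment, and attribute conditions mentioned above can be illustrated with a small software model. The sketch below is only an assumption-laden illustration, not the hardware's actual (partly undocumented) behavior: it assumes a group of 4 KB mappings is eligible when it is aligned to the group size in both virtual and physical address space, is physically consecutive, and shares identical attribute bits. The names `PTE`, `flags`, and `coalescible_groups` are hypothetical.

```python
from typing import Dict, List, NamedTuple

PAGE_SIZE = 4096  # 4 KB base page


class PTE(NamedTuple):
    pfn: int    # physical frame number
    flags: int  # attribute bits (hypothetical encoding, compared for equality only)


def coalescible_groups(page_table: Dict[int, PTE], group_bytes: int) -> List[int]:
    """Return the starting virtual page numbers (VPNs) of eligible groups.

    Model assumption: a group qualifies when its first VPN and first PFN are
    both multiples of the group length, every page in the group is mapped,
    the PFNs are consecutive, and all attribute bits match.
    """
    n = group_bytes // PAGE_SIZE  # 4 pages for 16 KB, 8 pages for 32 KB
    starts = []
    for vpn in sorted(page_table):
        if vpn % n != 0:
            continue  # not aligned in virtual address space
        first = page_table[vpn]
        if first.pfn % n != 0:
            continue  # not aligned in physical address space
        eligible = all(
            (vpn + i) in page_table
            and page_table[vpn + i].pfn == first.pfn + i
            and page_table[vpn + i].flags == first.flags
            for i in range(n)
        )
        if eligible:
            starts.append(vpn)
    return starts


if __name__ == "__main__":
    # Eight consecutive pages plus a separate run of four pages.
    table = {v: PTE(pfn=16 + v, flags=0x5) for v in range(8)}
    table.update({8 + i: PTE(pfn=40 + i, flags=0x5) for i in range(4)})
    print("16 KB groups:", coalescible_groups(table, 16 * 1024))
    print("32 KB groups:", coalescible_groups(table, 32 * 1024))
```

Under this model, the isolated run of four pages qualifies at 16 KB granularity but not at 32 KB, which is one intuition for why the finer granularity can find more opportunities when contiguity is only coincidental.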



Published In

MEMSYS '23: Proceedings of the International Symposium on Memory Systems
October 2023
231 pages
ISBN: 9798400716447
DOI: 10.1145/3631882
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. page table entry coalescing
  2. superpages
  3. translation look-aside buffer
  4. virtual memory
  5. virtual-to-physical address translation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MEMSYS '23: The International Symposium on Memory Systems
October 2 - 5, 2023
Alexandria, VA, USA
