Efficient Memory Virtualization: Reducing Dimensionality of Nested Page Walks

Authors:

Jayneel Gandhi,

Arkaprava Basu,

Michael M. Swift

MICRO-47: Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture

Pages 178 - 189

https://doi.org/10.1109/MICRO.2014.37

Published: 13 December 2014

Abstract

Virtualization provides value for many workloads, but its cost rises for workloads with poor memory access locality. This overhead comes from translation lookaside buffer (TLB) misses, where the hardware performs a 2D page walk (up to 24 memory references on x86-64) rather than a native page walk (up to only 4 memory references). The first dimension translates guest virtual addresses to guest physical addresses, while the second translates guest physical addresses to host physical addresses.
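The 24-versus-4 figure can be reproduced by counting memory references in each dimension. The sketch below assumes four-level page tables for both guest and host (the common x86-64 layout with 4 KiB pages) and no help from the TLB or page-walk caches. The function names are illustrative and do not come from the paper.

```python
def native_walk_refs(levels: int = 4) -> int:
    """Worst-case memory references for a native page walk:
    one page-table entry is read per level."""
    return levels


def nested_walk_refs(guest_levels: int = 4, host_levels: int = 4) -> int:
    """Worst-case memory references for a 2D (nested) page walk.

    Every guest page-table pointer is a guest physical address, so it
    needs a full host walk before the guest entry can be read.
    """
    refs = 0
    for _ in range(guest_levels):
        refs += host_levels  # host walk: guest physical -> host physical
        refs += 1            # read the guest page-table entry itself
    refs += host_levels      # host walk for the final guest physical address
    return refs


if __name__ == "__main__":
    print("native:", native_walk_refs())
    print("nested:", nested_walk_refs())
```

With both depths set to 4, the count is 4 × (4 + 1) + 4 = 24 for the nested walk, against 4 for the native walk.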

This paper proposes new hardware using direct segments with three new virtualized modes of operation that significantly speed up virtualized address translation. Further, this paper proposes two novel techniques to address important limitations of the original direct segments. First, self-ballooning reduces fragmentation in physical memory and addresses the architectural input/output (I/O) gap in x86-64. Second, an escape filter provides alternate translations for exceptional pages within a direct segment (e.g., physical pages with permanent hard faults).
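As a rough illustration of how a direct segment and an escape filter could interact, the sketch below models a segment as a base, a limit, and an offset: addresses inside the segment are translated by a single addition unless their page has been escaped, in which case translation falls back to an ordinary page walk. All names here (`DirectSegment`, `escaped_pages`, `page_walk`) are hypothetical. The abstract does not say how the filter is implemented, so an exact Python set stands in for whatever structure the hardware uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Set

PAGE_SIZE = 4096  # assumption: 4 KiB base pages


@dataclass
class DirectSegment:
    base: int    # first address covered by the segment
    limit: int   # one past the last covered address
    offset: int  # added to any address that falls inside the segment
    # Page numbers that must not use the segment (e.g., faulty frames).
    escaped_pages: Set[int] = field(default_factory=set)

    def translate(self, addr: int, page_walk: Callable[[int], int]) -> int:
        """Translate addr, preferring the segment over the page walk."""
        in_segment = self.base <= addr < self.limit
        escaped = (addr // PAGE_SIZE) in self.escaped_pages
        if in_segment and not escaped:
            return addr + self.offset  # fast path: no page-table access
        return page_walk(addr)         # slow path: conventional translation


if __name__ == "__main__":
    # Stand-in for a real page walk: map everything to a fixed region.
    def fallback(addr: int) -> int:
        return 0x9000_0000 + (addr % PAGE_SIZE)

    seg = DirectSegment(base=0x10000, limit=0x20000, offset=0x100000,
                        escaped_pages={0x11})
    print(hex(seg.translate(0x10010, fallback)))  # inside the segment
    print(hex(seg.translate(0x11008, fallback)))  # escaped page
    print(hex(seg.translate(0x30000, fallback)))  # outside the segment
```

The point of the design is that escaping a few exceptional pages does not force the whole region back onto page tables: only the escaped pages pay for a walk.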

We emulate the proposed hardware and prototype the software in Linux with KVM on x86-64. One mode, VMM Direct, reduces address translation overhead to near-native without guest application or OS changes (2% slower than native on average), while a more aggressive mode, Dual Direct, performs better than native on big-memory workloads, with near-zero translation overhead.

Index Terms

Efficient Memory Virtualization: Reducing Dimensionality of Nested Page Walks
1. Hardware
  1. Integrated circuits
    1. Semiconductor memory
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        1. Memory management
          1. Main memory

Information & Contributors

Information

Published In

MICRO-47: Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture

December 2014

697 pages

ISBN: 9781479969982

General Chair: Krisztian Flautner (ARM)

Program Chairs: Thomas F. Wenisch (University of Michigan), Emre Ozer (ARM)

Publications Chair: Michael Ferdman (Stony Brook University)

Publisher

IEEE Computer Society

United States

Publication History

Published: 13 December 2014

Qualifiers

Tutorial
Research
Refereed limited

Conference

MICRO-47

Sponsor: SIGMICRO

MICRO-47: The 47th Annual IEEE/ACM International Symposium on Microarchitecture

December 13 - 17, 2014

Cambridge, United Kingdom

Acceptance Rates

MICRO-47 Paper Acceptance Rate 53 of 279 submissions, 19%;

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Bibliometrics & Citations

Article Metrics

Total Citations: 48
Total Downloads: 555
Downloads (Last 12 months): 13
Downloads (Last 6 weeks): 1

Reflects downloads up to 31 Dec 2024

Citations

Cited By

Zhang J, Jia W, Chai S, Liu P, Kim J, Xu T, Tsafrir D, Musuvathi M, Gupta R, Abu-Ghazaleh N (2024). Direct Memory Translation for Virtualized Clouds. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pp. 287-304. DOI: 10.1145/3620665.3640358. Online publication date: 27-Apr-2024
https://dl.acm.org/doi/10.1145/3620665.3640358
Du D, Yang B, Xia Y, Chen H (2023). Accelerating Extra Dimensional Page Walks for Confidential Computing. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 654-669. DOI: 10.1145/3613424.3614293. Online publication date: 28-Oct-2023
https://dl.acm.org/doi/10.1145/3613424.3614293
Chen D, Tong D, Yang C, Yi J, Cheng X (2023). FlexPointer: Fast Address Translation Based on Range TLB and Tagged Pointers. ACM Transactions on Architecture and Code Optimization 20:2, pp. 1-24. DOI: 10.1145/3579854. Online publication date: 1-Mar-2023
https://dl.acm.org/doi/10.1145/3579854
KP A, Singh R, Mishra D, Choppella V, Phatak D, Luxton-Reilly A, Craig M (2023). Lens: Experiencing Multi-level Page Tables at Close Quarters. Proceedings of the ACM Conference on Global Computing Education Vol 1, pp. 105-111. DOI: 10.1145/3576882.3617912. Online publication date: 5-Dec-2023
https://dl.acm.org/doi/10.1145/3576882.3617912
Park C, Vougioukas I, Sandberg A, Black-Schaffer D, Falsafi B, Ferdman M, Lu S, Wenisch T (2022). Every walk’s a hit: making page walks single-access cache hits. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 128-141. DOI: 10.1145/3503222.3507718. Online publication date: 28-Feb-2022
https://dl.acm.org/doi/10.1145/3503222.3507718
B P, Jawalkar N, Basu A, Hardavellas N, Campanoni S, Grot B, Karpuzcu U (2022). Designing Virtual Memory System of MCM GPUs. Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 404-422. DOI: 10.1109/MICRO56248.2022.00036. Online publication date: 1-Oct-2022
https://dl.acm.org/doi/10.1109/MICRO56248.2022.00036
Ram V, Panwar A, Basu A (2021). Trident: Harnessing Architectural Resources for All Page Sizes in x86 Processors. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1106-1120. DOI: 10.1145/3466752.3480062. Online publication date: 18-Oct-2021
https://dl.acm.org/doi/10.1145/3466752.3480062
Ainsworth S, Jones T, Wang Z, Wrigstad T (2021). Compendia: reducing virtual-memory costs via selective densification. Proceedings of the 2021 ACM SIGPLAN International Symposium on Memory Management, pp. 52-65. DOI: 10.1145/3459898.3463902. Online publication date: 22-Jun-2021
https://dl.acm.org/doi/10.1145/3459898.3463902
Bitchebe S, Mvondo D, Réveillère L, de Palma N, Tchana A, Titzer B, Xu H, Zhang I (2021). Extending Intel PML for hardware-assisted working set size estimation of VMs. Proceedings of the 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pp. 111-124. DOI: 10.1145/3453933.3454018. Online publication date: 7-Apr-2021
https://dl.acm.org/doi/10.1145/3453933.3454018
Teabe B, Yuhala P, Tchana A, Hermenier F, Hagimont D, Muller G, Titzer B, Xu H, Zhang I (2021). (No)Compromis: paging virtualization is not a fatality. Proceedings of the 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pp. 43-56. DOI: 10.1145/3453933.3454013. Online publication date: 7-Apr-2021
https://dl.acm.org/doi/10.1145/3453933.3454013