DOI: 10.1109/MICRO.2014.37

Efficient Memory Virtualization: Reducing Dimensionality of Nested Page Walks

Published: 13 December 2014

Abstract

Virtualization provides value for many workloads, but its cost rises for workloads with poor memory access locality. This overhead comes from translation lookaside buffer (TLB) misses, where the hardware performs a 2D page walk (up to 24 memory references on x86-64) rather than a native page walk (up to only 4 memory references). The first dimension translates guest virtual addresses to guest physical addresses, while the second translates guest physical addresses to host physical addresses.
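
The 24-versus-4 figures follow from the standard counting argument for nested paging: each of the n guest page-table levels is addressed by a guest physical address, which itself needs an m-level host walk before the guest entry can be read, and the final guest physical address needs one more host walk. That gives n(m + 1) + m = (n + 1)(m + 1) - 1 references. The sketch below reproduces this arithmetic; the function names are illustrative and not from the paper, and treating a dimension handled by a segment as having zero levels is an assumption made here for illustration.

```python
def native_walk_refs(levels: int) -> int:
    """Worst-case memory references for a one-dimensional page walk."""
    return levels


def nested_walk_refs(guest_levels: int, host_levels: int) -> int:
    """Worst-case memory references for a 2D (nested) page walk.

    Each guest level costs one host walk plus one read of the guest entry,
    and the final guest physical address costs one more host walk:
    n * (m + 1) + m == (n + 1) * (m + 1) - 1.
    """
    return (guest_levels + 1) * (host_levels + 1) - 1


if __name__ == "__main__":
    # x86-64 uses 4-level page tables in both dimensions.
    print("native:", native_walk_refs(4))
    print("nested:", nested_walk_refs(4, 4))
    # Assumption: a dimension replaced by a segment contributes zero levels.
    print("one dimension flattened:", nested_walk_refs(4, 0))
    print("both dimensions flattened:", nested_walk_refs(0, 0))
```

Under this model, removing either dimension brings the worst case back to the native 4 references, and removing both leaves no page-walk references at all.
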
This paper proposes new hardware using direct segments with three new virtualized modes of operation that significantly speed up virtualized address translation. Further, this paper proposes two novel techniques to address important limitations of original direct segments. First, self-ballooning reduces fragmentation in physical memory, and addresses the architectural input/output (I/O) gap in x86-64. Second, an escape filter provides alternate translations for exceptional pages within a direct segment (e.g., physical pages with permanent hard faults).
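
A minimal sketch of how a direct segment with an escape filter might behave, assuming the base/limit/offset formulation of direct segments: an address inside the segment is translated by adding a fixed offset, unless its page has been marked as escaped, in which case translation falls back to a conventional page walk. The class, field, and callback names are hypothetical, and the escape filter is modeled as an exact set for clarity; real hardware would more likely use a small approximate structure.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class DirectSegment:
    """Illustrative model of one direct segment (names are hypothetical)."""
    base: int                # first address covered by the segment
    limit: int               # first address past the end of the segment
    offset: int              # added to an in-segment address to translate it
    page_size: int = 4096
    # Assumption: the escape filter is modeled as an exact set of page numbers.
    escaped_pages: Set[int] = field(default_factory=set)

    def translate(self, addr: int, fallback_walk: Callable[[int], int]) -> int:
        """Translate addr via the segment, or via fallback_walk otherwise."""
        in_segment = self.base <= addr < self.limit
        escaped = (addr // self.page_size) in self.escaped_pages
        if in_segment and not escaped:
            return addr + self.offset   # no page walk needed
        return fallback_walk(addr)      # conventional page-table translation


if __name__ == "__main__":
    seg = DirectSegment(base=0x10000, limit=0x20000, offset=0x100000)
    seg.escaped_pages.add(0x11000 // 4096)     # e.g. a page with a hard fault
    walk = lambda a: a + 0x900000              # stand-in for a real page walk
    print(hex(seg.translate(0x10008, walk)))   # segment hit
    print(hex(seg.translate(0x11000, walk)))   # escaped page, uses the walk
    print(hex(seg.translate(0x30000, walk)))   # outside the segment
```

The fallback path is what lets a single faulty or otherwise exceptional page be remapped without giving up the segment for the rest of the range.
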
We emulate the proposed hardware and prototype the software in Linux with KVM on x86-64. One mode, VMM Direct, reduces address translation overhead to near-native without guest application or OS changes (2% slower than native on average), while a more aggressive mode, Dual Direct, performs better than native on big-memory workloads, with near-zero translation overhead.



Published In

MICRO-47: Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture
December 2014
697 pages
ISBN: 9781479969982

Publisher

IEEE Computer Society, United States

Author Tags

  1. translation lookaside buffer
  2. virtual machines
  3. virtual memory
  4. virtualization

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

MICRO-47

Acceptance Rates

MICRO-47 Paper Acceptance Rate 53 of 279 submissions, 19%;
Overall Acceptance Rate 484 of 2,242 submissions, 22%
