research-article

TLB Shootdown Mitigation for Low-Power Many-Core Servers with L1 Virtual Caches

Published: 01 January 2018

Abstract

Power efficiency has become one of the most important design constraints for high-performance systems. In this paper, we revisit the design of low-power virtually-addressed caches. While virtually-addressed caches enable significant power savings by obviating the need for Translation Lookaside Buffer (TLB) lookups, they suffer from several challenging design issues that curtail their widespread commercial adoption. We focus on one of these challenges: cache flushes due to virtual page remappings. We use detailed studies on an ARM many-core server to show that this problem degrades performance by up to 25 percent for a mix of multi-programmed and multi-threaded workloads. Interestingly, we observe that many of these flushes are spurious and are caused by an indiscriminate invalidation broadcast on the ARM architecture. In response, we propose a low-overhead and readily implementable hardware mechanism that uses Bloom filters to reduce spurious invalidations and mitigate their ill effects.
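
The abstract does not describe the mechanism in detail, so the following is only an illustrative software sketch of the general idea of using a Bloom filter to screen invalidation broadcasts, not the paper's hardware design. It assumes each core keeps a filter over the virtual page numbers that may be resident in its L1 virtual cache; the names `BloomFilter`, `handle_invalidation`, and the filter sizes are hypothetical. Because a Bloom filter never reports a false negative, a "not present" answer means the broadcast is spurious for that core and the flush can safely be skipped.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over integer keys (here: virtual page numbers)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are arbitrary illustrative choices, not values from the paper.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                key.to_bytes(8, "little"), digest_size=8, salt=bytes([i])
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def handle_invalidation(cached_pages, vpn):
    """Return True if this core must flush for `vpn`, False if the
    broadcast is provably spurious for this core (hypothetical helper)."""
    return cached_pages.might_contain(vpn)


if __name__ == "__main__":
    core_filter = BloomFilter()
    core_filter.add(0x1000)  # a page this core has cached
    print(handle_invalidation(core_filter, 0x1000))  # must flush
    print(handle_invalidation(BloomFilter(), 0x2000))  # empty filter: skip
```

A plain Bloom filter cannot delete entries, so a design along these lines would need to reset or rebuild the filter periodically (for example, on a full cache flush); how the paper handles this is not stated in the abstract.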


Published In

IEEE Computer Architecture Letters, Volume 17, Issue 1
January 2018
99 pages

Publisher

IEEE Computer Society

United States

Cited By

  • (2020) ATTC (@C). Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, pp. 481-492. doi:10.1145/3410463.3414653. Online publication date: 30-Sep-2020
  • (2020) Enhancing Address Translations in Throughput Processors via Compression. Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, pp. 191-204. doi:10.1145/3410463.3414633. Online publication date: 30-Sep-2020
  • (2020) Don't shoot down TLB shootdowns! Proceedings of the Fifteenth European Conference on Computer Systems, pp. 1-14. doi:10.1145/3342195.3387518. Online publication date: 15-Apr-2020
  • (2018) Scalable distributed last-level TLBs using low-latency interconnects. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, pp. 271-284. doi:10.1109/MICRO.2018.00030. Online publication date: 20-Oct-2018
  • (2018) SEESAW. Proceedings of the 45th Annual International Symposium on Computer Architecture, pp. 193-206. doi:10.1109/ISCA.2018.00026. Online publication date: 2-Jun-2018
