research-article

Public Access

Twig: Profile-Guided BTB Prefetching for Data Center Applications

Authors: Tanvir Ahmed Khan, Akshitha Sriraman, Niranjan K Soundararajan, Joseph Devietti, Sreenivas Subramoney, Gilles A Pokam, Baris Kasikci

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

Pages 816 - 829

https://doi.org/10.1145/3466752.3480124

Published: 17 October 2021

Abstract

Modern data center applications have deep software stacks, with instruction footprints that are orders of magnitude larger than typical instruction cache (I-cache) sizes. To efficiently prefetch instructions into the I-cache despite large application footprints, modern server-class processors implement a decoupled frontend with Fetch Directed Instruction Prefetching (FDIP). In this work, we first characterize the limitations of a decoupled frontend processor with FDIP and find that FDIP suffers from significant Branch Target Buffer (BTB) misses. We also find that existing techniques (e.g., stream prefetchers and predecoders) are unable to mitigate these misses, as they rely on an incomplete understanding of a program’s branching behavior.
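To make the capacity problem concrete, the following is a minimal sketch of a toy BTB, assuming a fully associative table with LRU replacement (real BTBs are set-associative, and the paper does not specify this model). The names `ToyBTB` and `count_misses` and the synthetic trace are illustrative assumptions only. It shows that once the branch working set exceeds the BTB capacity, a cyclic access pattern misses on every lookup.

```python
from collections import OrderedDict


class ToyBTB:
    """Toy BTB: fully associative, LRU replacement (a simplifying assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # branch PC -> predicted target

    def lookup(self, pc):
        """Return True on a hit and refresh the entry's LRU position."""
        if pc in self.entries:
            self.entries.move_to_end(pc)
            return True
        return False

    def insert(self, pc, target):
        """Install an entry, evicting the least recently used one if full."""
        self.entries[pc] = target
        self.entries.move_to_end(pc)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


def count_misses(trace, capacity):
    """Replay (branch_pc, target) pairs and count BTB misses."""
    btb = ToyBTB(capacity)
    misses = 0
    for pc, target in trace:
        if not btb.lookup(pc):
            misses += 1
            btb.insert(pc, target)
    return misses


# Synthetic trace: 12 distinct taken branches executed in a loop, 10 times.
loop = [(0x1000 + 16 * k, 0x8000 + 16 * k) for k in range(12)]
trace = loop * 10

print(count_misses(trace, capacity=8))   # 120: every lookup misses (LRU thrashing)
print(count_misses(trace, capacity=16))  # 12: only the cold misses remain
```

The 12-branch loop stands in for a branch footprint that is larger than the BTB. Data center applications have footprints far larger than this toy example.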

To address the shortcomings of existing BTB prefetching techniques, we propose Twig, a novel profile-guided BTB prefetching mechanism. Twig analyzes a production binary’s execution profile to identify critical BTB misses and inject BTB prefetch instructions into code. Additionally, Twig coalesces multiple non-contiguous BTB prefetches to improve the BTB’s locality. Twig exposes these techniques via new BTB prefetch instructions. Since Twig prefetches BTB entries without modifying the underlying BTB organization, it is easy to adopt in modern processors. We study Twig’s behavior across nine widely-used data center applications, and demonstrate that it achieves an average 20.86% (up to 145%) performance speedup over a baseline 8K-entry BTB, outperforming the state-of-the-art BTB prefetch mechanism by 19.82% (on average).
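The profile-then-inject idea can be illustrated with a small simulation. This is a hedged sketch and not Twig's actual algorithm: the toy LRU BTB, the fixed prefetch `distance`, the `min_count` threshold, and the function names `simulate` and `build_prefetch_map` are all assumptions made for illustration. A profiling run records which branches miss, a hypothetical prefetch operation is attached to a branch that executes a few branches earlier, and the trace is replayed with those prefetches enabled. Allowing one site to carry several entries is only a loose analogue of the coalescing the abstract mentions.

```python
from collections import Counter, OrderedDict, defaultdict


def simulate(trace, capacity, prefetch_map=None):
    """Replay (branch_pc, target) pairs through a toy LRU BTB.

    prefetch_map maps a branch PC (the injection site) to a list of
    (pc, target) entries to install when that site executes.
    Returns (miss_count, trace indices that missed).
    """
    btb = OrderedDict()
    prefetch_map = prefetch_map or {}
    missed_at = []

    def insert(pc, target):
        btb[pc] = target
        btb.move_to_end(pc)
        if len(btb) > capacity:
            btb.popitem(last=False)  # evict least recently used entry

    for i, (pc, target) in enumerate(trace):
        if pc in btb:
            btb.move_to_end(pc)
        else:
            missed_at.append(i)
            insert(pc, target)
        # Hypothetical BTB prefetch op injected at this site.
        for ppc, ptarget in prefetch_map.get(pc, ()):
            insert(ppc, ptarget)
    return len(missed_at), missed_at


def build_prefetch_map(trace, capacity, distance=2, min_count=2):
    """Profile pass: pick injection sites for frequently missing branches."""
    _, missed_at = simulate(trace, capacity)
    pair_counts = Counter()
    for i in missed_at:
        if i >= distance:
            site = trace[i - distance][0]  # branch seen `distance` earlier
            pair_counts[(site, trace[i])] += 1
    prefetch_map = defaultdict(list)
    for (site, entry), count in pair_counts.items():
        if count >= min_count:  # keep only recurring misses
            prefetch_map[site].append(entry)
    return dict(prefetch_map)


# Synthetic trace: a 12-branch loop that thrashes an 8-entry toy BTB.
loop = [(0x1000 + 16 * k, 0x8000 + 16 * k) for k in range(12)]
trace = loop * 10

baseline, _ = simulate(trace, capacity=8)
pmap = build_prefetch_map(trace, capacity=8)
with_prefetch, _ = simulate(trace, capacity=8, prefetch_map=pmap)
print(baseline, with_prefetch)
```

On this synthetic loop the replay with prefetching misses far less often than the baseline, because each entry is installed shortly before it is needed. The sketch ignores prefetch timeliness in cycles, instruction overhead, and control-flow context, all of which a real design would have to consider.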

Published In

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

October 2021

1322 pages

ISBN: 9781450385572

DOI: 10.1145/3466752

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Intel Corporation
NSF
Applications Driving Architectures (ADA) Research Center, a JUMP Center co-sponsored by SRC and DARPA

Conference

MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture

Sponsor: SIGMICRO

October 18 - 22, 2021

Virtual Event, Greece

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Article Metrics

Total Citations: 10
Total Downloads: 1,480
Downloads (Last 12 months): 535
Downloads (Last 6 weeks): 98

Reflects downloads up to 10 Dec 2024

Cited By

Nian J, Liu H, Gao X, Zhang S, and Yang M. 2024. Enhancing Power Efficiency in Branch Target Buffer Design with a Two-Level Prediction Mechanism. Electronics 13:7 (1185). Online publication date: 23-Mar-2024. https://doi.org/10.3390/electronics13071185

Brunner R and Kumar R. 2024. Weeding out Front-End Stalls with Uneven Block Size Instruction Cache. In 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). 1382-1396. Online publication date: 2-Nov-2024. https://doi.org/10.1109/MICRO61859.2024.00102

Singh S, Perais A, Jimborean A, and Ros A. 2024. Alternate Path μ-op Cache Prefetching. In 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). 1230-1245. Online publication date: 29-Jun-2024. https://doi.org/10.1109/ISCA59077.2024.00092

Oh S, Xu M, Khan T, Kasikci B, and Litz H. 2024. UDP: Utility-Driven Fetch Directed Instruction Prefetching. In 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). 1188-1201. Online publication date: 29-Jun-2024. https://doi.org/10.1109/ISCA59077.2024.00089

Liu Y, Li X, Zhang T, Liu T, Guo Q, Zhang F, and Wang J. 2024. AVM-BTB: Adaptive and Virtualized Multi-level Branch Target Buffer. In 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). 17-31. Online publication date: 29-Jun-2024. https://doi.org/10.1109/ISCA59077.2024.00012

Asheim T, Grot B, and Kumar R. 2023. A Storage-Effective BTB Organization for Servers. In 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 1153-1167. Online publication date: Feb-2023. https://doi.org/10.1109/HPCA56546.2023.10070938

Lin W, Qin J, Chen Y, Jin Z, Xu J, Zhang Y, Cai S, Fu L, Chen Y, and Chen W. 2023. JACO: JAva Code Layout Optimizer Enabling Continuous Optimization without Pausing Application Services. In 2023 IEEE International Conference on Cluster Computing (CLUSTER). 295-306. Online publication date: 31-Oct-2023. https://doi.org/10.1109/CLUSTER52292.2023.00032

Ma J, Zuo G, Loughlin K, Zhang H, Quinn A, Kasikci B, Falsafi B, Ferdman M, Lu S, and Wenisch T. 2022. Debugging in the brave new world of reconfigurable hardware. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 946-962. Online publication date: 28-Feb-2022. https://dl.acm.org/doi/10.1145/3503222.3507701

Kumar R and Grot B. 2022. Shooting Down the Server Front-End Bottleneck. ACM Transactions on Computer Systems 38:3-4 (1-30). Online publication date: 4-Jan-2022. https://dl.acm.org/doi/10.1145/3484492

Khan T, Ugur M, Nathella K, Sunwoo D, Litz H, Jiménez D, Kasikci B, Hardavellas N, Campanoni S, Grot B, and Karpuzcu U. 2022. Whisper: Profile-Guided Branch Misprediction Elimination for Data Center Applications. In Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture. 19-34. Online publication date: 1-Oct-2022. https://dl.acm.org/doi/10.1109/MICRO56248.2022.00017
