NOVIA: A Framework for Discovering Non-Conventional Inline Accelerators

Authors:

John-David Wellman,

Alper Buyuktosunoglu,

Pradip Bose

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

Pages 507 - 521

https://doi.org/10.1145/3466752.3480094

Published: 17 October 2021

Abstract

Accelerators provide an increasingly valuable source of performance in modern computing systems. In most cases, accelerators are implemented as stand-alone offload engines to which the processor can send large computation tasks. For many edge devices, accelerators become essential as performance needs increase, but the tight constraints on these devices limit the extent to which offload engines can be incorporated. An alternative is inline accelerators, which can be integrated as part of the core and provide performance with much smaller start-up times and area overheads. While inline accelerators allow greater flexibility in the interface and the acceleration of finer-grain code, determining good inline accelerator candidates is non-trivial. In this paper, we present NOVIA, a framework that derives inline accelerators by examining the workload source code and identifying candidates that provide benefits across many different regions of the workload. These NOVIA-derived accelerators are then integrated into an embedded core. For this core, NOVIA produces inline accelerators that improve the performance of benchmark suites such as EEMBC Autobench 2.0 and Mediabench by 1.37x with only a 3% core area increase.

Information & Contributors

Information

Published In

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

October 2021

1322 pages

ISBN: 9781450385572

DOI: 10.1145/3466752

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

DARPA

Conference

MICRO '21

Sponsor:

SIGMICRO

MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture

October 18 - 22, 2021

Virtual Event, Greece

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Cited By

Hussein E, Waschneck B, Mayr C (2024). Automating application-driven customization of ASIPs. Journal of Systems Architecture: the EUROMICRO Journal 148:C. DOI: 10.1016/j.sysarc.2024.103080. Online publication date: 2-Jul-2024.
https://dl.acm.org/doi/10.1016/j.sysarc.2024.103080
Tarasov I, Mirzoyan D, Sovietov P (2024). Designing a Graphics Accelerator with Heterogeneous Architecture. High-Performance Computing Systems and Technologies in Scientific Research, Automation of Control and Production, 29-40. DOI: 10.1007/978-3-031-51057-1_3. Online publication date: 26-Jan-2024.
https://doi.org/10.1007/978-3-031-51057-1_3
Li C, Wang Y, Li H, Han Y (2023). APPEND: Rethinking ASIP Synthesis in the Era of AI. 2023 60th ACM/IEEE Design Automation Conference (DAC), 1-6. DOI: 10.1109/DAC56929.2023.10247872. Online publication date: 9-Jul-2023.
https://doi.org/10.1109/DAC56929.2023.10247872
Koraei M, Cebrian J, Jahre M (2023). Near-optimal multi-accelerator architectures for predictive maintenance at the edge. Future Generation Computer Systems 140:C, 331-343. DOI: 10.1016/j.future.2022.10.030. Online publication date: 1-Mar-2023.
https://dl.acm.org/doi/10.1016/j.future.2022.10.030
Marino A, Fons F, Arostegui J (2022). The Future Roadmap of In-Vehicle Network Processing: A HW-Centric (R-)evolution. IEEE Access 10, 69223-69249. DOI: 10.1109/ACCESS.2022.3186708. Online publication date: 2022.
https://doi.org/10.1109/ACCESS.2022.3186708
