DOI: 10.1145/3466752.3480094

NOVIA: A Framework for Discovering Non-Conventional Inline Accelerators

Published: 17 October 2021

Abstract

Accelerators provide an increasingly valuable source of performance in modern computing systems. In most cases, accelerators are implemented as stand-alone offload engines to which the processor can send large computation tasks. For many edge devices, accelerators become essential as performance needs increase, but the tight constraints on these devices limit the extent to which offload engines can be incorporated. An alternative is inline accelerators, which can be integrated as part of the core and provide performance with much smaller start-up times and area overheads. While inline accelerators allow greater flexibility in the interface and the acceleration of finer-grained code, determining good inline accelerator candidates is non-trivial. In this paper, we present NOVIA, a framework that derives inline accelerators by examining the workload source code and identifying inline accelerator candidates that provide benefits across many different regions of the workload. These NOVIA-derived accelerators are then integrated into an embedded core. For this core, NOVIA produces inline accelerators that improve the performance of benchmark suites such as EEMBC Autobench 2.0 and Mediabench by 1.37x with only a 3% core area increase.
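
To make the idea of a candidate that "provides benefits across many different regions" concrete, the following is a toy sketch only. The abstract does not describe NOVIA's actual analysis, so this is not the authors' method: the function name `recurring_candidates`, the region names, the operation sequences, and the execution counts are all hypothetical. It simply counts short operation sequences that recur in several code regions and ranks them by a summed execution count.

```python
from collections import defaultdict


def recurring_candidates(regions, length=3, min_regions=2):
    """Toy illustration (not NOVIA's algorithm): return operation sequences
    of the given length that occur in at least min_regions distinct regions,
    ranked by their summed execution counts."""
    seen_in = defaultdict(set)   # op sequence -> names of regions containing it
    weight = defaultdict(int)    # op sequence -> summed execution counts
    for name, (ops, exec_count) in regions.items():
        for i in range(len(ops) - length + 1):
            seq = tuple(ops[i:i + length])
            seen_in[seq].add(name)
            weight[seq] += exec_count
    shared = [s for s in seen_in if len(seen_in[s]) >= min_regions]
    # Highest weight first; ties broken by the sequence itself for determinism.
    return sorted(shared, key=lambda s: (-weight[s], s))


# Hypothetical workload: region name -> (operation sequence, execution count).
regions = {
    "fir_loop": (["load", "mul", "add", "store"], 1000),
    "dct_loop": (["load", "mul", "add", "shift", "store"], 400),
    "crc_loop": (["load", "xor", "shift", "store"], 250),
}

candidates = recurring_candidates(regions)
print(candidates)
```

In this made-up workload only the `load, mul, add` sequence appears in more than one region, so it is the single candidate reported. A real framework would work on a compiler representation of the program and weigh area and latency, none of which is modeled here.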

Published In

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture
October 2021
1322 pages
ISBN:9781450385572
DOI:10.1145/3466752
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. accelerator discovery
  2. hardware-software co-design
  3. inline accelerator

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • DARPA

Conference

MICRO '21

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Article Metrics

  • Downloads (Last 12 months): 241
  • Downloads (Last 6 weeks): 25
Reflects downloads up to 04 Jan 2025

Cited By

  • (2024) Automating application-driven customization of ASIPs. Journal of Systems Architecture: the EUROMICRO Journal 148:C. DOI: 10.1016/j.sysarc.2024.103080. Online publication date: 2-Jul-2024
  • (2024) Designing a Graphics Accelerator with Heterogeneous Architecture. High-Performance Computing Systems and Technologies in Scientific Research, Automation of Control and Production (29-40). DOI: 10.1007/978-3-031-51057-1_3. Online publication date: 26-Jan-2024
  • (2023) APPEND: Rethinking ASIP Synthesis in the Era of AI. 2023 60th ACM/IEEE Design Automation Conference (DAC) (1-6). DOI: 10.1109/DAC56929.2023.10247872. Online publication date: 9-Jul-2023
  • (2023) Near-optimal multi-accelerator architectures for predictive maintenance at the edge. Future Generation Computer Systems 140:C (331-343). DOI: 10.1016/j.future.2022.10.030. Online publication date: 1-Mar-2023
  • (2022) The Future Roadmap of In-Vehicle Network Processing: A HW-Centric (R-)evolution. IEEE Access 10 (69223-69249). DOI: 10.1109/ACCESS.2022.3186708. Online publication date: 2022
