research-article

HierCGRA: A Novel Framework for Large-scale CGRA with Hierarchical Modeling and Automated Design Space Exploration

Published: 10 May 2024

Abstract

Coarse-grained reconfigurable arrays (CGRAs) are promising design choices in computation-intensive domains because they strike a balance between energy efficiency and flexibility. A typical CGRA comprises processing elements (PEs), which execute the operations of an application, and the interconnections between them. Nevertheless, most CGRAs are ineffective at supporting flexible architecture design and at solving large-scale mapping problems. To address these challenges, we introduce HierCGRA, a novel framework that integrates hierarchical CGRA modeling, Chisel-based Verilog generation, LLVM-based data flow graph (DFG) generation, DFG mapping, and design space exploration (DSE). With the graph homomorphism (GH) mapping algorithm, HierCGRA achieves faster mapping and a higher PE utilization rate than existing state-of-the-art CGRA frameworks. The proposed hierarchical mapping strategy achieves a 41× speedup on average compared with the ILP mapping algorithm in CGRA-ME. Furthermore, the automated DSE based on Bayesian optimization achieves a significant performance improvement by exploiting the heterogeneity of PEs and interconnections. With these features, HierCGRA enables agile development of large-scale CGRAs and accelerates the search for better CGRA architectures.
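
To make the notion of a graph homomorphism concrete, the following is a minimal illustrative sketch and not the GH mapping algorithm of HierCGRA. It assumes both the DFG and the CGRA are given as directed edge lists, and it searches by brute force for a node map under which every DFG edge lands on an architecture edge. The function names, the toy DFG, and the toy architecture are all hypothetical.

```python
from itertools import product


def is_homomorphism(dfg_edges, arch_edges, mapping):
    """Return True if every DFG edge (u, v) maps onto an architecture edge."""
    arch = set(arch_edges)
    return all((mapping[u], mapping[v]) in arch for u, v in dfg_edges)


def find_homomorphism(dfg_nodes, dfg_edges, arch_nodes, arch_edges):
    """Brute-force search for an edge-preserving node map.

    Exponential in the number of DFG nodes, so suitable for toy inputs only.
    """
    for image in product(arch_nodes, repeat=len(dfg_nodes)):
        mapping = dict(zip(dfg_nodes, image))
        if is_homomorphism(dfg_edges, arch_edges, mapping):
            return mapping
    return None


# Hypothetical toy DFG: operations a and b both feed operation c.
dfg_nodes = ["a", "b", "c"]
dfg_edges = [("a", "c"), ("b", "c")]

# Hypothetical toy architecture: pe0 and pe2 each drive pe1.
arch_nodes = ["pe0", "pe1", "pe2"]
arch_edges = [("pe0", "pe1"), ("pe2", "pe1")]

mapping = find_homomorphism(dfg_nodes, dfg_edges, arch_nodes, arch_edges)
print(mapping)
```

A plain homomorphism may send several DFG nodes to the same architecture node, as it does for `a` and `b` in this toy case. A practical mapper would presumably add resource constraints, such as operation compatibility and per-PE capacity, on top of edge preservation.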



      Published In

      ACM Transactions on Reconfigurable Technology and Systems, Volume 17, Issue 2
      June 2024
      464 pages
      EISSN: 1936-7414
      DOI: 10.1145/3613550
      Editor: Deming Chen

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 10 May 2024
      Online AM: 08 April 2024
      Accepted: 28 March 2024
      Revised: 29 December 2023
      Received: 16 June 2023
      Published in TRETS Volume 17, Issue 2


      Author Tags

      1. CGRA modeling
      2. hierarchical mapping
      3. automated design space exploration

      Qualifiers

      • Research-article
