research-article

HyCUBE: A CGRA with Reconfigurable Single-cycle Multi-hop Interconnect

Authors:

Manupa Karunaratne,

Aditi Kulkarni Mohite,

Li-Shiuan Peh

DAC '17: Proceedings of the 54th Annual Design Automation Conference 2017

Article No.: 45, Pages 1 - 6

https://doi.org/10.1145/3061639.3062262

Published: 18 June 2017

Abstract

CGRAs are promising as accelerators due to their improved energy-efficiency compared to FPGAs. Existing CGRAs support reconfigurability for operations, but not communications because of the static neighbor-to-neighbor interconnect, leading to both performance loss and increased complexity of the compiler. In this paper, we introduce HyCUBE, a novel CGRA architecture with a reconfigurable interconnect providing single-cycle communications between distant FUs, resulting in a new formulation of the application mapping problem that leads to the design of an efficient compiler. HyCUBE achieves 1.5X and 3X better performance-per-watt compared to a CGRA with standard NoC and a CGRA with neighbor-to-neighbor connectivity, respectively.
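
To make the latency argument concrete, here is a minimal sketch that compares how many cycles a value needs to travel between two FUs on a mesh under the two interconnect styles contrasted in the abstract. It is an illustration under stated assumptions, not the paper's model: FUs are assumed to sit on a 2D grid addressed as (row, col), routing is assumed to follow a shortest Manhattan path, and `max_hops_per_cycle` is a hypothetical parameter standing in for how many links a single-cycle multi-hop path may span.

```python
from math import ceil


def manhattan_hops(src, dst):
    """Number of mesh links on a shortest path between two FUs at (row, col)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def cycles_neighbor_to_neighbor(src, dst):
    """Static neighbor-to-neighbor interconnect: one link is crossed per cycle."""
    return manhattan_hops(src, dst)


def cycles_multi_hop(src, dst, max_hops_per_cycle):
    """Multi-hop interconnect (assumed model): up to max_hops_per_cycle links
    are crossed in a single cycle, so the cycle count is a ceiling division."""
    if max_hops_per_cycle < 1:
        raise ValueError("max_hops_per_cycle must be at least 1")
    return ceil(manhattan_hops(src, dst) / max_hops_per_cycle)


if __name__ == "__main__":
    src, dst = (0, 0), (3, 3)  # opposite corners of a hypothetical 4x4 array
    print("neighbor-to-neighbor cycles:", cycles_neighbor_to_neighbor(src, dst))
    print("multi-hop cycles (limit 4):", cycles_multi_hop(src, dst, 4))
    print("multi-hop cycles (limit 6):", cycles_multi_hop(src, dst, 6))
```

Under these assumptions, the cycle count for distant FUs shrinks from the hop distance to a small constant, which is the intuition behind the abstract's claim that single-cycle communication between distant FUs changes the mapping problem the compiler has to solve.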

Published In

DAC '17: Proceedings of the 54th Annual Design Automation Conference 2017

June 2017

533 pages

ISBN:9781450349277

DOI:10.1145/3061639

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

EDAC: Electronic Design Automation Consortium
SIGDA: ACM Special Interest Group on Design Automation
IEEE-CEDA

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Singapore National Research Foundation
Singapore Ministry of Education

Conference

DAC '17

Sponsor:

EDAC
SIGDA

DAC '17: The 54th Annual Design Automation Conference 2017

June 18 - 22, 2017

Austin, TX, USA

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Bibliometrics

Total Citations: 144
Total Downloads: 747
Downloads (Last 12 months): 122
Downloads (Last 6 weeks): 21

Reflects downloads up to 10 Dec 2024

Cited By

Chen K, Mason Nelson T, Khadem A, Fayazi M, Singapuram S, Dreslinski R, Talati N, Kim H, Blaauw D (2024). Canalis: A Throughput-Optimized Framework for Real-Time Stream Processing of Wireless Communication. ACM Transactions on Reconfigurable Technology and Systems, 17:4 (1-32). Online publication date: 18-Sep-2024.
https://dl.acm.org/doi/10.1145/3695880

de Bruin B, Vadivel K, Wijtvliet M, Jääskeläinen P, Corporaal H (2024). R-Blocks: an Energy-Efficient, Flexible, and Programmable CGRA. ACM Transactions on Reconfigurable Technology and Systems, 17:2 (1-34). Online publication date: 8-Apr-2024.
https://dl.acm.org/doi/10.1145/3656642

Mazzola S, Riedel S, Benini L (2024). Enabling Efficient Hybrid Systolic Computation in Shared-L1-Memory Manycore Clusters. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 32:9 (1602-1615). Online publication date: Sep-2024.
https://doi.org/10.1109/TVLSI.2024.3415486

Dai Y, Li J, Zhu Q, Qiu Y, Hu Y, Yin W, Wang L (2024). HETA: A Heterogeneous Temporal CGRA Modeling and Design Space Exploration via Bayesian Optimization. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 32:3 (505-518). Online publication date: Mar-2024.
https://doi.org/10.1109/TVLSI.2023.3344536

Mou D, Wang B, Liu D (2024). SC-CGRA: An Energy-Efficient CGRA Using Stochastic Computing. IEEE Transactions on Parallel and Distributed Systems, 35:11 (2023-2038). Online publication date: Nov-2024.
https://doi.org/10.1109/TPDS.2024.3453310

Sunny C, Das S, Martin K, Coussy P (2024). CREPE: Concurrent Reverse-Modulo-Scheduling and Placement for CGRAs. IEEE Transactions on Parallel and Distributed Systems, 35:7 (1293-1306). Online publication date: 16-May-2024.
https://dl.acm.org/doi/10.1109/TPDS.2024.3402098

Deng J, Tang X, Yue Z, Lu G, Yang Q, Zhang J, Li J, Li C, Wei S, Hu Y, Yin S (2024). Efficient Orchestrated AI Workflows Execution on Scale-Out Spatial Architecture. IEEE Transactions on Circuits and Systems for Artificial Intelligence, 1:2 (229-243). Online publication date: Dec-2024.
https://doi.org/10.1109/TCASAI.2024.3476237

Liu P, Li A, Chen L, Jiang J, Wang Q, Mao Z, Jing N (2024). A Comprehensive Dataflow-Mapping Optimization for Fully Pipelined Execution in Spatial Programmable Architecture. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43:12 (4640-4652). Online publication date: Dec-2024.
https://doi.org/10.1109/TCAD.2024.3409653

Deng J, Tang X, Zhang J, Li Y, Zhang L, Tu F, Wei S, Hu Y, Yin S (2024). Rethinking Control Flow in Spatial Architectures: Insights into Control Flow Plane Design. IEEE Transactions on Computers (1-14). Online publication date: 2024.
https://doi.org/10.1109/TC.2024.3475582

Tan C, Jiang M, Patil D, Ou Y, Li Z, Ju L, Mitra T, Park H, Tumeo A, Zhang J (2024). ICED: An Integrated CGRA Framework Enabling DVFS-Aware Acceleration. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO) (1338-1352). Online publication date: 2-Nov-2024.
https://doi.org/10.1109/MICRO61859.2024.00099