DOI: 10.1145/3061639.3062262
research-article

HyCUBE: A CGRA with Reconfigurable Single-cycle Multi-hop Interconnect

Published: 18 June 2017

Abstract

CGRAs are promising as accelerators due to their improved energy-efficiency compared to FPGAs. Existing CGRAs support reconfigurability for operations, but not communications because of the static neighbor-to-neighbor interconnect, leading to both performance loss and increased complexity of the compiler. In this paper, we introduce HyCUBE, a novel CGRA architecture with a reconfigurable interconnect providing single-cycle communications between distant FUs, resulting in a new formulation of the application mapping problem that leads to the design of an efficient compiler. HyCUBE achieves 1.5X and 3X better performance-per-watt compared to a CGRA with standard NoC and a CGRA with neighbor-to-neighbor connectivity, respectively.
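
The latency difference between the two interconnect styles named in the abstract can be illustrated with a toy model. This is only a sketch under stated assumptions, not the paper's model: FUs sit on a 2D mesh addressed by (row, col), a neighbor-to-neighbor interconnect moves a value one link per cycle, and a multi-hop interconnect can traverse up to `max_hops_per_cycle` links in one cycle. The function names and the `max_hops_per_cycle` parameter are hypothetical.

```python
from math import ceil


def manhattan_hops(src, dst):
    """Number of mesh links between two FUs given as (row, col) tuples."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def neighbor_to_neighbor_cycles(src, dst):
    # Assumption: a value advances exactly one link per clock cycle.
    return manhattan_hops(src, dst)


def multi_hop_cycles(src, dst, max_hops_per_cycle):
    # Assumption: up to max_hops_per_cycle links can be crossed in one cycle
    # (a hypothetical limit, e.g. set by the clock period).
    if max_hops_per_cycle < 1:
        raise ValueError("max_hops_per_cycle must be at least 1")
    return ceil(manhattan_hops(src, dst) / max_hops_per_cycle)


if __name__ == "__main__":
    src, dst = (0, 0), (3, 3)  # opposite corners of a 4x4 array
    print(neighbor_to_neighbor_cycles(src, dst))
    print(multi_hop_cycles(src, dst, max_hops_per_cycle=6))
```

Under these assumptions, a transfer between opposite corners of a 4x4 array costs six cycles with neighbor-to-neighbor links but a single cycle when six hops fit in one clock period, which is the kind of distant single-cycle communication the abstract describes.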

Published In

DAC '17: Proceedings of the 54th Annual Design Automation Conference 2017
June 2017
533 pages
ISBN:9781450349277
DOI:10.1145/3061639
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Singapore National Research Foundation
  • Singapore Ministry of Education

Conference

DAC '17
Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Article Metrics

  • Downloads (Last 12 months): 122
  • Downloads (Last 6 weeks): 21
Reflects downloads up to 10 Dec 2024.

Cited By

  • (2024) Canalis: A Throughput-Optimized Framework for Real-Time Stream Processing of Wireless Communication. ACM Transactions on Reconfigurable Technology and Systems 17:4 (1-32). DOI: 10.1145/3695880. Online publication date: 18-Sep-2024
  • (2024) R-Blocks: an Energy-Efficient, Flexible, and Programmable CGRA. ACM Transactions on Reconfigurable Technology and Systems 17:2 (1-34). DOI: 10.1145/3656642. Online publication date: 8-Apr-2024
  • (2024) Enabling Efficient Hybrid Systolic Computation in Shared-L1-Memory Manycore Clusters. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 32:9 (1602-1615). DOI: 10.1109/TVLSI.2024.3415486. Online publication date: Sep-2024
  • (2024) HETA: A Heterogeneous Temporal CGRA Modeling and Design Space Exploration via Bayesian Optimization. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 32:3 (505-518). DOI: 10.1109/TVLSI.2023.3344536. Online publication date: Mar-2024
  • (2024) SC-CGRA: An Energy-Efficient CGRA Using Stochastic Computing. IEEE Transactions on Parallel and Distributed Systems 35:11 (2023-2038). DOI: 10.1109/TPDS.2024.3453310. Online publication date: Nov-2024
  • (2024) CREPE: Concurrent Reverse-Modulo-Scheduling and Placement for CGRAs. IEEE Transactions on Parallel and Distributed Systems 35:7 (1293-1306). DOI: 10.1109/TPDS.2024.3402098. Online publication date: 16-May-2024
  • (2024) Efficient Orchestrated AI Workflows Execution on Scale-Out Spatial Architecture. IEEE Transactions on Circuits and Systems for Artificial Intelligence 1:2 (229-243). DOI: 10.1109/TCASAI.2024.3476237. Online publication date: Dec-2024
  • (2024) A Comprehensive Dataflow-Mapping Optimization for Fully Pipelined Execution in Spatial Programmable Architecture. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43:12 (4640-4652). DOI: 10.1109/TCAD.2024.3409653. Online publication date: Dec-2024
  • (2024) Rethinking Control Flow in Spatial Architectures: Insights into Control Flow Plane Design. IEEE Transactions on Computers (1-14). DOI: 10.1109/TC.2024.3475582. Online publication date: 2024
  • (2024) ICED: An Integrated CGRA Framework Enabling DVFS-Aware Acceleration. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO) (1338-1352). DOI: 10.1109/MICRO61859.2024.00099. Online publication date: 2-Nov-2024
