- research-article, September 2024
CHARM 2.0: Composing Heterogeneous Accelerators for Deep Learning on Versal ACAP Architecture
- research-article, December 2023
TAPA: A Scalable Task-parallel Dataflow Programming Framework for Modern FPGAs with Co-optimization of HLS and Physical Design
- Licheng Guo,
- Yuze Chi,
- Jason Lau,
- Linghao Song,
- Xingyu Tian,
- Moazin Khatti,
- Weikang Qiao,
- Jie Wang,
- Ecenur Ustun,
- Zhenman Fang,
- Zhiru Zhang,
- Jason Cong
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 16, Issue 4, Article No.: 63, Pages 1–31, https://doi.org/10.1145/3609335
In this article, we propose TAPA, an end-to-end framework that compiles a C++ task-parallel dataflow program into a high-frequency FPGA accelerator. Compared to existing solutions, TAPA has two major advantages. First, TAPA provides a set of convenient ...
- research-article, September 2023
RapidStream 2.0: Automated Parallel Implementation of Latency–Insensitive FPGA Designs Through Partial Reconfiguration
- Licheng Guo,
- Pongstorn Maidee,
- Yun Zhou,
- Chris Lavin,
- Eddie Hung,
- Wuxi Li,
- Jason Lau,
- Weikang Qiao,
- Yuze Chi,
- Linghao Song,
- Yuanlong Xiao,
- Alireza Kaviani,
- Zhiru Zhang,
- Jason Cong
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 16, Issue 4, Article No.: 59, Pages 1–30, https://doi.org/10.1145/3593025
Field-programmable gate arrays (FPGAs) require a much longer compilation cycle than conventional computing platforms such as CPUs. In this article, we shorten the overall compilation time by co-optimizing the HLS compilation (C-to-RTL) and the back-end ...
- research-article, July 2023
TARO: Automatic Optimization for Free-Running Kernels in FPGA High-Level Synthesis
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCADICS), Volume 42, Issue 7, Pages 2423–2427, https://doi.org/10.1109/TCAD.2022.3216544
Streaming applications have become one of the key application domains for high-level synthesis (HLS) tools. For a streaming application, there is a potential to simplify the control logic by regulating each task with a stream of input and output data. ...
- research-article, February 2023
CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture
- Jinming Zhuang,
- Jason Lau,
- Hanchen Ye,
- Zhuoping Yang,
- Yubo Du,
- Jack Lo,
- Kristof Denolf,
- Stephen Neuendorffer,
- Alex Jones,
- Jingtong Hu,
- Deming Chen,
- Jason Cong,
- Peipei Zhou
FPGA '23: Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Pages 153–164, https://doi.org/10.1145/3543622.3573210
Dense matrix multiply (MM) serves as one of the most heavily used kernels in deep learning applications. To cope with the high computation demands of these applications, heterogeneous architectures featuring both FPGA and dedicated ASIC accelerators have ...
- research-article, August 2022
FPGA HLS Today: Successes, Challenges, and Opportunities
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 15, Issue 4, Article No.: 51, Pages 1–42, https://doi.org/10.1145/3530775
The year 2011 marked an important transition for FPGA high-level synthesis (HLS), as it went from prototyping to deployment. A decade later, in this article, we assess the progress of the deployment of HLS technology and highlight the successes in several ...
- research-article, February 2022
Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication
FPGA '22: Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Pages 65–77, https://doi.org/10.1145/3490422.3502357
Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for a wide range of applications including scientific computing, graph processing, and deep learning. Architecting accelerators for SpMM is faced with three challenges - (1) the random ...
- research-article, February 2021 (Best Paper)
AutoBridge: Coupling Coarse-Grained Floorplanning and Pipelining for High-Frequency HLS Design on Multi-Die FPGAs
FPGA '21: The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Pages 81–92, https://doi.org/10.1145/3431920.3439289
Despite an increasing adoption of high-level synthesis (HLS) for its design productivity advantages, there remains a significant gap in the achievable clock frequency between an HLS-generated design and a handcrafted RTL one. A key factor that limits ...
- research-article, November 2020
Analysis and optimization of the implicit broadcasts in FPGA HLS to improve maximum frequency
DAC '20: Proceedings of the 57th ACM/EDAC/IEEE Design Automation Conference, Article No.: 35, Pages 1–6
Designs generated by high-level synthesis (HLS) tools typically achieve a lower frequency compared to manual RTL designs. In this work, we study the timing issues in a diverse set of realistic and complex FPGA HLS designs. (1) We observe that in almost ...
HeteroRefactor: refactoring for heterogeneous computing with FPGA
ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, Pages 493–505, https://doi.org/10.1145/3377811.3380340
Heterogeneous computing with field-programmable gate-arrays (FPGAs) has demonstrated orders of magnitude improvement in computing efficiency for many applications. However, the use of such platforms so far is limited to a small subset of programmers with ...
- poster, February 2020
Analysis and Optimization of the Implicit Broadcasts in FPGA HLS to Improve Maximum Frequency
FPGA '20: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Page 311, https://doi.org/10.1145/3373087.3375332
Designs generated by high-level synthesis (HLS) tools typically achieve a lower frequency compared to manual RTL designs. We study the timing issues in a diverse set of nine realistic HLS designs and observe that in most cases the frequency degradation ...