DOI: 10.1145/3490422.3502361
research-article
Open access

RapidStream: Parallel Physical Implementation of FPGA HLS Designs

Published: 11 February 2022

Abstract

FPGAs require a much longer compilation cycle than conventional computing platforms like CPUs. In this paper, we shorten the overall compilation time by co-optimizing the HLS compilation (C-to-RTL) and the back-end physical implementation (RTL-to-bitstream). We propose a split compilation approach based on the pipelining flexibility at the HLS level, which allows us to partition designs for parallel placement and routing and then stitch the separate partitions together. We outline a number of technical challenges and address them by breaking the conventional boundaries between different stages of the traditional FPGA tool flow and reorganizing them to achieve a fast end-to-end compilation. Our research produces RapidStream, a parallelized, physically integrated compilation framework that takes in an HLS dataflow program in C/C++ and generates a fully placed and routed implementation. When tested on the Xilinx U250 FPGA with a set of realistic HLS designs, RapidStream achieves a 5-7X reduction in compile time and up to a 1.3X increase in frequency when compared to a commercial off-the-shelf toolchain. In addition, we provide preliminary results using a customized open-source router to reduce compile time by up to an order of magnitude in cases with lower performance requirements. The tool is open-sourced at github.com/Licheng-Guo/RapidStream.
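
The split compilation flow described in the abstract (partition, place and route each partition in parallel, then stitch) can be pictured with a small Python sketch. This is only an illustration of the control flow, not RapidStream's actual interface: the names `Partition`, `partition_design`, `place_and_route`, `stitch`, and `split_compile` are hypothetical, the round-robin split stands in for a real floorplanning step, and the place-and-route function is a placeholder that does no physical implementation work.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """A group of dataflow tasks compiled together (hypothetical type)."""
    name: str
    tasks: tuple


def partition_design(tasks, num_partitions):
    """Assumption: a naive round-robin split standing in for floorplanning."""
    groups = [[] for _ in range(num_partitions)]
    for i, task in enumerate(tasks):
        groups[i % num_partitions].append(task)
    return [
        Partition(f"island_{i}", tuple(group))
        for i, group in enumerate(groups)
        if group
    ]


def place_and_route(partition):
    """Placeholder for one independent place-and-route run."""
    return {"partition": partition.name, "tasks": partition.tasks, "routed": True}


def stitch(results):
    """Placeholder for merging separately implemented partitions."""
    return {
        "partitions": [r["partition"] for r in results],
        "tasks": [t for r in results for t in r["tasks"]],
        "routed": all(r["routed"] for r in results),
    }


def split_compile(tasks, num_partitions=4):
    """Partition the design, implement partitions concurrently, then stitch."""
    parts = partition_design(tasks, num_partitions)
    # Each partition is handled by its own worker; map() keeps input order.
    with ThreadPoolExecutor(max_workers=max(1, len(parts))) as pool:
        results = list(pool.map(place_and_route, parts))
    return stitch(results)


if __name__ == "__main__":
    design = split_compile(["load", "compute_a", "compute_b", "store"], 2)
    print(design["partitions"], design["routed"])
```

In a real flow each worker would invoke a vendor or open-source back end on its partition, which is where the parallel speedup would come from; the sketch only shows why independent partitions make that concurrency possible.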

Supplementary Material

MP4 File (rapidstream-record-1080p.mp4)
In this video, we shorten the overall FPGA compilation time by co-optimizing the HLS compilation (C-to-RTL) and the back-end physical implementation (RTL-to-bitstream). We propose RapidStream, a split compilation approach based on the pipelining flexibility at the HLS level, which allows us to partition designs for parallel placement and routing and then stitch the separate partitions together. When tested on the Xilinx U250 FPGA with a set of realistic HLS designs, RapidStream achieves a 5-7X reduction in compile time and up to a 1.3X increase in frequency when compared to a commercial off-the-shelf toolchain. In addition, we provide preliminary results using a customized open-source router to reduce compile time by up to an order of magnitude in cases with lower performance requirements.

Published In

FPGA '22: Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2022
211 pages
ISBN: 9781450391498
DOI: 10.1145/3490422
This work is licensed under a Creative Commons Attribution-ShareAlike International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

  • Best Paper

Author Tags

  1. dataflow
  2. fpga
  3. hls
  4. parallel
  5. placement
  6. routing

Qualifiers

  • Research-article

Funding Sources

  • CRISP

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%
