UNILOGIC: A Novel Architecture for Highly Parallel Reconfigurable Systems

Published: 09 September 2020

Abstract

High-performance Computing (HPC) applications are becoming increasingly demanding in both performance and power, pushing HPC systems to their limits. Existing HPC systems have not yet reached exascale performance, mainly because of power limitations: extrapolating from today’s top HPC systems, about 100–200 MW would be required to sustain an exaflop level of performance. A promising way to tackle these power limitations is to deploy energy-efficient reconfigurable resources, in the form of Field-programmable Gate Arrays (FPGAs), tightly integrated with conventional CPUs. However, current FPGA tools and programming environments are optimized for accelerating a single application, or even a single task, on a single FPGA device. In this work, we present UNILOGIC (Unified Logic), a novel HPC-tailored parallel architecture that efficiently incorporates FPGAs. UNILOGIC adopts the Partitioned Global Address Space (PGAS) model and extends it to include hardware accelerators, i.e., tasks implemented on the reconfigurable resources. The main advantages of UNILOGIC are that (i) the hardware accelerators can be accessed directly by any processor in the system, and (ii) the hardware accelerators can access any memory location in the system. The proposed architecture thus offers a unified environment in which all the reconfigurable resources can be used seamlessly by any processor or operating system. UNILOGIC also provides hardware virtualization of the reconfigurable logic, so that the hardware accelerators can be shared among multiple applications or tasks.
The FPGA layer of the architecture is implemented by splitting its reconfigurable resources into (i) a static partition, which provides the PGAS-related communication infrastructure, and (ii) fixed-size, dynamically reconfigurable slots that can be programmed and accessed independently or combined to support both fine- and coarse-grain reconfiguration. Finally, the UNILOGIC architecture has been evaluated on a custom prototype consisting of two 1U chassis, each of which includes eight interconnected daughter boards called Quad-FPGA Daughter Boards (QFDBs). Each QFDB supports four tightly coupled Xilinx Zynq Ultrascale+ MPSoCs and 64 Gigabytes of DDR4 memory, so the prototype features a total of 64 Zynq MPSoCs and 1 Terabyte of memory. We tuned and evaluated the UNILOGIC prototype using low-level (baremetal) performance tests as well as two popular real-world HPC applications, one compute-intensive and one data-intensive. Our evaluation shows that UNILOGIC is 2.5 to 400 times faster and 46 to 300 times more energy efficient than conventional parallel systems that use only high-end CPUs. It also outperforms GPUs by a factor of 3 to 6 in time to solution and by a factor of 10 to 20 in energy to solution.
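The prototype totals and the power figure quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the constant and function names are hypothetical and do not come from the paper, and the efficiency figure is simply what 100–200 MW per exaflop implies, not a result reported by the authors.

```python
# Figures taken from the abstract; all names here are illustrative only.
CHASSIS = 2              # two 1U chassis
QFDBS_PER_CHASSIS = 8    # eight Quad-FPGA Daughter Boards per chassis
MPSOCS_PER_QFDB = 4      # four Zynq Ultrascale+ MPSoCs per QFDB
DDR4_GB_PER_QFDB = 64    # 64 Gigabytes of DDR4 per QFDB


def prototype_totals():
    """Return the aggregate board, device, and memory counts of the prototype."""
    qfdbs = CHASSIS * QFDBS_PER_CHASSIS
    return {
        "qfdbs": qfdbs,
        "mpsocs": qfdbs * MPSOCS_PER_QFDB,
        "memory_gb": qfdbs * DDR4_GB_PER_QFDB,
    }


def gflops_per_watt(flops, watts):
    """Energy efficiency implied by a sustained rate (FLOP/s) and a power draw (W)."""
    return flops / watts / 1e9


totals = prototype_totals()
print(totals)

EXAFLOP = 1e18  # one exaflop = 1e18 floating-point operations per second
for megawatts in (100, 200):
    efficiency = gflops_per_watt(EXAFLOP, megawatts * 1e6)
    print(f"{megawatts} MW per exaflop implies {efficiency:.1f} GFLOPS/W")
```

The totals agree with the abstract: 16 QFDBs carry 64 MPSoCs and 1024 Gigabytes (1 Terabyte) of memory. The second calculation restates the power extrapolation as a sustained efficiency of roughly 5 to 10 GFLOPS/W.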

Information & Contributors

Information

Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 13, Issue 4
Special Section on FCCM 2019 and Regular Papers
December 2020
112 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3419942
Editor: Deming Chen
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 September 2020
Accepted: 01 June 2020
Revised: 01 March 2020
Received: 01 December 2019
Published in TRETS Volume 13, Issue 4

Author Tags

  1. Accelerators
  2. FPGA unification
  3. partial reconfiguration
  4. prototyping

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • European Commission under the H2020 Programme and the ECOSCALE project

Cited By

  • (2023) A Survey on FPGA-Based Heterogeneous Clusters Architectures. IEEE Access 11, 67679-67706. DOI: 10.1109/ACCESS.2023.3288431. Online publication date: 2023.
  • (2023) The Prism Bridge: Maximizing Inter-Chip AXI Throughput in the High-Speed Serial Era. IEEE Access 11, 50867-50883. DOI: 10.1109/ACCESS.2023.3277959. Online publication date: 2023.
  • (2022) Preconditioned Conjugate Gradient Acceleration on FPGA-Based Platforms. Electronics 11:19, 3039. DOI: 10.3390/electronics11193039. Online publication date: 24-Sep-2022.
  • (2021) OmpSs@FPGA framework for high performance FPGA computing. IEEE Transactions on Computers, 1-1. DOI: 10.1109/TC.2021.3086106. Online publication date: 2021.
  • (2020) Moving Compute towards Data in Heterogeneous multi-FPGA Clusters using Partial Reconfiguration and I/O Virtualisation. 2020 International Conference on Field-Programmable Technology (ICFPT), 221-226. DOI: 10.1109/ICFPT51103.2020.00038. Online publication date: Dec-2020.
