
In-Depth Analysis on Microarchitectures of Modern Heterogeneous CPU-FPGA Platforms

Published: 17 February 2019

Abstract

Conventional homogeneous multicore processors can no longer provide the continued performance and energy improvements that we have come to expect from past endeavors. Heterogeneous architectures that feature specialized hardware accelerators are widely considered a promising paradigm for resolving this issue. Among heterogeneous devices, FPGAs, which can be reconfigured to accelerate a broad class of applications with orders-of-magnitude performance/watt gains, are attracting increased attention from both academia and industry. As a consequence, industry vendors now supply a variety of CPU-FPGA acceleration platforms with diverse microarchitectural features. Such diversity, however, poses a serious challenge to application developers in selecting the appropriate platform for a specific application or application domain.
This article aims to address this challenge by determining which microarchitectural characteristics affect performance, and in what ways. Specifically, we conduct a quantitative comparison and an in-depth analysis on five state-of-the-art CPU-FPGA acceleration platforms: (1) the Alpha Data board and (2) the Amazon F1 instance that represent the traditional PCIe-based platform with private device memory; (3) the IBM CAPI that represents the PCIe-based system with coherent shared memory; (4) the first generation of the Intel Xeon+FPGA Accelerator Platform that represents the QPI-based system with coherent shared memory; and (5) the second generation of the Intel Xeon+FPGA Accelerator Platform that represents a hybrid PCIe-based (non-coherent) and QPI-based (coherent) system with shared memory. Based on the analysis of their CPU-FPGA communication latency and bandwidth characteristics, we provide a series of insights for both application developers and platform designers. Furthermore, we conduct two case studies to demonstrate how these insights can be leveraged to optimize accelerator designs. The microbenchmarks used for evaluation have been released for public use.
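
To illustrate why both communication latency and bandwidth matter when comparing platforms, the following minimal sketch uses a common first-order model in which a transfer costs a fixed latency plus payload size divided by peak bandwidth. This model, the function names, and the parameter values are assumptions made for illustration only; they are not the article's microbenchmarks or measured results.

```python
def effective_bandwidth(payload_bytes, latency_s, peak_bw_bytes_per_s):
    """Effective bandwidth of one transfer under an assumed first-order model:
    transfer_time = fixed latency + payload / peak bandwidth."""
    if payload_bytes <= 0 or latency_s < 0 or peak_bw_bytes_per_s <= 0:
        raise ValueError("payload and peak bandwidth must be positive, latency non-negative")
    transfer_time = latency_s + payload_bytes / peak_bw_bytes_per_s
    return payload_bytes / transfer_time


def half_peak_payload(latency_s, peak_bw_bytes_per_s):
    """Payload size at which the model reaches half of peak bandwidth."""
    return latency_s * peak_bw_bytes_per_s


if __name__ == "__main__":
    # Hypothetical parameters, NOT measurements from any platform in the article.
    latency = 1e-6   # 1 microsecond fixed cost per transfer
    peak = 1e9       # 1 GB/s peak link bandwidth
    for size in (64, 4096, 1 << 20):
        bw = effective_bandwidth(size, latency, peak)
        print(f"{size:>8} B -> {bw / 1e6:.1f} MB/s")
```

Under this model, small payloads are dominated by the fixed latency and large payloads approach the peak bandwidth, which is why the two characteristics need to be considered together.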




    Published In

    ACM Transactions on Reconfigurable Technology and Systems, Volume 12, Issue 1
    March 2019
    115 pages
    ISSN:1936-7406
    EISSN:1936-7414
    DOI:10.1145/3310278
    Editor: Deming Chen
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 February 2019
    Accepted: 01 November 2018
    Revised: 01 July 2018
    Received: 01 March 2018
    Published in TRETS Volume 12, Issue 1


    Author Tags

    1. AWS F1
    2. CAPI
    3. CPU-FPGA platform
    4. Heterogeneous computing
    5. Xeon+FPGA

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • NSF/Intel InTrans
    • NSF/Intel CAPA
