
In-Depth Analysis on Microarchitectures of Modern Heterogeneous CPU-FPGA Platforms

Authors:

Peng Wei

ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 12, Issue 1

Article No.: 4, Pages 1 - 20

https://doi.org/10.1145/3294054

Published: 17 February 2019


Abstract

Conventional homogeneous multicore processors are no longer able to provide the continued performance and energy improvements that we have come to expect. Heterogeneous architectures that feature specialized hardware accelerators are widely considered a promising paradigm for resolving this issue. Among heterogeneous devices, FPGAs, which can be reconfigured to accelerate a broad class of applications with orders-of-magnitude performance/watt gains, are attracting increased attention from both academia and industry. As a consequence, industry vendors have supplied a variety of CPU-FPGA acceleration platforms with diverse microarchitectural features. Such diversity, however, poses a serious challenge to application developers in selecting the appropriate platform for a specific application or application domain.

This article aims to address this challenge by determining which microarchitectural characteristics affect performance, and in what ways. Specifically, we conduct a quantitative comparison and an in-depth analysis of five state-of-the-art CPU-FPGA acceleration platforms: (1) the Alpha Data board and (2) the Amazon F1 instance, which represent the traditional PCIe-based platform with private device memory; (3) the IBM CAPI, which represents the PCIe-based system with coherent shared memory; (4) the first generation of the Intel Xeon+FPGA Accelerator Platform, which represents the QPI-based system with coherent shared memory; and (5) the second generation of the Intel Xeon+FPGA Accelerator Platform, which represents a hybrid PCIe-based (non-coherent) and QPI-based (coherent) system with shared memory. Based on the analysis of their CPU-FPGA communication latency and bandwidth characteristics, we provide a series of insights for both application developers and platform designers. Furthermore, we conduct two case studies to demonstrate how these insights can be leveraged to optimize accelerator designs. The microbenchmarks used for evaluation have been released for public use.

