poster

CHiMPS: a high-level compilation flow for hybrid CPU-FPGA architectures

Authors:

Andrew R. Putnam,

Eric Dellinger,

Prasanna Sundararajan

FPGA '08: Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays

Page 261

https://doi.org/10.1145/1344671.1344720

Published: 24 February 2008

Abstract

This poster describes CHiMPS, a toolflow that aims to give software developers a way to program hybrid CPU-FPGA platforms using familiar tools, languages, and techniques. CHiMPS starts with C and produces a specialized spatial dataflow architecture that supports coherent caches and the shared-memory programming model. The toolflow is designed to abstract away the complex details of data movement and separate memories on the hybrid platforms, and to take advantage of memory management and computation techniques unique to reconfigurable hardware. This poster focuses on the memory design for CHiMPS, particularly the use of numerous small caches customized for various phases of program execution. The poster also addresses area vs. performance tradeoffs for various configurations. Compared to running only on the CPU, applications compiled using CHiMPS show performance improvements of more than 36x on simple compute-intensive kernels, and 4.3x on the difficult-to-parallelize STSWM application without any special optimizations. The toolflow supports full ANSI-C, and produces hardware that runs on platforms that are expected to be available within one year.
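The abstract gives no detail on how the small caches are built, so the following is only a rough illustration of one plausible reason that several small caches can beat a single shared one. It is not CHiMPS output and not code from the poster. It models a tiny direct-mapped cache in plain C and counts misses for a vector-add style loop that reads two arrays. The cache geometry (`LINES`, `LINE_WORDS`), the array placement, and the helper names `small_cache`, `cache_access`, and `vector_add_misses` are all assumptions made up for this sketch, and the array addresses are deliberately chosen so that the two arrays collide in a shared cache.

```c
#include <assert.h>
#include <stddef.h>

#define LINES 16        /* lines per small cache (assumed for illustration) */
#define LINE_WORDS 4    /* words per cache line (assumed for illustration) */

/* Hypothetical direct-mapped cache model: tracks tags only, to count misses. */
typedef struct {
    long tag[LINES];            /* line address held in each slot, -1 = empty */
    unsigned long hits;
    unsigned long misses;
} small_cache;

static void cache_reset(small_cache *c) {
    assert(c != NULL);
    for (int i = 0; i < LINES; i++) {
        c->tag[i] = -1;
    }
    c->hits = 0;
    c->misses = 0;
}

static void cache_access(small_cache *c, unsigned long word_addr) {
    assert(c != NULL);
    long line = (long)(word_addr / LINE_WORDS);  /* which memory line */
    int slot = (int)(line % LINES);              /* direct-mapped slot */
    if (c->tag[slot] == line) {
        c->hits++;
    } else {
        c->tag[slot] = line;                     /* evict and refill */
        c->misses++;
    }
}

/*
 * Count misses for a loop that reads a[i] and b[i] for i in [0, n).
 * split == 0: both arrays go through one shared cache.
 * split != 0: each array gets its own small cache.
 * Array b is placed so that b[i] maps to the same slot as a[i], which is
 * a contrived worst case for the shared configuration.
 */
static unsigned long vector_add_misses(unsigned long n, int split) {
    small_cache cache_a, cache_b;
    const unsigned long base_a = 0;
    const unsigned long base_b = (unsigned long)LINES * LINE_WORDS * 8;

    cache_reset(&cache_a);
    cache_reset(&cache_b);
    for (unsigned long i = 0; i < n; i++) {
        cache_access(&cache_a, base_a + i);
        cache_access(split ? &cache_b : &cache_a, base_b + i);
    }
    return cache_a.misses + cache_b.misses;
}
```

Under these assumptions, `vector_add_misses(64, 0)` reports 128 misses because the two streams evict each other on every access, while `vector_add_misses(64, 1)` reports 32, one miss per four-word line in each private cache. Real access patterns are rarely this adversarial, so the gap here overstates what a tool would see in practice.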

Index Terms

CHiMPS: a high-level compilation flow for hybrid CPU-FPGA architectures
1. Computer systems organization
  1. Architectures
    1. Other architectures

Published In

FPGA '08: Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays

February 2008

278 pages

ISBN:9781595939340

DOI:10.1145/1344671

General Chair: Mike Hutton (Altera, USA)

Program Chair: Paul Chow (University of Toronto, Canada)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Poster

Conference

FPGA08

FPGA08: ACM/SIGDA International Symposium on Field Programmable Gate Arrays

February 24 - 26, 2008

Monterey, California, USA

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%

Cited By

Josipovic L, Guerrieri A, Ienne P (2022). From C/C++ Code to High-Performance Dataflow Circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41:7 (2142-2155). Online publication date: Jul-2022.
https://doi.org/10.1109/TCAD.2021.3105574

Josipovic L, Guerrieri A, Ienne P (2021). Synthesizing General-Purpose Code Into Dynamically Scheduled Circuits. IEEE Circuits and Systems Magazine 21:2 (97-118). Online publication date: Oct-2022.
https://doi.org/10.1109/MCAS.2021.3071631

Nigam R, Atapattu S, Thomas S, Li Z, Bauer T, Ye Y, Koti A, Sampson A, Zhang Z, Donaldson A, Torlak E (2020). Predictable accelerator design with time-sensitive affine types. Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (393-407). Online publication date: 11-Jun-2020.
https://dl.acm.org/doi/10.1145/3385412.3385974

Stramondo G, Ciobanu C, de Laat C, Varbanescu A (2019). Designing and building application-centric parallel memories. Concurrency and Computation: Practice and Experience 32:15. Online publication date: 14-Aug-2019.
https://doi.org/10.1002/cpe.5485

Mahajan D, Kim J, Sacks J, Ardalan A, Kumar A, Esmaeilzadeh H (2018). In-RDBMS hardware acceleration of advanced analytics. Proceedings of the VLDB Endowment 11:11 (1317-1331). Online publication date: 1-Jul-2018.
https://dl.acm.org/doi/10.14778/3236187.3236188

Josipović L, Ghosal R, Ienne P, Anderson J, Bazargan K (2018). Dynamically Scheduled High-level Synthesis. Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (127-136). Online publication date: 15-Feb-2018.
https://dl.acm.org/doi/10.1145/3174243.3174264

Margerm S, Sharifian A, Guha A, Shriraman A, Pokam G, Oskin M, Inoue K (2018). TAPAS. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture (245-257). Online publication date: 20-Oct-2018.
https://dl.acm.org/doi/10.1109/MICRO.2018.00028

Sacks J, Mahajan D, Lawson R, Khaleghi B, Esmaeilzadeh H (2018). Robox. Proceedings of the 45th Annual International Symposium on Computer Architecture (479-490). Online publication date: 2-Jun-2018.
https://dl.acm.org/doi/10.1109/ISCA.2018.00047

Ciobanu C, Stramondo G, de Laat C, Varbanescu A (2018). MAX-PolyMem: High-Bandwidth Polymorphic Parallel Memories for DFEs. 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (107-114). Online publication date: May-2018.
https://doi.org/10.1109/IPDPSW.2018.00025

Stramondo G, Ciobanu C, Varbanescu A, de Laat C (2018). Towards Application-Centric Parallel Memories. Euro-Par 2018: Parallel Processing Workshops (481-493). Online publication date: 31-Dec-2018.
https://doi.org/10.1007/978-3-030-10549-5_38
