DOI: 10.1145/1344671.1344720
Poster

CHiMPS: a high-level compilation flow for hybrid CPU-FPGA architectures

Published: 24 February 2008

Abstract

This poster describes CHiMPS, a toolflow that aims to give software developers a way to program hybrid CPU-FPGA platforms using familiar tools, languages, and techniques. CHiMPS starts with C and produces a specialized spatial dataflow architecture that supports coherent caches and the shared-memory programming model. The toolflow is designed to abstract away the complex details of data movement and separate memories on hybrid platforms, while also taking advantage of memory-management and computation techniques unique to reconfigurable hardware. This poster focuses on the memory design for CHiMPS, particularly the use of numerous small caches customized for various phases of program execution. The poster also addresses area-versus-performance tradeoffs for various configurations. Compared with running only on the CPU, applications compiled using CHiMPS show performance improvements of more than 36x on simple compute-intensive kernels and 4.3x on the difficult-to-parallelize STSWM application (without any special optimizations). The toolflow supports full ANSI C and produces hardware that runs on platforms expected to be available within one year.




Information

Published In

FPGA '08: Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
February 2008
278 pages
ISBN:9781595939340
DOI:10.1145/1344671
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FPGA
  2. FPGA accelerators
  3. c-to-gates
  4. high-performance computing
  5. reconfigurable computing

Qualifiers

  • Poster

Conference

FPGA '08

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 12 Dec 2024.

Cited By

  • (2022) From C/C++ Code to High-Performance Dataflow Circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41:7, pp. 2142-2155. DOI: 10.1109/TCAD.2021.3105574. Online publication date: Jul-2022.
  • (2021) Synthesizing General-Purpose Code Into Dynamically Scheduled Circuits. IEEE Circuits and Systems Magazine 21:2, pp. 97-118. DOI: 10.1109/MCAS.2021.3071631. Online publication date: Oct-2022.
  • (2020) Predictable accelerator design with time-sensitive affine types. Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 393-407. DOI: 10.1145/3385412.3385974. Online publication date: 11-Jun-2020.
  • (2019) Designing and building application-centric parallel memories. Concurrency and Computation: Practice and Experience 32:15. DOI: 10.1002/cpe.5485. Online publication date: 14-Aug-2019.
  • (2018) In-RDBMS hardware acceleration of advanced analytics. Proceedings of the VLDB Endowment 11:11, pp. 1317-1331. DOI: 10.14778/3236187.3236188. Online publication date: 1-Jul-2018.
  • (2018) Dynamically Scheduled High-level Synthesis. Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 127-136. DOI: 10.1145/3174243.3174264. Online publication date: 15-Feb-2018.
  • (2018) TAPAS. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, pp. 245-257. DOI: 10.1109/MICRO.2018.00028. Online publication date: 20-Oct-2018.
  • (2018) Robox. Proceedings of the 45th Annual International Symposium on Computer Architecture, pp. 479-490. DOI: 10.1109/ISCA.2018.00047. Online publication date: 2-Jun-2018.
  • (2018) MAX-PolyMem: High-Bandwidth Polymorphic Parallel Memories for DFEs. 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 107-114. DOI: 10.1109/IPDPSW.2018.00025. Online publication date: May-2018.
  • (2018) Towards Application-Centric Parallel Memories. Euro-Par 2018: Parallel Processing Workshops, pp. 481-493. DOI: 10.1007/978-3-030-10549-5_38. Online publication date: 31-Dec-2018.