DOI: 10.1145/2967938.2967969
research-article

A DSL Compiler for Accelerating Image Processing Pipelines on FPGAs

Published: 11 September 2016 Publication History

Abstract

This paper describes an automatic approach to accelerating image processing pipelines using FPGAs. An image processing pipeline can be viewed as a graph of interconnected stages that process images successively. Each stage typically performs a point-wise, stencil, or other more complex operation on image pixels. Recent efforts have led to the development of domain-specific languages (DSLs) and optimization frameworks for image processing pipelines. In this paper, we develop an approach to map image processing pipelines expressed in the PolyMage DSL to efficient parallel FPGA designs. Our approach maximally exploits reuse and the available memory bandwidth (or chip resources). When compared to Darkroom, a state-of-the-art approach for compiling a high-level DSL to FPGAs, our approach (a) leads to designs that deliver significantly higher throughput, and (b) supports a greater variety of filters. Furthermore, for some of the benchmarks, the designs we generate improve even on pre-optimized FPGA implementations provided by vendor libraries.
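To make the terms "point-wise stage" and "stencil stage" concrete, the following is a minimal sketch in plain Python. It is not PolyMage DSL syntax and says nothing about the FPGA designs the paper generates; all names (`pointwise`, `stencil3x3`, `run_pipeline`, `example_pipeline`) are hypothetical, the pipeline is a simple linear chain (the simplest case of a stage graph), and clamping at the image border is an assumption made only for this illustration.

```python
from typing import Callable, List

Image = List[List[float]]
Stage = Callable[[Image], Image]


def pointwise(img: Image, f: Callable[[float], float]) -> Image:
    """Point-wise stage: each output pixel depends only on the same input pixel."""
    return [[f(p) for p in row] for row in img]


def stencil3x3(img: Image, weights: List[List[float]]) -> Image:
    """Stencil stage: each output pixel is a weighted sum of its 3x3 neighbourhood.

    Coordinates outside the image are clamped to the nearest valid pixel
    (an assumption for this sketch, not taken from the paper).
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += weights[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc
    return out


def run_pipeline(img: Image, stages: List[Stage]) -> Image:
    """Apply the stages one after another; each stage consumes the previous output."""
    for stage in stages:
        img = stage(img)
    return img


# 3x3 box-blur weights, used as the example stencil.
BOX = [[1.0 / 9.0] * 3 for _ in range(3)]


def example_pipeline(img: Image) -> Image:
    """Two-stage chain: scale every pixel by 2, then apply a 3x3 box blur."""
    stages: List[Stage] = [
        lambda im: pointwise(im, lambda p: p * 2.0),
        lambda im: stencil3x3(im, BOX),
    ]
    return run_pipeline(img, stages)
```

In this sketch the stencil stage reads each input pixel up to nine times, once per neighbouring output. That repeated access is the kind of reuse the abstract refers to, although how the paper's compiler exploits it is not described on this page.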

Information & Contributors

Information

Published In

PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation
September 2016
474 pages
ISBN: 9781450341219
DOI: 10.1145/2967938
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. domain-specific language
  2. dsl
  3. fpgas
  4. hls
  5. image processing
  6. parallelism
  7. reuse

Qualifiers

  • Research-article

Conference

PACT '16
Sponsor:
  • IFIP WG 10.3
  • IEEE TCCA
  • SIGARCH
  • IEEE CS TCPP

Acceptance Rates

PACT '16 Paper Acceptance Rate 31 of 119 submissions, 26%;
Overall Acceptance Rate 121 of 471 submissions, 26%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 34
  • Downloads (Last 6 weeks): 5
Reflects downloads up to 14 Dec 2024

Citations

Cited By

  • (2024) SlidingConv: Domain-Specific Description of Sliding Discrete Cosine Transform Convolution for Halide. IEEE Access 12 (7563-7583). DOI: 10.1109/ACCESS.2023.3345660. Online publication date: 2024
  • (2023) FlowPix: Accelerating Image Processing Pipelines on an FPGA Overlay using a Domain Specific Compiler. ACM Transactions on Architecture and Code Optimization 20:4 (1-25). DOI: 10.1145/3629523. Online publication date: 25-Oct-2023
  • (2023) HIR: An MLIR-based Intermediate Representation for Hardware Accelerator Description. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4 (189-201). DOI: 10.1145/3623278.3624767. Online publication date: 25-Mar-2023
  • (2023) ExHiPR: Extended High-Level Partial Reconfiguration for Fast Incremental FPGA Compilation. ACM Transactions on Reconfigurable Technology and Systems 17:2 (1-28). DOI: 10.1145/3617837. Online publication date: 14-Sep-2023
  • (2023) Unified Buffer: Compiling Image Processing and Machine Learning Applications to Push-Memory Accelerators. ACM Transactions on Architecture and Code Optimization 20:2 (1-26). DOI: 10.1145/3572908. Online publication date: 1-Mar-2023
  • (2023) AHA: An Agile Approach to the Design of Coarse-Grained Reconfigurable Accelerators and Compilers. ACM Transactions on Embedded Computing Systems 22:2 (1-34). DOI: 10.1145/3534933. Online publication date: 24-Jan-2023
  • (2023) Duet: Creating Harmony between Processors and Embedded FPGAs. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (745-758). DOI: 10.1109/HPCA56546.2023.10070989. Online publication date: Feb-2023
  • (2022) Applications and Techniques for Fast Machine Learning in Science. Frontiers in Big Data 5. DOI: 10.3389/fdata.2022.787421. Online publication date: 12-Apr-2022
  • (2022) Pushing the Level of Abstraction of Digital System Design: A Survey on How to Program FPGAs. ACM Computing Surveys 55:5 (1-48). DOI: 10.1145/3532989. Online publication date: 3-Dec-2022
  • (2022) PLD: fast FPGA compilation to make reconfigurable acceleration compatible with modern incremental refinement software development. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (933-945). DOI: 10.1145/3503222.3507740. Online publication date: 28-Feb-2022
