research-article

PolyMage: Automatic Optimization for Image Processing Pipelines

Authors:

Ravi Teja Mullapudi,

Uday Bondhugula

ACM SIGARCH Computer Architecture News, Volume 43, Issue 1

Pages 429 - 443

https://doi.org/10.1145/2786763.2694364

Published: 14 March 2015

Abstract

This paper presents the design and implementation of PolyMage, a domain-specific language and compiler for image processing pipelines. An image processing pipeline can be viewed as a graph of interconnected stages that process images successively. Each stage typically performs a point-wise, stencil, reduction, or data-dependent operation on image pixels. Individual stages in a pipeline typically exhibit abundant data parallelism that can be exploited with relative ease. However, the stages also require high memory bandwidth, which prevents effective use of the parallelism available on modern architectures. For applications that demand high performance, the traditional options are to use optimized libraries such as OpenCV or to optimize manually. Using libraries precludes optimization across library routines, while manual optimization that accounts for both parallelism and locality is very tedious.
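The Python sketch below makes the four kinds of stages concrete with a toy pipeline. It is hypothetical and does not use PolyMage's DSL syntax or any stage taken from the paper. The stages are:

- a point-wise scale;
- a separable 3x3 box blur, written as two stencil stages;
- a histogram reduction;
- an equalization-style lookup as the data-dependent stage.

Images are plain nested lists of 8-bit integers, and the blur computes only interior pixels. Both are simplifying assumptions. Each stage runs to completion and materializes a full intermediate image before the next one starts. That stage-at-a-time pattern is what makes such pipelines memory-bandwidth bound.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit pixel values (assumption of this sketch)


def pointwise_scale(img: Image, k: int) -> Image:
    """Point-wise stage: each output pixel reads only the same input pixel."""
    return [[min(255, k * v) for v in row] for row in img]


def stencil_blur_x(img: Image) -> Image:
    """Stencil stage: 3-tap horizontal sum. Only interior columns are
    computed (no boundary handling), so the output is 2 columns narrower."""
    return [[row[x - 1] + row[x] + row[x + 1] for x in range(1, len(row) - 1)]
            for row in img]


def stencil_blur_y(img: Image) -> Image:
    """Stencil stage: 3-tap vertical sum divided by 9, so blur_x followed by
    blur_y is a 3x3 box average. The output is 2 rows shorter."""
    return [[(img[y - 1][x] + img[y][x] + img[y + 1][x]) // 9
             for x in range(len(img[y]))]
            for y in range(1, len(img) - 1)]


def reduce_histogram(img: Image, bins: int = 256) -> List[int]:
    """Reduction stage: collapses the whole image into one histogram."""
    hist = [0] * bins
    for row in img:
        for v in row:
            hist[v] += 1
    return hist


def data_dependent_remap(img: Image, lut: List[int]) -> Image:
    """Data-dependent stage: which lut entry is read depends on the pixel value."""
    return [[lut[v] for v in row] for row in img]


def run_pipeline(img: Image) -> Image:
    """Stage-at-a-time execution (input assumed to be at least 3x3).
    Every stage writes a full intermediate image that the next stage
    must read back from memory."""
    scaled = pointwise_scale(img, 2)
    blurred = stencil_blur_y(stencil_blur_x(scaled))
    hist = reduce_histogram(blurred)
    # Cumulative histogram turned into an equalization-style lookup table.
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    lut = [(c * 255) // total for c in cdf]
    return data_dependent_remap(blurred, lut)
```

The reduction needs the entire blurred image before the lookup table exists. This is why reduction and data-dependent stages constrain how far stages can be fused.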

The focus of our system, PolyMage, is on automatically generating high-performance implementations of image processing pipelines expressed in a high-level declarative language. Our optimization approach primarily relies on the transformation and code generation capabilities of the polyhedral compiler framework. To the best of our knowledge, this is the first model-driven compiler for image processing pipelines that performs complex fusion, tiling, and storage optimization automatically. Experimental results on a modern multicore system show that the performance achieved by our automatic approach is up to 1.81x better than that achieved through manual tuning in Halide, a state-of-the-art language and compiler for image processing pipelines. For a camera raw image processing pipeline, our performance is comparable to that of a hand-tuned implementation.
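The sketch below shows what fusion, tiling, and storage optimization can buy. It assumes a hypothetical toy pipeline, not taken from the paper: a point-wise scale followed by a separable 3x3 box blur.

- **Technique.** It uses overlapped tiling along image rows. This is one standard way to fuse stencil stages: each tile recomputes a small halo at its border instead of waiting on its neighbours.
- **Scope.** It is not presented as PolyMage's generated code.
- **Tile size.** The fixed `tile_rows` parameter stands in for the tile sizes a model-driven compiler would choose.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit pixel values (assumption of this sketch)


def reference(img: Image, k: int = 2) -> Image:
    """Unfused baseline: each stage materializes a full intermediate image."""
    scaled = [[min(255, k * v) for v in row] for row in img]
    bx = [[r[x - 1] + r[x] + r[x + 1] for x in range(1, len(r) - 1)] for r in scaled]
    return [[(bx[y - 1][x] + bx[y][x] + bx[y + 1][x]) // 9 for x in range(len(bx[0]))]
            for y in range(1, len(bx) - 1)]


def fused_tiled(img: Image, tile_rows: int, k: int = 2) -> Image:
    """Fused version with overlapped row tiles (tile_rows is a hypothetical knob).

    Output row j of the last stage reads rows j, j+1, j+2 of the
    horizontally blurred image, so a tile producing output rows [t0, t1)
    needs input rows [t0, t1 + 2). Neighbouring tiles therefore overlap
    by 2 rows, which each tile recomputes locally.
    """
    out_h = len(img) - 2
    out: Image = []
    for t0 in range(0, out_h, tile_rows):
        t1 = min(t0 + tile_rows, out_h)
        # Storage optimization: only (t1 - t0 + 2) rows of the intermediate
        # are live at once, instead of a full-size intermediate image.
        scratch = []
        for y in range(t0, t1 + 2):
            r = [min(255, k * v) for v in img[y]]            # fused point-wise stage
            scratch.append([r[x - 1] + r[x] + r[x + 1]      # fused horizontal stencil
                            for x in range(1, len(r) - 1)])
        for j in range(t1 - t0):                            # vertical stencil on the tile
            a, b, c = scratch[j], scratch[j + 1], scratch[j + 2]
            out.append([(a[x] + b[x] + c[x]) // 9 for x in range(len(a))])
    return out
```

For any `tile_rows >= 1`, `fused_tiled(img, tile_rows)` should produce exactly the same output as `reference(img)`. Smaller tiles shrink the scratch buffer but recompute the 2 halo rows more often. Choosing that trade-off automatically is the kind of decision the abstract attributes to a model-driven compiler.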

Index Terms

PolyMage: Automatic Optimization for Image Processing Pipelines
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation

Published In

ACM SIGARCH Computer Architecture News Volume 43, Issue 1

ASPLOS'15

March 2015

676 pages

ISSN:0163-5964

DOI:10.1145/2786763

Editor: Doug DeGroot

ASPLOS '15: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems
March 2015
720 pages
ISBN:9781450328357
DOI:10.1145/2694344
General Chairs: Ozcan Ozturk (Bilkent University, Turkey), Kemal Ebcioglu (Global Supercomputing, USA)
Program Chair: Sandhya Dwarkadas (University of Rochester, USA)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 March 2015

Published in SIGARCH Volume 43, Issue 1

Bibliometrics

Article Metrics

Total citations: 176
Total downloads: 2,019
Downloads (last 12 months): 90
Downloads (last 6 weeks): 10

Reflects downloads up to 28 Dec 2024.
