PolyMage: Automatic Optimization for Image Processing Pipelines

Published: 14 March 2015

Abstract

This paper presents the design and implementation of PolyMage, a domain-specific language and compiler for image processing pipelines. An image processing pipeline can be viewed as a graph of interconnected stages which process images successively. Each stage typically performs one of point-wise, stencil, reduction, or data-dependent operations on image pixels. Individual stages in a pipeline typically exhibit abundant data parallelism that can be exploited with relative ease. However, the stages also require high memory bandwidth, which prevents effective utilization of the parallelism available on modern architectures. For applications that demand high performance, the traditional options are to use optimized libraries like OpenCV or to optimize manually. While using libraries precludes optimization across library routines, manual optimization accounting for both parallelism and locality is very tedious.
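The four kinds of stage named above can be illustrated with a minimal sketch on a toy one-dimensional "image" (a list of pixel values). This is plain Python, not PolyMage syntax; every function name and parameter below is a hypothetical chosen for illustration only.

```python
# Toy 1-D "image" stages in plain Python. All names are illustrative
# assumptions; none of this is PolyMage's language or API.

def pointwise_scale(img, gain):
    """Point-wise: each output pixel depends only on the same input pixel."""
    return [gain * p for p in img]

def stencil_blur3(img):
    """Stencil: each output pixel depends on a fixed 3-wide neighbourhood.
    Border pixels are dropped, so the output is two pixels shorter."""
    return [(img[i - 1] + img[i] + img[i + 1]) / 3
            for i in range(1, len(img) - 1)]

def reduction_histogram(img, bins):
    """Reduction: many input pixels accumulate into each output bin.
    Assumes integer pixel values in the range [0, bins)."""
    hist = [0] * bins
    for p in img:
        hist[p] += 1
    return hist

def data_dependent_lookup(img, table):
    """Data-dependent: the location read depends on the pixel value itself."""
    return [table[p] for p in img]
```

The first two stages are trivially data parallel, since every output pixel can be computed independently. Each one also reads and writes a whole image while doing little arithmetic per pixel, which is the memory-bandwidth pressure the abstract refers to.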
The focus of our system, PolyMage, is on automatically generating high-performance implementations of image processing pipelines expressed in a high-level declarative language. Our optimization approach primarily relies on the transformation and code generation capabilities of the polyhedral compiler framework. To the best of our knowledge, this is the first model-driven compiler for image processing pipelines that performs complex fusion, tiling, and storage optimization automatically. Experimental results on a modern multicore system show that the performance achieved by our automatic approach is up to 1.81x better than that achieved through manual tuning in Halide, a state-of-the-art language and compiler for image processing pipelines. For a camera raw image processing pipeline, our performance is comparable to that of a hand-tuned implementation.
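To make the terms fusion, tiling, and storage optimization concrete, here is a hand-written sketch of one common way to combine them for two chained stencil stages: compute the output tile by tile, and for each tile recompute only the slice of the intermediate stage that the tile needs. It is an assumption-laden illustration in plain Python, not PolyMage's generated code and not a claim about the exact strategy its compiler selects; `blur3`, `pipeline_unfused`, `pipeline_fused_tiled`, and the `tile` parameter are hypothetical names.

```python
def blur3(img):
    """3-wide box blur on a 1-D list; the output is two elements shorter."""
    return [(img[i - 1] + img[i] + img[i + 1]) / 3
            for i in range(1, len(img) - 1)]

def pipeline_unfused(img):
    """Stage-by-stage execution: the whole intermediate image is stored."""
    tmp = blur3(img)       # full-size intermediate, len(img) - 2 values
    return blur3(tmp)      # final output, len(img) - 4 values

def pipeline_fused_tiled(img, tile=4):
    """Fused execution over output tiles with a small scratch buffer.

    Output pixel j needs intermediate values j..j+2, which in turn need
    input pixels j..j+4, so each tile reads a slightly wider input slice
    and redundantly recomputes a two-value halo of the intermediate stage.
    """
    n_out = len(img) - 4
    out = [0.0] * n_out
    for start in range(0, n_out, tile):
        stop = min(start + tile, n_out)
        # Tile-sized intermediate: (stop - start + 2) values, not a full image.
        scratch = blur3(img[start:stop + 4])
        out[start:stop] = blur3(scratch)
    return out
```

Both functions produce identical results. The fused version trades a little recomputation at tile boundaries for an intermediate buffer small enough to stay in cache, and the tiles are independent of one another, so they could be processed in parallel.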

    Information & Contributors

    Information

    Published In

    ACM SIGARCH Computer Architecture News, Volume 43, Issue 1
    ASPLOS'15
    March 2015
    676 pages
    ISSN: 0163-5964
    DOI: 10.1145/2786763

    ASPLOS '15: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems
    March 2015
    720 pages
    ISBN: 9781450328357
    DOI: 10.1145/2694344
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. domain-specific language
    2. image processing
    3. locality
    4. multicores
    5. parallelism
    6. polyhedral optimization
    7. tiling
    8. vectorization

    Qualifiers

    • Research-article
