DOI: 10.1109/CGO.2009.20
Article

Software Pipelined Execution of Stream Programs on GPUs

Published: 22 March 2009

Abstract

The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general-purpose multi-core architectures. This model allows programmers to specify the structure of a program as a set of filters that act upon data, and a set of communication channels between them. StreamIt graphs describe task, data, and pipeline parallelism, which can be exploited on modern Graphics Processing Units (GPUs), as they support abundant parallelism in hardware. In this paper, we describe the challenges in mapping StreamIt to GPUs and propose an efficient technique to software pipeline the execution of stream programs on GPUs. We formulate this problem (both scheduling and assignment of filters to processors) as an efficient Integer Linear Program (ILP), which is then solved using ILP solvers. We also describe a novel buffer layout technique for GPUs which facilitates exploiting the high memory bandwidth available in GPUs. The proposed scheduling utilizes both the scalar units in the GPU, to exploit data parallelism, and the multiprocessors, to exploit task and pipeline parallelism. Further, it takes into consideration the synchronization and bandwidth limitations of GPUs, and yields speedups between 1.87X and 36.83X over a single-threaded CPU.
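To make the idea of software pipelining a stream program concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the three filters `A`, `B`, `C`, the linear chain, and the hand-picked stage assignment are all hypothetical, and the ILP that the paper uses to derive scheduling and processor assignment is not modelled. The sketch only shows how, once each filter has a pipeline stage, different filters work on different iterations of the stream at the same time step.

```python
from typing import Dict, List, Tuple


def software_pipeline(stages: Dict[str, int],
                      iterations: int) -> List[List[Tuple[str, int]]]:
    """List, per time step, the (filter, iteration) pairs that run together.

    `stages` maps a filter name to its pipeline stage. This is a hypothetical
    input chosen by hand; an ILP-based scheduler would compute it instead.
    A filter in stage s executes iteration i at time step i + s, so a consumer
    placed one stage after its producer always reads data written one step
    earlier.
    """
    depth = max(stages.values()) + 1
    schedule = []
    # iterations + depth - 1 steps cover prologue, steady state, and epilogue.
    for t in range(iterations + depth - 1):
        step = [(name, t - s) for name, s in stages.items()
                if 0 <= t - s < iterations]
        schedule.append(step)
    return schedule


# Hypothetical three-filter chain A -> B -> C, one filter per stage.
stages = {"A": 0, "B": 1, "C": 2}
schedule = software_pipeline(stages, iterations=4)

for t, step in enumerate(schedule):
    print(t, step)
```

In this toy schedule the first two steps fill the pipeline, the middle steps run all three filters concurrently on different iterations, and the last two steps drain it. On a GPU, the concurrently running filters would be the candidates for placement on different multiprocessors.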

Information & Contributors

Information

Published In

CGO '09: Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
March 2009
299 pages
ISBN: 9780769535760

Publisher

IEEE Computer Society

United States

Author Tags

  1. CUDA
  2. GPU Programming
  3. Software Pipelining
  4. Stream Programming

Qualifiers

  • Article

Conference

CGO '09

Acceptance Rates

CGO '09 Paper Acceptance Rate: 26 of 70 submissions, 37%
Overall Acceptance Rate: 312 of 1,061 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 0

Reflects downloads up to 30 Dec 2024.

Cited By

  • (2023) Gaiwan. Science of Computer Programming 230:C. DOI: 10.1016/j.scico.2023.102989. Online publication date: 1-Aug-2023
  • (2022) High-Level Stream and Data Parallelism in C++ for GPUs. Proceedings of the XXVI Brazilian Symposium on Programming Languages (41-49). DOI: 10.1145/3561320.3561327. Online publication date: 6-Oct-2022
  • (2022) FPGA HLS Today: Successes, Challenges, and Opportunities. ACM Transactions on Reconfigurable Technology and Systems 15:4 (1-42). DOI: 10.1145/3530775. Online publication date: 8-Aug-2022
  • (2021) Multiple-tasks on multiple-devices (MTMD): exploiting concurrency in heterogeneous managed runtimes. Proceedings of the 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (125-138). DOI: 10.1145/3453933.3454019. Online publication date: 7-Apr-2021
  • (2019) HiWayLib. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (153-166). DOI: 10.1145/3297858.3304032. Online publication date: 4-Apr-2019
  • (2018) High performance stencil code generation with Lift. Proceedings of the 2018 International Symposium on Code Generation and Optimization (100-112). DOI: 10.1145/3168824. Online publication date: 24-Feb-2018
  • (2018) Memory-Constrained Vectorization and Scheduling of Dataflow Graphs for Hybrid CPU-GPU Platforms. ACM Transactions on Embedded Computing Systems 17:2 (1-25). DOI: 10.1145/3157669. Online publication date: 30-Jan-2018
  • (2017) Lift: a functional data-parallel IR for high-performance GPU code generation. Proceedings of the 2017 International Symposium on Code Generation and Optimization (74-85). DOI: 10.5555/3049832.3049841. Online publication date: 4-Feb-2017
  • (2016) Scalable and modular online data processing for ultrafast computed tomography using CUDA pipelines. Proceedings of the 2nd Workshop on In Situ Infrastructures for Enabling Extreme-scale Analysis and Visualization (7-11). DOI: 10.5555/3018859.3018861. Online publication date: 13-Nov-2016
  • (2016) Software pipelining for graphic processing unit acceleration. International Journal of High Performance Computing Applications 30:2 (169-185). DOI: 10.1177/1094342015585845. Online publication date: 1-May-2016
