[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1854273.1854322acmconferencesArticle/Chapter ViewAbstractPublication PagespactConference Proceedingsconference-collections
research-article

The Paralax infrastructure: automatic parallelization with a helping hand

Published: 11 September 2010 Publication History

Abstract

Speeding up sequential programs on multicores is a challenging problem that is in urgent need of a solution. Automatic parallelization of irregular pointer-intensive codes, exemplified by the SPECint codes, is a very hard problem. This paper shows that, with a helping hand, such auto-parallelization is possible and fruitful.
This paper makes the following contributions: (i) A compiler-framework for extracting pipeline-like parallelism from outer program loops is presented. (ii) Using a light-weight programming model based on annotations, the programmer helps the compiler to find thread-level parallelism. Each of the annotations specifies only a small piece of semantic information that compiler analysis misses, e.g. stating that a variable is dead at a certain program point. The annotations are designed such that correctness is easily verified. Furthermore, we present a tool for suggesting annotations to the programmer. (iii) The methodology is applied to auto-parallelize several SPECint benchmarks. For the benchmark with most parallelism (hmmer), we obtain a scalable 7-fold speedup on an AMD quad-core dual processor.
The annotations constitute a parallel programming model that relies extensively on a sequential program representation. Hereby, the complexity of debugging is not increased and it does not obscure the source code. These properties could prove valuable to increase the efficiency of parallel programming.

References

[1]
}}J. C. Adams,W. S. Brainerd, J. T.Martin, B. T. Smith, and J. L. Wagener. Fortran 90 handbook: complete ANSI/ISO reference. Intertext Publications, Inc.,/McGraw-Hill, Inc., New York, NY, USA, 1992.
[2]
}}R. Allen and K. Kennedy. Automatic translation of Fortran programs to vector form. ACM Trans. Program. Lang. Syst., 9(4):491--542, 1987.
[3]
}}T. Brandes, S. Chaumette, M. C. Counilh, J. Roman, A. Darte, F. Desprez, and J. C. Mignot. Hpfit: a set of integrated tools for the parallelization of applications using high performance fortran. part I: HPFIT and the TransTOOL environment. Parallel Comput., 23(1-2):71--87, 1997.
[4]
}}M. Bridges, N. Vachharajani, Y. Zhang, T. Jablin, and D. August. Revisiting the sequential programming model for multi-core. In MICRO '07: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, pages 69--84, 2007.
[5]
}}M. Burke and R. Cytron. Interprocedural dependence analysis and parallelization. SIGPLAN Not., 21(7):162--175, 1986.
[6]
}}D. R. Butenhof. Programming with POSIX threads. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1997.
[7]
}}L. Ceze, J. Tuck, J. Torrellas, and C. Cascaval. Bulk disambiguation of speculative threads in multiprocessors. In ISCA '06: Proceedings of the 33rd annual international symposium on Computer Architecture, pages 227--238, 2006.
[8]
}}M. K. Chen and K. Olukotun. The Jrpm system for dynamically parallelizing Java programs. In ISCA '03: Proceedings of the 30th annual international symposium on Computer architecture, pages 434--446, 2003.
[9]
}}F. Chow, S. Chan, S.-M. Liu, R. Lo, and M. Streich. Effective representation of aliases and indirect memory operations in ssa form. In CC '96: Proceedings of the 6th International Conference on Compiler Construction, pages 253--267, 1996.
[10]
}}C. Ding, X. Shen, K. Kelsey, C. Tice, R. Huang, and C. Zhang. Software behavior oriented parallelization. In PLDI '07: Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, pages 223--234, 2007.
[11]
}}K.-F. Faxén, K. Popov, S. Jansson, and L. Albertsson. Embla: Data dependence profiling for parallel programming. In 2008 International Conference on Complex, Intelligent and Software Intensive Systems, pages 780--875, 2008.
[12]
}}J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst., 9(3):319--349, 1987.
[13]
}}S. Fink, K. Knobe, and V. Sarkar. Unified analysis of array and object references in strongly typed languages. In SAS '00: Proceedings of the 7th International Symposium on Static Analysis, pages 155--174, 2000.
[14]
}}M. Frigo, C. E. Leierson, and K. H. Randall. The implementation of the Cilk-5 multi-threaded language. In PLDI '98: Proceedings of the 1998 ACM SIGPLAN conference on Programming language design and implementation, pages 212--223, 1998.
[15]
}}M. I. Gordon, W. Thies, M. Karczmarek, J. Lin, A. S. Meli, A. A. Lamb, C. Leger, J. Wong, H. Hoffmann, D. Maze, and S. Amarasinghe. A stream compiler for communication-exposed architectures. In ASPLOS-X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems, pages 291--303, 2002.
[16]
}}S. Z. Guyer and C. Lin. An annotation language for optimizing software libraries. In DSL'99: Proceedings of the 2nd conference on Conference on Domain-Specific Languages, pages 4--4, 1999.
[17]
}}L. Hammond, M. Willey, and K. Olukotun. Data speculation support for a chip multiprocessor. In ASPLOS-VIII: Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, pages 58--69, 1998.
[18]
}}P. Husbands, C. Iancu, and K. Yelick. A performance analysis of the berkeley upc compiler. In ICS '03: Proceedings of the 17th annual international conference on Supercomputing, pages 63--73, 2003.
[19]
}}W.-M. e. Hwu. Implicitly parallel programming models for thousand-core microprocessors. In Design Automation Conference (DAC-44), 2007.
[20]
}}F. Irigoin, P. Jouvelot, and R. Triolet. Semantical interprocedural parallelization: an overview of the PIPS project. In ICS '91: Proceedings of the 5th international conference on Supercomputing, pages 244--251, 1991.
[21]
}}M. Ishihara, H. Honda, and M. Sato. Development and implementation of an interactive parallelization assistance tool for OpenMP: iPat/OMP. IEICE - Trans. Inf. Syst., E89-D(2):399--407, 2006.
[22]
}}K. Kennedy and J. R. Allen. Optimizing compilers for modern architectures: a dependence-based approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2002.
[23]
}}K. Kennedy, K. S. McKinley, and C. W. Tseng. Interactive parallel programming using the parascope editor. IEEE Trans. Parallel Distrib. Syst., 2(3):329--341, 1991.
[24]
}}K. Knobe and V. Sarkar. Array SSA and its use in parallelization. In 25th Annual Symposium on the Principles of Programming Languages, pages 107--120, Jan. 1998.
[25]
}}M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. In PLDI '07: Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, pages 211--222, 2007.
[26]
}}M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. In PLDI '07: Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, pages 211--222, 2007.
[27]
}}C. Lapkowski and L. Hendren. Extended ssa numbering: Introducing ssa properties to langauges with multi-level pointers. In CASCON '96: Proceedings of the 1996 conference of the Centre for Advanced Studies on Collaborative research, page 23. IBM Press, 1996.
[28]
}}C. Lattner, A. Lenharth, and V. Adve. Making context-sensitive points-to analysis with heap cloning practical for the real world. In PLDI '07: Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, pages 278--289, 2007.
[29]
}}K. Lee and S. P. Midkiff. A two-phase escape analysis for parallel java programs. In PACT '06: Proceedings of the 15th international conference on Parallel architectures and compilation techniques, pages 53--62, 2006.
[30]
}}A. W. Lim and M. S. Lam. Maximizing parallelism and minimizing synchronization with affine transforms. In POPL '97: Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 201--214, New York, NY, USA, 1997. ACM.
[31]
}}A message passing interface standard, version 2.2. http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf, Sept. 2009.
[32]
}}D. Novillo. Memory SSA - a unified approach for sparsely representing memory operations. http://www.airs.com/dnovillo/Papers/mem-ssa.pdf.
[33]
}}G. Ottoni, R. Rangan, A. Stoler, and D. I. August. Automatic thread extraction with decoupled software pipelining. In Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, pages 105--118, 2005.
[34]
}}P. Peterson and D. A. Padua. Dynamic dependence analysis: A novel method for data dependence evaluation. In Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing, pages 64--81, 1993.
[35]
}}E. Raman, G. Ottoni, A. Raman, M. J. Bridges, and D. I. August. Parallel-stage decoupled software pipelining. In CGO '08: Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, pages 114--123, 2008.
[36]
}}L. Rauchwerger, N. M. Amato, and D. A. Padua. Run-time methods for parallelizing partially parallel loops. In ICS '95: Proceedings of the 9th international conference on Supercomputing, pages 137--146, 1995.
[37]
}}L. Rauchwerger, F. Arzu, and K. Ouchi. Standard templates adaptive parallel library (STAPL). In LCR '98: Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers, pages 402--409, 1998.
[38]
}}S. Rul, H. Vandierendonck, and K. De Bosschere. Extracting coarse-grain parallelism from sequential programs. In International Conference on Principles and Practices of Parallel Programming, pages 281--282, Feb. 2008.
[39]
}}S. Rus, M. Pennings, and L. Rauchwerger. Sensitivity analysis for automatic parallelization on multi-cores. In ICS '07: Proceedings of the 21st annual international conference on Supercomputing, pages 263--273, New York, NY, USA, 2007. ACM.
[40]
}}V. A. Saraswat, V. Sarkar, and C. von Praun. X10: concurrent programming for modern architectures. In PPoPP '07: Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 271--271, 2007.
[41]
}}W. Thies, V. Chandrasekhar, and S. Amarasinghe. A practical approach to exploiting coarse-grained pipeline parallelism in C programs. In MICRO '07: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, pages 356--369, 2007.
[42]
}}C. Tian, M. Feng, V. Nagarajan, and R. Gupta. Copy or discard execution model for speculative parallelization on multicores. In MICRO '08: Proceedings of the 2008 41st IEEE/ACM International Symposium on Microarchitecture, pages 330--341, 2008.
[43]
}}G. Tournavitis, Z. Wang, B. Franke, and M. O'Boyle. Towards a holistic approach to auto-parallelization: Integrating profile-driven parallelism detection and machine-learning based mapping. In PLDI '09: Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation, pages 177--187, jun 2009.
[44]
}}P. Tu and D. A. Padua. Automatic array privatization. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing, pages 500--521, London, UK, 1994. Springer-Verlag.
[45]
}}H. Vandierendonck, S. Rul, and K. De Bosschere. Factoring out ordered sections to expose thread-level parallelism. In Workshop on Parallel Execution of Sequential Programs on Multicore Architectures, page 8, June 2009.

Cited By

View all
  • (2024)Transparent Multicore Scaling of Single-Threaded Network FunctionsProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3629591(1142-1159)Online publication date: 22-Apr-2024
  • (2023)Automation of Programming for Promising High-Performance Computing SystemsParallel Computing Technologies10.1007/978-3-031-41673-6_1(3-17)Online publication date: 15-Aug-2023
  • (2022)An API for Analysing and Classifying Data Dependence in View of ParallelismProceedings of the 10th International Conference on Computer and Communications Management10.1145/3556223.3556232(61-67)Online publication date: 29-Jul-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques
September 2010
596 pages
ISBN:9781450301787
DOI:10.1145/1854273
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 September 2010

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. semantic annotation
  2. semi-automatic parallelization

Qualifiers

  • Research-article

Conference

PACT '10
Sponsor:
  • IFIP WG 10.3
  • IEEE CS TCPP
  • SIGARCH
  • IEEE CS TCAA

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)17
  • Downloads (Last 6 weeks)2
Reflects downloads up to 05 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Transparent Multicore Scaling of Single-Threaded Network FunctionsProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3629591(1142-1159)Online publication date: 22-Apr-2024
  • (2023)Automation of Programming for Promising High-Performance Computing SystemsParallel Computing Technologies10.1007/978-3-031-41673-6_1(3-17)Online publication date: 15-Aug-2023
  • (2022)An API for Analysing and Classifying Data Dependence in View of ParallelismProceedings of the 10th International Conference on Computer and Communications Management10.1145/3556223.3556232(61-67)Online publication date: 29-Jul-2022
  • (2022)A Survey of Performance Tuning Techniques and Tools for Parallel ApplicationsIEEE Access10.1109/ACCESS.2022.314784610(15036-15055)Online publication date: 2022
  • (2021)Loop parallelization using dynamic commutativity analysisProceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO51591.2021.9370319(150-161)Online publication date: 27-Feb-2021
  • (2020)Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS47924.2020.00053(440-450)Online publication date: May-2020
  • (2020)LLVM Based Parallelization of C Programs for GPUSupercomputing10.1007/978-3-030-64616-5_38(436-448)Online publication date: 4-Dec-2020
  • (2019)Building a Polyhedral Representation from an Instrumented ExecutionACM Transactions on Architecture and Code Optimization10.1145/336378516:4(1-26)Online publication date: 17-Dec-2019
  • (2019)LernaACM Transactions on Storage10.1145/331036815:1(1-24)Online publication date: 22-Mar-2019
  • (2019)Data-flow/dependence profiling for structured transformationsProceedings of the 24th Symposium on Principles and Practice of Parallel Programming10.1145/3293883.3295737(173-185)Online publication date: 16-Feb-2019
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media