More Web Proxy on the site http://driver.im/

research-article

The Paralax infrastructure: automatic parallelization with a helping hand

Authors:

Hans Vandierendonck,

Koen De BosschereAuthors Info & Claims

PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques

Pages 389 - 400

https://doi.org/10.1145/1854273.1854322

Published: 11 September 2010 Publication History

Abstract

Speeding up sequential programs on multicores is a challenging problem that is in urgent need of a solution. Automatic parallelization of irregular pointer-intensive codes, exemplified by the SPECint codes, is a very hard problem. This paper shows that, with a helping hand, such auto-parallelization is possible and fruitful.

This paper makes the following contributions: (i) A compiler-framework for extracting pipeline-like parallelism from outer program loops is presented. (ii) Using a light-weight programming model based on annotations, the programmer helps the compiler to find thread-level parallelism. Each of the annotations specifies only a small piece of semantic information that compiler analysis misses, e.g. stating that a variable is dead at a certain program point. The annotations are designed such that correctness is easily verified. Furthermore, we present a tool for suggesting annotations to the programmer. (iii) The methodology is applied to auto-parallelize several SPECint benchmarks. For the benchmark with most parallelism (hmmer), we obtain a scalable 7-fold speedup on an AMD quad-core dual processor.

The annotations constitute a parallel programming model that relies extensively on a sequential program representation. Hereby, the complexity of debugging is not increased and it does not obscure the source code. These properties could prove valuable to increase the efficiency of parallel programming.

References

[1]

}}J. C. Adams,W. S. Brainerd, J. T.Martin, B. T. Smith, and J. L. Wagener. Fortran 90 handbook: complete ANSI/ISO reference. Intertext Publications, Inc.,/McGraw-Hill, Inc., New York, NY, USA, 1992.

Digital Library

[2]

}}R. Allen and K. Kennedy. Automatic translation of Fortran programs to vector form. ACM Trans. Program. Lang. Syst., 9(4):491--542, 1987.

Digital Library

[3]

}}T. Brandes, S. Chaumette, M. C. Counilh, J. Roman, A. Darte, F. Desprez, and J. C. Mignot. Hpfit: a set of integrated tools for the parallelization of applications using high performance fortran. part I: HPFIT and the TransTOOL environment. Parallel Comput., 23(1-2):71--87, 1997.

Digital Library

[4]

}}M. Bridges, N. Vachharajani, Y. Zhang, T. Jablin, and D. August. Revisiting the sequential programming model for multi-core. In MICRO '07: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, pages 69--84, 2007.

Digital Library

[5]

}}M. Burke and R. Cytron. Interprocedural dependence analysis and parallelization. SIGPLAN Not., 21(7):162--175, 1986.

Digital Library

[6]

}}D. R. Butenhof. Programming with POSIX threads. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1997.

Digital Library

[7]

}}L. Ceze, J. Tuck, J. Torrellas, and C. Cascaval. Bulk disambiguation of speculative threads in multiprocessors. In ISCA '06: Proceedings of the 33rd annual international symposium on Computer Architecture, pages 227--238, 2006.

Digital Library

[8]

}}M. K. Chen and K. Olukotun. The Jrpm system for dynamically parallelizing Java programs. In ISCA '03: Proceedings of the 30th annual international symposium on Computer architecture, pages 434--446, 2003.

Digital Library

[9]

}}F. Chow, S. Chan, S.-M. Liu, R. Lo, and M. Streich. Effective representation of aliases and indirect memory operations in ssa form. In CC '96: Proceedings of the 6th International Conference on Compiler Construction, pages 253--267, 1996.

Digital Library

[10]

}}C. Ding, X. Shen, K. Kelsey, C. Tice, R. Huang, and C. Zhang. Software behavior oriented parallelization. In PLDI '07: Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, pages 223--234, 2007.

Digital Library

[11]

}}K.-F. Faxén, K. Popov, S. Jansson, and L. Albertsson. Embla: Data dependence profiling for parallel programming. In 2008 International Conference on Complex, Intelligent and Software Intensive Systems, pages 780--875, 2008.

Digital Library

[12]

}}J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst., 9(3):319--349, 1987.

Digital Library

[13]

}}S. Fink, K. Knobe, and V. Sarkar. Unified analysis of array and object references in strongly typed languages. In SAS '00: Proceedings of the 7th International Symposium on Static Analysis, pages 155--174, 2000.

Digital Library

[14]

}}M. Frigo, C. E. Leierson, and K. H. Randall. The implementation of the Cilk-5 multi-threaded language. In PLDI '98: Proceedings of the 1998 ACM SIGPLAN conference on Programming language design and implementation, pages 212--223, 1998.

Digital Library

[15]

}}M. I. Gordon, W. Thies, M. Karczmarek, J. Lin, A. S. Meli, A. A. Lamb, C. Leger, J. Wong, H. Hoffmann, D. Maze, and S. Amarasinghe. A stream compiler for communication-exposed architectures. In ASPLOS-X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems, pages 291--303, 2002.

Digital Library

[16]

}}S. Z. Guyer and C. Lin. An annotation language for optimizing software libraries. In DSL'99: Proceedings of the 2nd conference on Conference on Domain-Specific Languages, pages 4--4, 1999.

Digital Library

[17]

}}L. Hammond, M. Willey, and K. Olukotun. Data speculation support for a chip multiprocessor. In ASPLOS-VIII: Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, pages 58--69, 1998.

Digital Library

[18]

}}P. Husbands, C. Iancu, and K. Yelick. A performance analysis of the berkeley upc compiler. In ICS '03: Proceedings of the 17th annual international conference on Supercomputing, pages 63--73, 2003.

Digital Library

[19]

}}W.-M. e. Hwu. Implicitly parallel programming models for thousand-core microprocessors. In Design Automation Conference (DAC-44), 2007.

Digital Library

[20]

}}F. Irigoin, P. Jouvelot, and R. Triolet. Semantical interprocedural parallelization: an overview of the PIPS project. In ICS '91: Proceedings of the 5th international conference on Supercomputing, pages 244--251, 1991.

Digital Library

[21]

}}M. Ishihara, H. Honda, and M. Sato. Development and implementation of an interactive parallelization assistance tool for OpenMP: iPat/OMP. IEICE - Trans. Inf. Syst., E89-D(2):399--407, 2006.

Digital Library

[22]

}}K. Kennedy and J. R. Allen. Optimizing compilers for modern architectures: a dependence-based approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2002.

Digital Library

[23]

}}K. Kennedy, K. S. McKinley, and C. W. Tseng. Interactive parallel programming using the parascope editor. IEEE Trans. Parallel Distrib. Syst., 2(3):329--341, 1991.

Digital Library

[24]

}}K. Knobe and V. Sarkar. Array SSA and its use in parallelization. In 25th Annual Symposium on the Principles of Programming Languages, pages 107--120, Jan. 1998.

Digital Library

[25]

}}M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. In PLDI '07: Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, pages 211--222, 2007.

Digital Library

[26]

}}M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. In PLDI '07: Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, pages 211--222, 2007.

Digital Library

[27]

}}C. Lapkowski and L. Hendren. Extended ssa numbering: Introducing ssa properties to langauges with multi-level pointers. In CASCON '96: Proceedings of the 1996 conference of the Centre for Advanced Studies on Collaborative research, page 23. IBM Press, 1996.

Digital Library

[28]

}}C. Lattner, A. Lenharth, and V. Adve. Making context-sensitive points-to analysis with heap cloning practical for the real world. In PLDI '07: Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, pages 278--289, 2007.

Digital Library

[29]

}}K. Lee and S. P. Midkiff. A two-phase escape analysis for parallel java programs. In PACT '06: Proceedings of the 15th international conference on Parallel architectures and compilation techniques, pages 53--62, 2006.

Digital Library

[30]

}}A. W. Lim and M. S. Lam. Maximizing parallelism and minimizing synchronization with affine transforms. In POPL '97: Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 201--214, New York, NY, USA, 1997. ACM.

Digital Library

[31]

}}A message passing interface standard, version 2.2. http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf, Sept. 2009.

[32]

}}D. Novillo. Memory SSA - a unified approach for sparsely representing memory operations. http://www.airs.com/dnovillo/Papers/mem-ssa.pdf.

[33]

}}G. Ottoni, R. Rangan, A. Stoler, and D. I. August. Automatic thread extraction with decoupled software pipelining. In Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, pages 105--118, 2005.

Digital Library

[34]

}}P. Peterson and D. A. Padua. Dynamic dependence analysis: A novel method for data dependence evaluation. In Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing, pages 64--81, 1993.

Digital Library

[35]

}}E. Raman, G. Ottoni, A. Raman, M. J. Bridges, and D. I. August. Parallel-stage decoupled software pipelining. In CGO '08: Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, pages 114--123, 2008.

Digital Library

[36]

}}L. Rauchwerger, N. M. Amato, and D. A. Padua. Run-time methods for parallelizing partially parallel loops. In ICS '95: Proceedings of the 9th international conference on Supercomputing, pages 137--146, 1995.

Digital Library

[37]

}}L. Rauchwerger, F. Arzu, and K. Ouchi. Standard templates adaptive parallel library (STAPL). In LCR '98: Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers, pages 402--409, 1998.

Digital Library

[38]

}}S. Rul, H. Vandierendonck, and K. De Bosschere. Extracting coarse-grain parallelism from sequential programs. In International Conference on Principles and Practices of Parallel Programming, pages 281--282, Feb. 2008.

Digital Library

[39]

}}S. Rus, M. Pennings, and L. Rauchwerger. Sensitivity analysis for automatic parallelization on multi-cores. In ICS '07: Proceedings of the 21st annual international conference on Supercomputing, pages 263--273, New York, NY, USA, 2007. ACM.

Digital Library

[40]

}}V. A. Saraswat, V. Sarkar, and C. von Praun. X10: concurrent programming for modern architectures. In PPoPP '07: Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 271--271, 2007.

Digital Library

[41]

}}W. Thies, V. Chandrasekhar, and S. Amarasinghe. A practical approach to exploiting coarse-grained pipeline parallelism in C programs. In MICRO '07: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, pages 356--369, 2007.

Digital Library

[42]

}}C. Tian, M. Feng, V. Nagarajan, and R. Gupta. Copy or discard execution model for speculative parallelization on multicores. In MICRO '08: Proceedings of the 2008 41st IEEE/ACM International Symposium on Microarchitecture, pages 330--341, 2008.

Digital Library

[43]

}}G. Tournavitis, Z. Wang, B. Franke, and M. O'Boyle. Towards a holistic approach to auto-parallelization: Integrating profile-driven parallelism detection and machine-learning based mapping. In PLDI '09: Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation, pages 177--187, jun 2009.

Digital Library

[44]

}}P. Tu and D. A. Padua. Automatic array privatization. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing, pages 500--521, London, UK, 1994. Springer-Verlag.

Digital Library

[45]

}}H. Vandierendonck, S. Rul, and K. De Bosschere. Factoring out ordered sections to expose thread-level parallelism. In Workshop on Parallel Execution of Sequential Programs on Multicore Architectures, page 8, June 2009.

Cited By

Yan LPan YZhou DCandea GKashyap S(2024)Transparent Multicore Scaling of Single-Threaded Network FunctionsProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3629591(1142-1159)Online publication date: 22-Apr-2024
https://dl.acm.org/doi/10.1145/3627703.3629591
Bakhtin VZakharov DKataev NKolganov AYakobovskiy M(2023)Automation of Programming for Promising High-Performance Computing SystemsParallel Computing Technologies10.1007/978-3-031-41673-6_1(3-17)Online publication date: 15-Aug-2023
https://doi.org/10.1007/978-3-031-41673-6_1
Midolo ATramontana E(2022)An API for Analysing and Classifying Data Dependence in View of ParallelismProceedings of the 10th International Conference on Computer and Communications Management10.1145/3556223.3556232(61-67)Online publication date: 29-Jul-2022
https://dl.acm.org/doi/10.1145/3556223.3556232
Show More Cited By

Index Terms

The Paralax infrastructure: automatic parallelization with a helping hand
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages
      1. Language types
        Parallel programming languages

Recommendations

Annotating abstract anaphora

In this paper, we present first results from annotating abstract (discourse-deictic) anaphora in German. Our annotation guidelines provide linguistic tests for identifying the antecedent, and for determining the semantic types of both the antecedent and ...
Cross-Evaluation of Entity Linking and Disambiguation Systems for Clinical Text Annotation
SEMANTiCS 2016: Proceedings of the 12th International Conference on Semantic Systems

In this paper we study whether state-of-the-art techniques for multi-domain and multilingual entity linking can be ported to the clinical domain. To do so, we compare two known entity linking systems, BabelFly and TagMe, that leverage on Wikipedia and ...
Named Entity Disambiguation for Resource-Poor Languages
ESAIR '15: Proceedings of the Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval

Named entity disambiguation (NED) is the task of linking ambiguous names in natural language text to canonical entities like people, organizations or places, registered in a knowledge base. The problem is well-studied for English text, but few systems ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques

September 2010

596 pages

ISBN:9781450301787

DOI:10.1145/1854273

General Chair:
Valentina Salapura
IBM TJ Watson Research Center
,
Program Chairs:
Michael Gschwind
IBM Systems & Technology Group
,
Jens Knoop
Technische Universität Wien

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IFIP WG 10.3: IFIP working group 10.3 on concurrent systems
IEEE CS TCPP: IEEE-CS technical committee on parallel processing
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE CS TCAA: IEEE CS technical committee on architectural acoustics

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 September 2010

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

PACT '10

Sponsor:

IFIP WG 10.3
IEEE CS TCPP
SIGARCH
IEEE CS TCAA

PACT '10: International Conference on Parallel Architectures and Compilation Techniques

September 11 - 15, 2010

Vienna, Austria

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

67
Total Citations
View Citations
642
Total Downloads

Downloads (Last 12 months)17
Downloads (Last 6 weeks)2

Reflects downloads up to 05 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Yan LPan YZhou DCandea GKashyap S(2024)Transparent Multicore Scaling of Single-Threaded Network FunctionsProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3629591(1142-1159)Online publication date: 22-Apr-2024
https://dl.acm.org/doi/10.1145/3627703.3629591
Bakhtin VZakharov DKataev NKolganov AYakobovskiy M(2023)Automation of Programming for Promising High-Performance Computing SystemsParallel Computing Technologies10.1007/978-3-031-41673-6_1(3-17)Online publication date: 15-Aug-2023
https://doi.org/10.1007/978-3-031-41673-6_1
Midolo ATramontana E(2022)An API for Analysing and Classifying Data Dependence in View of ParallelismProceedings of the 10th International Conference on Computer and Communications Management10.1145/3556223.3556232(61-67)Online publication date: 29-Jul-2022
https://dl.acm.org/doi/10.1145/3556223.3556232
Mustafa D(2022)A Survey of Performance Tuning Techniques and Tools for Parallel ApplicationsIEEE Access10.1109/ACCESS.2022.314784610(15036-15055)Online publication date: 2022
https://doi.org/10.1109/ACCESS.2022.3147846
Vasiladiotis CLozano RCole MFranke BLee J(2021)Loop parallelization using dynamic commutativity analysisProceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO51591.2021.9370319(150-161)Online publication date: 27-Feb-2021
https://dl.acm.org/doi/10.1109/CGO51591.2021.9370319
Li CDakkak AXiong JHwu W(2020)Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS47924.2020.00053(440-450)Online publication date: May-2020
https://doi.org/10.1109/IPDPS47924.2020.00053
Kataev N(2020)LLVM Based Parallelization of C Programs for GPUSupercomputing10.1007/978-3-030-64616-5_38(436-448)Online publication date: 4-Dec-2020
https://doi.org/10.1007/978-3-030-64616-5_38
Selva MGruber FSampaio DGuillon CPouchet LRastello F(2019)Building a Polyhedral Representation from an Instrumented ExecutionACM Transactions on Architecture and Code Optimization10.1145/336378516:4(1-26)Online publication date: 17-Dec-2019
https://dl.acm.org/doi/10.1145/3363785
Saad MPalmieri RRavindran B(2019)LernaACM Transactions on Storage10.1145/331036815:1(1-24)Online publication date: 22-Mar-2019
https://dl.acm.org/doi/10.1145/3310368
Gruber FSelva MSampaio DGuillon CMoynault APouchet LRastello FHollingsworth JKeidar I(2019)Data-flow/dependence profiling for structured transformationsProceedings of the 24th Symposium on Principles and Practice of Parallel Programming10.1145/3293883.3295737(173-185)Online publication date: 16-Feb-2019
https://dl.acm.org/doi/10.1145/3293883.3295737
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents