Article


Automatic parallelization for symmetric shared-memory multiprocessors

Authors:

Jyh-Herng Chow,

Leonard E. Lyon,

Vivek Sarkar

CASCON '96: Proceedings of the 1996 conference of the Centre for Advanced Studies on Collaborative research

Page 5

Published: 12 November 1996

Abstract

The trend in workstation hardware is towards symmetric shared-memory multiprocessors (SMPs). Users expect parallelism on an SMP to be exploited (largely) automatically, just as modern processor features such as caches and instruction scheduling are exploited automatically.

In this paper, we present our solution to automatic SMP parallelization. Our solution is unique in its robust support for unbalanced processor loads and for nesting of parallel loops and parallel sections, combined with its tight integration with high-order transformations for improved uniprocessor performance. As a result, the speedup due to parallelism is truly a multiplicative speedup over highly optimized uniprocessor execution times.
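One classic way to tolerate unbalanced processor loads in a parallel loop is dynamic self-scheduling. As an illustration only, and not necessarily the policy used by the runtime described in this paper, the sketch below shows guided self-scheduling (Polychronopoulos and Kuck, 1987): each idle processor claims ceil(R/P) of the R remaining iterations, where P is the number of processors. The helper names `gss_chunks`, `ceil_div`, and `parallel_for` are hypothetical, and Python threads stand in for SMP processors, so the code demonstrates the claiming protocol rather than real speedup.

```python
import threading


def ceil_div(a, b):
    """Integer ceiling of a / b for non-negative a and positive b."""
    return (a + b - 1) // b


def gss_chunks(n_iters, n_procs):
    """Chunk sizes handed out by guided self-scheduling: each claim takes
    ceil(remaining / n_procs) iterations, so chunks shrink as the loop drains."""
    chunks = []
    remaining = n_iters
    while remaining > 0:
        size = ceil_div(remaining, n_procs)
        chunks.append(size)
        remaining -= size
    return chunks


def parallel_for(body, n_iters, n_procs):
    """Run body(i) for every i in range(n_iters) on n_procs worker threads.

    Workers claim GSS-sized chunks from a shared iteration counter. A worker
    that finishes cheap iterations early simply claims another chunk, which
    is what absorbs unbalanced per-iteration costs.
    """
    lock = threading.Lock()
    next_iter = [0]  # shared loop counter, guarded by lock

    def worker():
        while True:
            with lock:  # claims are serialized, so chunks never overlap
                start = next_iter[0]
                remaining = n_iters - start
                if remaining <= 0:
                    return
                size = ceil_div(remaining, n_procs)
                next_iter[0] = start + size
            for i in range(start, start + size):  # run the claimed chunk
                body(i)

    threads = [threading.Thread(target=worker) for _ in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    print(gss_chunks(100, 4))  # [25, 19, 14, 11, 8, 6, 5, 3, 3, 2, 1, 1, 1, 1]
```

Large chunks at the start keep the number of synchronized claims small, while the single-iteration chunks at the end let lightly loaded processors pick up the leftover work. In a compiled SMP runtime the shared counter would typically be updated with an atomic fetch-and-add rather than a lock; this sketch also does not attempt nested parallel loops or parallel sections.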

Recommendations

Parallelization of NAS benchmarks for shared memory multiprocessors

This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high ...

Evaluating automatic parallelization for efficient execution on shared-memory multiprocessors
ICS '94: Proceedings of the 8th international conference on Supercomputing

We present a parallel code generation algorithm for complete applications and a new experimental methodology that tests the efficacy of our approach. The algorithm optimizes for data locality and parallelism, reducing or eliminating false sharing. It ...

Conservative circuit simulation on shared-memory multiprocessors

We investigate conservative parallel discrete event simulations for logical circuits on shared-memory multiprocessors. For a first estimation of the possible speedup, we extend the critical path analysis technique by partitioning strategies. To ...

Information

Published In

CASCON '96: Proceedings of the 1996 conference of the Centre for Advanced Studies on Collaborative research

November 1996

504 pages

Publisher

IBM Press

Publication History

Published: 12 November 1996

Qualifiers

Article

Acceptance Rates

Overall Acceptance Rate 24 of 90 submissions, 27%

Bibliometrics

Total Citations: 6
Total Downloads: 460
Downloads (Last 12 months): 23
Downloads (Last 6 weeks): 6

Reflects downloads up to 09 Jan 2025

Cited By

Lobeiras J, Arenaz M, Hernández O, Chandrasekaran S, Foertter F (2015). Experiences in extending parallware to support OpenACC. Proceedings of the Second Workshop on Accelerator Programming using Directives, pp. 1-12. DOI: 10.1145/2832105.2832112. Online publication date: 15-Nov-2015.
https://dl.acm.org/doi/10.1145/2832105.2832112

Zhang Y, Kandemir M, Pitsianis N, Sun X, Chakraborty S, Halbwachs N (2009). Exploring parallelization strategies for NUFFT data translation. Proceedings of the seventh ACM international conference on Embedded software, pp. 187-196. DOI: 10.1145/1629335.1629361. Online publication date: 12-Oct-2009.
https://dl.acm.org/doi/10.1145/1629335.1629361

Teruel X, Unnikrishnan P, Martorell X, Ayguadé E, Silvera R, Zhang G, Tiotto E, Ng J, Couturier C, Vigder M, Chechik M (2008). OpenMP tasks in IBM XL compilers. Proceedings of the 2008 conference of the center for advanced studies on collaborative research: meeting of minds, pp. 207-221. DOI: 10.1145/1463788.1463810. Online publication date: 27-Oct-2008.
https://dl.acm.org/doi/10.1145/1463788.1463810

Chandraiah P, Doemer R, Levitan S (2007). Designer-controlled generation of parallel and flexible heterogeneous MPSoC specification. Proceedings of the 44th annual Design Automation Conference, pp. 787-790. DOI: 10.1145/1278480.1278676. Online publication date: 4-Jun-2007.
https://dl.acm.org/doi/10.1145/1278480.1278676

Chow J, Sarkar V (1997). False Sharing Elimination by Selection of Runtime Scheduling Parameters. Proceedings of the International Conference on Parallel Processing, pp. 396-403. DOI: 10.5555/645533.656492. Online publication date: 11-Aug-1997.
https://dl.acm.org/doi/10.5555/645533.656492

Megiddo N, Sarkar V, Leiserson C, Culler D (1997). Optimal weighted loop fusion for parallel programs. Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, pp. 282-291. DOI: 10.1145/258492.258520. Online publication date: 1-Jun-1997.
https://dl.acm.org/doi/10.1145/258492.258520
