More Web Proxy on the site http://driver.im/

Article

Free access

A compiler method for the parallel execution of irregular reductions in scalable shared memory multiprocessors

Authors:

E. L. ZapataAuthors Info & Claims

ICS '00: Proceedings of the 14th international conference on Supercomputing

Pages 78 - 87

https://doi.org/10.1145/335231.335239

Published: 08 May 2000 Publication History

Abstract

This paper presents a new parallelization method for reductions of arrays with subscripted subscripts on scalable shared memory multiprocessors. The mapping of computations is based on grouping reduction loop iterations into sets that are further assigned to the cooperating threads of computation. Iterations belonging to the same set are chosen in such a way that update different entries in the reduction array. That is, the loop distribution implies a conflict-free write distribution of the reduction array. The iteration sets are set up by building a loop-index prefetching data structure that allows to reorder properly the loop iterations. The proposed method is general, scalable, and easy to implement on a compiler. In addition it deals in a uniform way with one and multiple subscript arrays. In case of multiple indirection arrays, writes on the reduction array affecting different sets are solved by defining conflict-free supersets. A performance evaluation is presented. From the experimental results and performance analysis, the proposed method appears as a clear alternative to the array expansion and privatized buffer techniques, used on state-of-the-art parallelizing compilers, like Polaris or SUIF. The scalability problem that those techniques exhibit is missing in our method, as the memory overhead presented does not depend on the number of processors.

References

[1]

R. Asenjo, E. Gutierrez, Y. Lin, D. Padua, B. Pottengerg, and E. Zapata. On the automatic parallelization of sparse and irregular Fortran codes. Technical Report TR-1512, University for Illinois at Urbana-Champalgn. Center for Supercomputing R&D, December 1996.]]

[2]

W. Blume, R. Doallo, R. Eigemann, J. Grout, J. Hoeflinger, T. Lawrence, J. Lee, D. Padua, Y. Pack, B. Pottenger, L. R.auchwerger, and P. Tu. Parallel programming with Polaris. IEEE Computer, 29(12):78-82, December 1996.]]

Digital Library

[3]

B. Brooks, K. Bruccoleri, B. Olafson, D. States, S. Swaminathan, and M. Karplus. Charmm: A program for macromolecular energy, minimization and dynamics calculations. J. Computational Chemistry, 4:217-183, 1983.]]

[4]

C. Ding and K. Kennedy. Improving cache performance of dynaic applications with computation and data layout transformations. In ACM Int'l. Conf. on Programming Languages Design and Implementation (PLDI'99), pages 229-241, Atlanta, GA, May 1999.]]

Digital Library

[5]

I. Foster, R. Schreiber, and P. Havlak. HPF-2, scope of activities and motivating applications. Technical Report CRPC-TR94492, Center for Research on Parallel Computation, PAce University, November 1994.]]

[6]

M. Hall, S. Amarasinghe, B. Murphy, S. Liao, and M. Lain. Detecting coarse-grain parallelism using an interprocedural parallelizing compiler. In IEEE Supercomputing'95, San Diego, CA, December 1995.]]

Digital Library

[7]

M. Hall, J. Anderson, S. Amarasinghe, B. Murphy, S.-W. Liao, E. Bugnion, and M. S. Lain. Maximizing multiprocessor performance with the SUIF compiler. IEEE Computer, 29(12), December 1996.]]

Digital Library

[8]

H. Han and C.-W. Tseng. Improving compiler and run-time support for irregular reductions. In 11th Workshop on Languages and Compilers for Parallel Computing (LCPC'98), Chaperl Hill, NC, August 1998.]]

Digital Library

[9]

R. Hanxleden. Compiler support for machine independent parallelization of irregular problems. Technical Report CR.PC-TR.92301-S, Center for Research on Parallel Computation, Rice University, November 1992.]]

[10]

High Performance Fortran Forum. High Performance Fortran Language Specification, Version 2.0, 1996.]]

[11]

J. Ku. The design of an efficient and portable interface between a parallelizing compiler and its target machine. Technical report, Master's Thesis, Univ. of Illinois at Urbana-Champaign, Center for Supercomputing R&D, 1995.]]

[12]

Y. Lin and D. Padua. On the automatic parallelization of sparse and irregular Fortran programs. In 4th Workshop. on Languages, Compilers and Runtime Systems for Sealable Computers (LCR'98), Pittsburgh, PA, May 1998.]]

Digital Library

[13]

N. Mitchell, L. Carter, and J. Ferrante. Localizing non-affine array references. In Int'l. Conf. on Parallel Architectures and Compilation Techniques (PACT'99), Newport Beach, CA, October 1999.]]

Digital Library

[14]

J. Morales and S. Toxvaerd. The ceU-neighbour table method in moleimlar dynamics simulations. Computer Physics Communications, 71:71-76, 1992.]]

[15]

OpenMP Architecture Re,clew Board. OpenMP: A Proposed lndustry Standard API for Shared Memory Programming, 1997.]]

[16]

R. Ponnusamy, J. Saltz, A. Choudhary, S. Hwang, and G. Fox. Runtime support and compilation methods for user-specified data distributions. IEEE Trans. on Parallel and Distributed Systems, 6(8):815-831, June 1995.]]

Digital Library

[17]

B. Pottenger and K. Eigenmann. Idiom recognition in the Polaris parallelizing compiler. In 9th A CM Int'l Conf. on Supercomputin9, pages 444-448, Barcelona, Spain, July 1995.]]

Digital Library

[18]

Silicon Graphics, Inc. MIPSpro Fortran77 Programmer's Guide, 1994.]]

[19]

Silicon Graphics, Inc. MIPSpro Automatic Parallelization, 1998.]]

[20]

S. Toxvacrd. Algorithms for canonical molecular dynamics simulations. Molecular Physics, 72(1):159-168, 1991.]]

Cited By

Quislant RFernandez IGutierrez EPlata O(2023)Time series analysis acceleration with advanced vectorization extensionsThe Journal of Supercomputing10.1007/s11227-023-05060-279:9(10178-10207)Online publication date: 2-Feb-2023
https://doi.org/10.1007/s11227-023-05060-2
Quislant RFernandez ISerralvo EGutierrez EPlata O(2022)Exploiting Vector Extennsions to Accelerate Time Series Analysis2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)10.1109/PDP55904.2022.00017(55-62)Online publication date: Mar-2022
https://doi.org/10.1109/PDP55904.2022.00017
Bhosale AEigenmann R(2020)Compile-time Parallelization of Subscripted Subscript Patterns2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW50202.2020.00065(317-325)Online publication date: May-2020
https://doi.org/10.1109/IPDPSW50202.2020.00065
Show More Cited By

Index Terms

A compiler method for the parallel execution of irregular reductions in scalable shared memory multiprocessors

Recommendations

Compiler analysis of irregular memory accesses
PLDI '00: Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation

Irregular array accesses are array accesses whose array subscripts do not have closed-form expressions in terms of loop indices. Traditional array analysis and loop transformation techniques cannot handle irregular array accesses. In this paper, we ...
Optimizing irregular shared-memory applications for distributed-memory systems
PPoPP '06: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming

In prior work, we have proposed techniques to extend the ease of shared-memory parallel programming to distributed-memory platforms by automatic translation of OpenMP programs to MPI. In the case of irregular applications, the performance of this ...
On Automatic Parallelization of Irregular Reductions on Scalable Shared Memory Systems
Euro-Par '99: Proceedings of the 5th International Euro-Par Conference on Parallel Processing

This paper presents a new parallelization method for reductions of arrays with subscripted subscripts on scalable shared-memory multiprocessors. The mapping of computations is based on the conflict-free write distribution of the reduction vector across ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

ICS '00: Proceedings of the 14th international conference on Supercomputing

May 2000

347 pages

ISBN:1581132700

DOI:10.1145/335231

Chairmen:
John Reynders
Los Alamos National Lab, Los Alamos, NM
,
Alex Veidenbaum
Univ. of California at Irvine, Irvine

Copyright © 2000 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 May 2000

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Qualifiers

Article

Conference

ICS00

Sponsor:

SIGARCH

ICS00: International Conference on Supercomputing

May 8 - 11, 2000

New Mexico, Santa Fe, USA

Acceptance Rates

ICS '00 Paper Acceptance Rate 33 of 122 submissions, 27%;

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

41
Total Citations
View Citations
435
Total Downloads

Downloads (Last 12 months)60
Downloads (Last 6 weeks)7

Reflects downloads up to 18 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Quislant RFernandez IGutierrez EPlata O(2023)Time series analysis acceleration with advanced vectorization extensionsThe Journal of Supercomputing10.1007/s11227-023-05060-279:9(10178-10207)Online publication date: 2-Feb-2023
https://doi.org/10.1007/s11227-023-05060-2
Quislant RFernandez ISerralvo EGutierrez EPlata O(2022)Exploiting Vector Extennsions to Accelerate Time Series Analysis2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)10.1109/PDP55904.2022.00017(55-62)Online publication date: Mar-2022
https://doi.org/10.1109/PDP55904.2022.00017
Bhosale AEigenmann R(2020)Compile-time Parallelization of Subscripted Subscript Patterns2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW50202.2020.00065(317-325)Online publication date: May-2020
https://doi.org/10.1109/IPDPSW50202.2020.00065
Fernandez IVillegas AGutierrez EPlata O(2019)Accelerating time series motif discovery in the Intel Xeon Phi KNL processorThe Journal of Supercomputing10.1007/s11227-019-02923-5Online publication date: 10-Jun-2019
https://doi.org/10.1007/s11227-019-02923-5
Ginsbach PRemmelg TSteuwer MBodin BDubach CO'Boyle M(2018)Automatic Matching of Legacy Code to Heterogeneous APIsACM SIGPLAN Notices10.1145/3296957.317318253:2(139-153)Online publication date: 19-Mar-2018
https://dl.acm.org/doi/10.1145/3296957.3173182
Ginsbach PRemmelg TSteuwer MBodin BDubach CO'Boyle MShen XTuck JBianchini RSarkar V(2018)Automatic Matching of Legacy Code to Heterogeneous APIsProceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3173162.3173182(139-153)Online publication date: 19-Mar-2018
https://dl.acm.org/doi/10.1145/3173162.3173182
Ginsbach PO'Boyle MReddi VSmith ATang L(2017)Discovery and exploitation of general reductions: a constraint based approachProceedings of the 2017 International Symposium on Code Generation and Optimization10.5555/3049832.3049862(269-280)Online publication date: 4-Feb-2017
https://dl.acm.org/doi/10.5555/3049832.3049862
Ginsbach PO'Boyle M(2017)Discovery and exploitation of general reductions: A constraint based approach2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)10.1109/CGO.2017.7863746(269-280)Online publication date: Feb-2017
https://doi.org/10.1109/CGO.2017.7863746
Pedrero MGutierrez ERomero SPlata O(2017)ReduxSTM: Optimizing STM designs for Irregular ApplicationsJournal of Parallel and Distributed Computing10.1016/j.jpdc.2017.04.009107(114-133)Online publication date: Sep-2017
https://doi.org/10.1016/j.jpdc.2017.04.009
Gonzalez-Mesa MGutierrez EZapata EPlata O(2014)Effective Transactional Memory Execution Management for Improved ConcurrencyACM Transactions on Architecture and Code Optimization10.1145/263304811:3(1-27)Online publication date: 13-Aug-2014
https://dl.acm.org/doi/10.1145/2633048
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents