[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/335231.335239acmconferencesArticle/Chapter ViewAbstractPublication PagesicsConference Proceedingsconference-collections
Article
Free access

A compiler method for the parallel execution of irregular reductions in scalable shared memory multiprocessors

Published: 08 May 2000 Publication History

Abstract

This paper presents a new parallelization method for reductions of arrays with subscripted subscripts on scalable shared memory multiprocessors. The mapping of computations is based on grouping reduction loop iterations into sets that are further assigned to the cooperating threads of computation. Iterations belonging to the same set are chosen in such a way that update different entries in the reduction array. That is, the loop distribution implies a conflict-free write distribution of the reduction array. The iteration sets are set up by building a loop-index prefetching data structure that allows to reorder properly the loop iterations. The proposed method is general, scalable, and easy to implement on a compiler. In addition it deals in a uniform way with one and multiple subscript arrays. In case of multiple indirection arrays, writes on the reduction array affecting different sets are solved by defining conflict-free supersets. A performance evaluation is presented. From the experimental results and performance analysis, the proposed method appears as a clear alternative to the array expansion and privatized buffer techniques, used on state-of-the-art parallelizing compilers, like Polaris or SUIF. The scalability problem that those techniques exhibit is missing in our method, as the memory overhead presented does not depend on the number of processors.

References

[1]
R. Asenjo, E. Gutierrez, Y. Lin, D. Padua, B. Pottengerg, and E. Zapata. On the automatic parallelization of sparse and irregular Fortran codes. Technical Report TR-1512, University for Illinois at Urbana-Champalgn. Center for Supercomputing R&D, December 1996.]]
[2]
W. Blume, R. Doallo, R. Eigemann, J. Grout, J. Hoeflinger, T. Lawrence, J. Lee, D. Padua, Y. Pack, B. Pottenger, L. R.auchwerger, and P. Tu. Parallel programming with Polaris. IEEE Computer, 29(12):78-82, December 1996.]]
[3]
B. Brooks, K. Bruccoleri, B. Olafson, D. States, S. Swaminathan, and M. Karplus. Charmm: A program for macromolecular energy, minimization and dynamics calculations. J. Computational Chemistry, 4:217-183, 1983.]]
[4]
C. Ding and K. Kennedy. Improving cache performance of dynaic applications with computation and data layout transformations. In ACM Int'l. Conf. on Programming Languages Design and Implementation (PLDI'99), pages 229-241, Atlanta, GA, May 1999.]]
[5]
I. Foster, R. Schreiber, and P. Havlak. HPF-2, scope of activities and motivating applications. Technical Report CRPC-TR94492, Center for Research on Parallel Computation, PAce University, November 1994.]]
[6]
M. Hall, S. Amarasinghe, B. Murphy, S. Liao, and M. Lain. Detecting coarse-grain parallelism using an interprocedural parallelizing compiler. In IEEE Supercomputing'95, San Diego, CA, December 1995.]]
[7]
M. Hall, J. Anderson, S. Amarasinghe, B. Murphy, S.-W. Liao, E. Bugnion, and M. S. Lain. Maximizing multiprocessor performance with the SUIF compiler. IEEE Computer, 29(12), December 1996.]]
[8]
H. Han and C.-W. Tseng. Improving compiler and run-time support for irregular reductions. In 11th Workshop on Languages and Compilers for Parallel Computing (LCPC'98), Chaperl Hill, NC, August 1998.]]
[9]
R. Hanxleden. Compiler support for machine independent parallelization of irregular problems. Technical Report CR.PC-TR.92301-S, Center for Research on Parallel Computation, Rice University, November 1992.]]
[10]
High Performance Fortran Forum. High Performance Fortran Language Specification, Version 2.0, 1996.]]
[11]
J. Ku. The design of an efficient and portable interface between a parallelizing compiler and its target machine. Technical report, Master's Thesis, Univ. of Illinois at Urbana-Champaign, Center for Supercomputing R&D, 1995.]]
[12]
Y. Lin and D. Padua. On the automatic parallelization of sparse and irregular Fortran programs. In 4th Workshop. on Languages, Compilers and Runtime Systems for Sealable Computers (LCR'98), Pittsburgh, PA, May 1998.]]
[13]
N. Mitchell, L. Carter, and J. Ferrante. Localizing non-affine array references. In Int'l. Conf. on Parallel Architectures and Compilation Techniques (PACT'99), Newport Beach, CA, October 1999.]]
[14]
J. Morales and S. Toxvaerd. The ceU-neighbour table method in moleimlar dynamics simulations. Computer Physics Communications, 71:71-76, 1992.]]
[15]
OpenMP Architecture Re,clew Board. OpenMP: A Proposed lndustry Standard API for Shared Memory Programming, 1997.]]
[16]
R. Ponnusamy, J. Saltz, A. Choudhary, S. Hwang, and G. Fox. Runtime support and compilation methods for user-specified data distributions. IEEE Trans. on Parallel and Distributed Systems, 6(8):815-831, June 1995.]]
[17]
B. Pottenger and K. Eigenmann. Idiom recognition in the Polaris parallelizing compiler. In 9th A CM Int'l Conf. on Supercomputin9, pages 444-448, Barcelona, Spain, July 1995.]]
[18]
Silicon Graphics, Inc. MIPSpro Fortran77 Programmer's Guide, 1994.]]
[19]
Silicon Graphics, Inc. MIPSpro Automatic Parallelization, 1998.]]
[20]
S. Toxvacrd. Algorithms for canonical molecular dynamics simulations. Molecular Physics, 72(1):159-168, 1991.]]

Cited By

View all
  • (2023)Time series analysis acceleration with advanced vectorization extensionsThe Journal of Supercomputing10.1007/s11227-023-05060-279:9(10178-10207)Online publication date: 2-Feb-2023
  • (2022)Exploiting Vector Extennsions to Accelerate Time Series Analysis2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)10.1109/PDP55904.2022.00017(55-62)Online publication date: Mar-2022
  • (2020)Compile-time Parallelization of Subscripted Subscript Patterns2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW50202.2020.00065(317-325)Online publication date: May-2020
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ICS '00: Proceedings of the 14th international conference on Supercomputing
May 2000
347 pages
ISBN:1581132700
DOI:10.1145/335231
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 May 2000

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

ICS00
Sponsor:
ICS00: International Conference on Supercomputing
May 8 - 11, 2000
New Mexico, Santa Fe, USA

Acceptance Rates

ICS '00 Paper Acceptance Rate 33 of 122 submissions, 27%;
Overall Acceptance Rate 629 of 2,180 submissions, 29%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)60
  • Downloads (Last 6 weeks)7
Reflects downloads up to 18 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Time series analysis acceleration with advanced vectorization extensionsThe Journal of Supercomputing10.1007/s11227-023-05060-279:9(10178-10207)Online publication date: 2-Feb-2023
  • (2022)Exploiting Vector Extennsions to Accelerate Time Series Analysis2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)10.1109/PDP55904.2022.00017(55-62)Online publication date: Mar-2022
  • (2020)Compile-time Parallelization of Subscripted Subscript Patterns2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW50202.2020.00065(317-325)Online publication date: May-2020
  • (2019)Accelerating time series motif discovery in the Intel Xeon Phi KNL processorThe Journal of Supercomputing10.1007/s11227-019-02923-5Online publication date: 10-Jun-2019
  • (2018)Automatic Matching of Legacy Code to Heterogeneous APIsACM SIGPLAN Notices10.1145/3296957.317318253:2(139-153)Online publication date: 19-Mar-2018
  • (2018)Automatic Matching of Legacy Code to Heterogeneous APIsProceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3173162.3173182(139-153)Online publication date: 19-Mar-2018
  • (2017)Discovery and exploitation of general reductions: a constraint based approachProceedings of the 2017 International Symposium on Code Generation and Optimization10.5555/3049832.3049862(269-280)Online publication date: 4-Feb-2017
  • (2017)Discovery and exploitation of general reductions: A constraint based approach2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)10.1109/CGO.2017.7863746(269-280)Online publication date: Feb-2017
  • (2017)ReduxSTM: Optimizing STM designs for Irregular ApplicationsJournal of Parallel and Distributed Computing10.1016/j.jpdc.2017.04.009107(114-133)Online publication date: Sep-2017
  • (2014)Effective Transactional Memory Execution Management for Improved ConcurrencyACM Transactions on Architecture and Code Optimization10.1145/263304811:3(1-27)Online publication date: 13-Aug-2014
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media