More Web Proxy on the site http://driver.im/

article

Blocking and array contraction across arbitrarily nested loops using affine partitioning

Authors:

Monica S. LamAuthors Info & Claims

ACM SIGPLAN Notices, Volume 36, Issue 7

Pages 103 - 112

https://doi.org/10.1145/568014.379586

Published: 18 June 2001 Publication History

Abstract

Applicable to arbitrary sequences and nests of loops, affine partitioning is a program transformation framework that unifies many previously proposed loop transformations, including unimodular transforms, fusion, fission, reindexing, scaling and statement reordering. Algorithms based on affine partitioning have been shown to be effective for parallelization and communication minimization. This paper presents algorithms that improve data locality using affine partitioning.

Blocking and array contraction are two important optimizations that have been shown to be useful for data locality. Blocking creates a set of inner loops so that data brought into the faster levels of the memory hierarchy can be reused. Array contraction reduces an array to a scalar variable and thereby reduces the number of memory operations executed and the memory footprint. Loop transforms are often necessary to make blocking and array contraction possible.

By bringing the full generality of affine partitioning to bear on the problem, our locality algorithm can find more contractable arrays than previously possible. This paper also generalizes the concept of blocking and shows that affine partitioning allows the benefits of blocking be realized in arbitrarily nested loops. Experimental results on a number of benchmarks and a complete multigrid application in aeronautics indicates that affine partitioning is effective in practice.

References

[1]

N. Ahmed, N. Mateev, and K. Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In Proceedings of the 2000 ACM International Conference on Supercomputing, pages 141-152, May 2000.]]

Digital Library

[2]

N. Ahmed, N. Mateev, and K. Pingali. Tiling imperfectly-nested loop nests. In Proceedings of Supercomputing 2000, November 2000.]]

Digital Library

[3]

G. Aigner, A. Diwan, D. L. Heine, M. S. Lam, D. L. Moore, B. R. Murphy, and C. Sapuntzakis. An overview of the SUIF2 compiler infrastructure. Technical report, Stanford University, 2000.]]

[4]

D. Bacon, S. Graham, and O. Sharp. Compiler transformations for high-performance computing. Computing Surveys, 26(4):345-420, December 1994.]]

Digital Library

[5]

D. Bailey and J. Barton. The NAS kernel benchmark program. Technical Report 86711, NASA, 1985.]]

[6]

S. Carr, K. S. McKinley, and C.-W. Tseng. Compiler optimizations for improving data locality. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 252-262, October 1994.]]

Digital Library

[7]

P. Feautrier. Some efficient solutions to the affine scheduling problem, part I, one dimensional time. International Journal of Parallel Programming, 21(5):313- 348, October 1992.]]

Digital Library

[8]

G. R. Gao, R. Olsen, V. Sarkar, and R. Thekkath. Collective loop fusion for array contraction. In Proceedings of the Fifth Workshop on Languages and Compilers for Parallel Computing, pages 171-181, August 1992.]]

Digital Library

[9]

G. H. Golub and C. F. Van Loan. Matrix Computations - 2nd Edition. The John Hopkins University Press, Baltimore, Maryland, 1989.]]

Digital Library

[10]

A. Jameson. Solution of the Euler equations by amultigrid method. Applied Mathematics and Computation, 13:327-356, 1983.]]

Digital Library

[11]

K. Kennedy and K. S. McKinley. Optimizing for parallelism and data locality. In Proceedings of the 1992 ACM International Conference on Supercomputing, pages 323-334, July 1992.]]

Digital Library

[12]

K. Kennedy and K. S. McKinley. Maximizing loop parallelism and improving data locality via loop fusion and distribution. In Proceedings of the Sixth Workshop on Languages and Compilers for Parallel Computing, pages 301-320. Springer-Verlag, August 1993.]]

Digital Library

[13]

I. Kodukula, N. Ahmed, and K. Pingali. Data-centric multi-level blocking. In Proceedings of the ACM SIG- PLAN '97 Conference on Programming Language Design and Implementation, pages 346-357, June 1997.]]

Digital Library

[14]

E. C. Lewis, C. Lin, and L. Snyder. The implementation and evaluation of fusion and array contraction in array languages. In Proceedings of the ACM SIGPLAN '98 Conference onProgramming Language Design and Implementation, pages 50-59, June 1998.]]

Digital Library

[15]

S.-W. Liao. SUIF Explorer: an Interactive and Interprocedural Paralllelizer. PhD thesis, Stanford University, August 2000. Published as CSL-TR-00-807.]]

Digital Library

[16]

A. W. Lim, G. I. Cheong, and M. S. Lam. An affine partitioning algorithm to maximize parallelism and minimize communication. In Proceedings of the 13th ACM SIGARCH International Conference on Supercomputing, Rhodes, Greece, June 1999.]]

Digital Library

[17]

A. W. Lim and M. S. Lam. Maximizing parallelism and minimizing synchronization with affine transforms. In Conference Record of the 24th ACM SIGPLAN- SIGACT Symposium on Principles of Programming Languages, Paris, France, January 1997.]]

Digital Library

[18]

A. W. Lim and M. S. Lam. Maximizing parallelism and minimizing synchronization with affine partitions. Parallel Computing, 24(3-4):445-475, May 1998.]]

Digital Library

[19]

W. Pugh and E. Rosser. Iteration space slicing for locality. In Proceedings of the Twelfth Workshop on Languages and Compilers for Parallel Computing. Springer-Verlag, August 1999.]]

Digital Library

[20]

V. Sarkar and G. R. Gao. Optimization of array accesses by collective loop transformations. In Proceedings of the 1991 ACM International Conference on Supercomputing, pages 194-204, June 1991.]]

Digital Library

[21]

A. Schrijver. Theory of Linear and Integer Programming. Wiley, Chichester, 1986.]]

Digital Library

[22]

M. E. Wolf. Improving Locality and Parallelism in Nested Loops. PhD thesis, Stanford University, August 1992. Published as CSL-TR-92-538.]]

Digital Library

[23]

M. E. Wolf and M. S. Lam. A data locality optimizing algorithm. In Proceedings of the ACM SIGPLAN '91 Conference onProgramming Language Design and Implementation, pages 30-44, June 1991.]]

Digital Library

[24]

M. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley Publishing Company, 1995.]]

Digital Library

Cited By

Wang JSuo QYan DYue C(2023)CustomHalide – A new plugin of clang for loop optimizationProceedings of the 2023 9th International Conference on Computing and Artificial Intelligence10.1145/3594315.3594372(557-567)Online publication date: 17-Mar-2023
https://dl.acm.org/doi/10.1145/3594315.3594372
Tao XPang JXu JZhu Y(2021)Compiler-directed scratchpad memory data transfer optimization for multithreaded applications on a heterogeneous many-core architectureThe Journal of Supercomputing10.1007/s11227-021-03853-xOnline publication date: 15-May-2021
https://doi.org/10.1007/s11227-021-03853-x
Shivam AWatkinson NNicolau APadua DVeidenbaum A(2019)Towards an Achievable Performance for the Loop NestsLanguages and Compilers for Parallel Computing10.1007/978-3-030-34627-0_6(70-77)Online publication date: 13-Nov-2019
https://doi.org/10.1007/978-3-030-34627-0_6
Show More Cited By

Index Terms

Blocking and array contraction across arbitrarily nested loops using affine partitioning
1. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Concurrent programming structures
2. Theory of computation
  1. Models of computation
    1. Concurrency
      1. Parallel computing models

Recommendations

Blocking and array contraction across arbitrarily nested loops using affine partitioning
PPoPP '01: Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming

Applicable to arbitrary sequences and nests of loops, affine partitioning is a program transformation framework that unifies many previously proposed loop transformations, including unimodular transforms, fusion, fission, reindexing, scaling and ...
Affine and unimodular transformations for non-uniform nested loops
ICCOMP'08: Proceedings of the 12th WSEAS international conference on Computers

Performance improvement in the modern parallel machines needs not only to find sufficient parallelism in a program, but it is also important that we minimize the synchronization and communication overheads in the parallelized program. Parallelizing and ...
Affine-by-Statement Transformations of Imperfectly Nested Loops
IPPS '96: Proceedings of the 10th International Parallel Processing Symposium

A majority of loop restructuring techniques developed so far assume that loops are perfectly nested. The unimodular approach unifies three individual transformations -- loop interchange, skewing and reversal -- but is still limited to perfect loop ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices

ACM SIGPLAN Notices Volume 36, Issue 7

July 2001

143 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/568014

Issue’s Table of Contents

PPoPP '01: Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
June 2001
142 pages
ISBN:1581133464
DOI:10.1145/379539
Chairmen:
Michael Heath
Univ. of Illinois, Illinois, IN
,
Andrew Lumsdaine
Indiana Univ.

Copyright © 2001 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 June 2001

Published in SIGPLAN Volume 36, Issue 7

Check for updates

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

98
Total Citations
View Citations
788
Total Downloads

Downloads (Last 12 months)12
Downloads (Last 6 weeks)0

Reflects downloads up to 21 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Wang JSuo QYan DYue C(2023)CustomHalide – A new plugin of clang for loop optimizationProceedings of the 2023 9th International Conference on Computing and Artificial Intelligence10.1145/3594315.3594372(557-567)Online publication date: 17-Mar-2023
https://dl.acm.org/doi/10.1145/3594315.3594372
Tao XPang JXu JZhu Y(2021)Compiler-directed scratchpad memory data transfer optimization for multithreaded applications on a heterogeneous many-core architectureThe Journal of Supercomputing10.1007/s11227-021-03853-xOnline publication date: 15-May-2021
https://doi.org/10.1007/s11227-021-03853-x
Shivam AWatkinson NNicolau APadua DVeidenbaum A(2019)Towards an Achievable Performance for the Loop NestsLanguages and Compilers for Parallel Computing10.1007/978-3-030-34627-0_6(70-77)Online publication date: 13-Nov-2019
https://doi.org/10.1007/978-3-030-34627-0_6
Kristensen MLund SBlum TAvery JZaks AMendelson BRauchwerger LHwu W(2016)Fusion of Parallel Array OperationsProceedings of the 2016 International Conference on Parallel Architectures and Compilation10.1145/2967938.2967945(71-85)Online publication date: 11-Sep-2016
https://dl.acm.org/doi/10.1145/2967938.2967945
DOHI KOKINA KSOEJIMA RSHIBATA YOGURI K(2015)Performance Modeling of Stencil Computing on a Stream-Based FPGA Accelerator for Efficient Design Space ExplorationIEICE Transactions on Information and Systems10.1587/transinf.2014RCP0013E98.D:2(298-308)Online publication date: 2015
https://doi.org/10.1587/transinf.2014RCP0013
Hong Yao Huifang Deng (2015)An Abelian group model of commutative data dependence relations for the iteration space slicing2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)10.1109/SNPD.2015.7176231(1-6)Online publication date: Jun-2015
https://doi.org/10.1109/SNPD.2015.7176231
Mehta SBeeraka GYew P(2013)Tile size selection revisitedACM Transactions on Architecture and Code Optimization10.1145/2541228.255529210:4(1-27)Online publication date: 1-Dec-2013
https://dl.acm.org/doi/10.1145/2541228.2555292
Ben-Asher YBen-Asher Y(2012)Locality and SynchronizationMulticore Programming Using the ParC Language10.1007/978-1-4471-2164-0_3(93-127)Online publication date: 2012
https://doi.org/10.1007/978-1-4471-2164-0_3
Xinyu YYing LShuiguang DJie C(2011)Improved Affine Partition Algorithm for Compile-Time and Runtime PerformanceIntelligent Automation & Soft Computing10.1080/10798587.2011.1064322017:8(1179-1191)Online publication date: Jan-2011
https://doi.org/10.1080/10798587.2011.10643220
López-Lagunas AChai S(2011)Streaming Data Movement for Real-Time Image AnalysisJournal of Signal Processing Systems10.1007/s11265-008-0336-x62:1(29-42)Online publication date: 1-Jan-2011
https://dl.acm.org/doi/10.1007/s11265-008-0336-x
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents