[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article

Blocking and array contraction across arbitrarily nested loops using affine partitioning

Published: 18 June 2001 Publication History

Abstract

Applicable to arbitrary sequences and nests of loops, affine partitioning is a program transformation framework that unifies many previously proposed loop transformations, including unimodular transforms, fusion, fission, reindexing, scaling and statement reordering. Algorithms based on affine partitioning have been shown to be effective for parallelization and communication minimization. This paper presents algorithms that improve data locality using affine partitioning.
Blocking and array contraction are two important optimizations that have been shown to be useful for data locality. Blocking creates a set of inner loops so that data brought into the faster levels of the memory hierarchy can be reused. Array contraction reduces an array to a scalar variable and thereby reduces the number of memory operations executed and the memory footprint. Loop transforms are often necessary to make blocking and array contraction possible.
By bringing the full generality of affine partitioning to bear on the problem, our locality algorithm can find more contractable arrays than previously possible. This paper also generalizes the concept of blocking and shows that affine partitioning allows the benefits of blocking be realized in arbitrarily nested loops. Experimental results on a number of benchmarks and a complete multigrid application in aeronautics indicates that affine partitioning is effective in practice.

References

[1]
N. Ahmed, N. Mateev, and K. Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In Proceedings of the 2000 ACM International Conference on Supercomputing, pages 141-152, May 2000.]]
[2]
N. Ahmed, N. Mateev, and K. Pingali. Tiling imperfectly-nested loop nests. In Proceedings of Supercomputing 2000, November 2000.]]
[3]
G. Aigner, A. Diwan, D. L. Heine, M. S. Lam, D. L. Moore, B. R. Murphy, and C. Sapuntzakis. An overview of the SUIF2 compiler infrastructure. Technical report, Stanford University, 2000.]]
[4]
D. Bacon, S. Graham, and O. Sharp. Compiler transformations for high-performance computing. Computing Surveys, 26(4):345-420, December 1994.]]
[5]
D. Bailey and J. Barton. The NAS kernel benchmark program. Technical Report 86711, NASA, 1985.]]
[6]
S. Carr, K. S. McKinley, and C.-W. Tseng. Compiler optimizations for improving data locality. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 252-262, October 1994.]]
[7]
P. Feautrier. Some efficient solutions to the affine scheduling problem, part I, one dimensional time. International Journal of Parallel Programming, 21(5):313- 348, October 1992.]]
[8]
G. R. Gao, R. Olsen, V. Sarkar, and R. Thekkath. Collective loop fusion for array contraction. In Proceedings of the Fifth Workshop on Languages and Compilers for Parallel Computing, pages 171-181, August 1992.]]
[9]
G. H. Golub and C. F. Van Loan. Matrix Computations - 2nd Edition. The John Hopkins University Press, Baltimore, Maryland, 1989.]]
[10]
A. Jameson. Solution of the Euler equations by amultigrid method. Applied Mathematics and Computation, 13:327-356, 1983.]]
[11]
K. Kennedy and K. S. McKinley. Optimizing for parallelism and data locality. In Proceedings of the 1992 ACM International Conference on Supercomputing, pages 323-334, July 1992.]]
[12]
K. Kennedy and K. S. McKinley. Maximizing loop parallelism and improving data locality via loop fusion and distribution. In Proceedings of the Sixth Workshop on Languages and Compilers for Parallel Computing, pages 301-320. Springer-Verlag, August 1993.]]
[13]
I. Kodukula, N. Ahmed, and K. Pingali. Data-centric multi-level blocking. In Proceedings of the ACM SIG- PLAN '97 Conference on Programming Language Design and Implementation, pages 346-357, June 1997.]]
[14]
E. C. Lewis, C. Lin, and L. Snyder. The implementation and evaluation of fusion and array contraction in array languages. In Proceedings of the ACM SIGPLAN '98 Conference onProgramming Language Design and Implementation, pages 50-59, June 1998.]]
[15]
S.-W. Liao. SUIF Explorer: an Interactive and Interprocedural Paralllelizer. PhD thesis, Stanford University, August 2000. Published as CSL-TR-00-807.]]
[16]
A. W. Lim, G. I. Cheong, and M. S. Lam. An affine partitioning algorithm to maximize parallelism and minimize communication. In Proceedings of the 13th ACM SIGARCH International Conference on Supercomputing, Rhodes, Greece, June 1999.]]
[17]
A. W. Lim and M. S. Lam. Maximizing parallelism and minimizing synchronization with affine transforms. In Conference Record of the 24th ACM SIGPLAN- SIGACT Symposium on Principles of Programming Languages, Paris, France, January 1997.]]
[18]
A. W. Lim and M. S. Lam. Maximizing parallelism and minimizing synchronization with affine partitions. Parallel Computing, 24(3-4):445-475, May 1998.]]
[19]
W. Pugh and E. Rosser. Iteration space slicing for locality. In Proceedings of the Twelfth Workshop on Languages and Compilers for Parallel Computing. Springer-Verlag, August 1999.]]
[20]
V. Sarkar and G. R. Gao. Optimization of array accesses by collective loop transformations. In Proceedings of the 1991 ACM International Conference on Supercomputing, pages 194-204, June 1991.]]
[21]
A. Schrijver. Theory of Linear and Integer Programming. Wiley, Chichester, 1986.]]
[22]
M. E. Wolf. Improving Locality and Parallelism in Nested Loops. PhD thesis, Stanford University, August 1992. Published as CSL-TR-92-538.]]
[23]
M. E. Wolf and M. S. Lam. A data locality optimizing algorithm. In Proceedings of the ACM SIGPLAN '91 Conference onProgramming Language Design and Implementation, pages 30-44, June 1991.]]
[24]
M. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley Publishing Company, 1995.]]

Cited By

View all
  • (2023)CustomHalide – A new plugin of clang for loop optimizationProceedings of the 2023 9th International Conference on Computing and Artificial Intelligence10.1145/3594315.3594372(557-567)Online publication date: 17-Mar-2023
  • (2021)Compiler-directed scratchpad memory data transfer optimization for multithreaded applications on a heterogeneous many-core architectureThe Journal of Supercomputing10.1007/s11227-021-03853-xOnline publication date: 15-May-2021
  • (2019)Towards an Achievable Performance for the Loop NestsLanguages and Compilers for Parallel Computing10.1007/978-3-030-34627-0_6(70-77)Online publication date: 13-Nov-2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 36, Issue 7
July 2001
143 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/568014
Issue’s Table of Contents
  • cover image ACM Conferences
    PPoPP '01: Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
    June 2001
    142 pages
    ISBN:1581133464
    DOI:10.1145/379539
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 June 2001
Published in SIGPLAN Volume 36, Issue 7

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)12
  • Downloads (Last 6 weeks)0
Reflects downloads up to 21 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2023)CustomHalide – A new plugin of clang for loop optimizationProceedings of the 2023 9th International Conference on Computing and Artificial Intelligence10.1145/3594315.3594372(557-567)Online publication date: 17-Mar-2023
  • (2021)Compiler-directed scratchpad memory data transfer optimization for multithreaded applications on a heterogeneous many-core architectureThe Journal of Supercomputing10.1007/s11227-021-03853-xOnline publication date: 15-May-2021
  • (2019)Towards an Achievable Performance for the Loop NestsLanguages and Compilers for Parallel Computing10.1007/978-3-030-34627-0_6(70-77)Online publication date: 13-Nov-2019
  • (2016)Fusion of Parallel Array OperationsProceedings of the 2016 International Conference on Parallel Architectures and Compilation10.1145/2967938.2967945(71-85)Online publication date: 11-Sep-2016
  • (2015)Performance Modeling of Stencil Computing on a Stream-Based FPGA Accelerator for Efficient Design Space ExplorationIEICE Transactions on Information and Systems10.1587/transinf.2014RCP0013E98.D:2(298-308)Online publication date: 2015
  • (2015)An Abelian group model of commutative data dependence relations for the iteration space slicing2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)10.1109/SNPD.2015.7176231(1-6)Online publication date: Jun-2015
  • (2013)Tile size selection revisitedACM Transactions on Architecture and Code Optimization10.1145/2541228.255529210:4(1-27)Online publication date: 1-Dec-2013
  • (2012)Locality and SynchronizationMulticore Programming Using the ParC Language10.1007/978-1-4471-2164-0_3(93-127)Online publication date: 2012
  • (2011)Improved Affine Partition Algorithm for Compile-Time and Runtime PerformanceIntelligent Automation & Soft Computing10.1080/10798587.2011.1064322017:8(1179-1191)Online publication date: Jan-2011
  • (2011)Streaming Data Movement for Real-Time Image AnalysisJournal of Signal Processing Systems10.1007/s11265-008-0336-x62:1(29-42)Online publication date: 1-Jan-2011
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media