Abstract
Loop fusion is a program transformation that merges multiple loops into one. It is effective for reducing the synchronization overhead of parallel loops and for improving data locality. This paper presents three results for fusion: (1) a new algorithm for fusing a collection of parallel and sequential loops, minimizing parallel loop synchronization while maximizing parallelism; (2) a proof that performing fusion to maximize data locality is NP-hard; and (3) two polynomial-time algorithms for improving data locality. These techniques also apply to loop distribution, which is shown to be essentially equivalent to loop fusion. Our approach is general enough to support other fusion heuristics. Preliminary experimental results validate our approach for improving performance by exploiting data locality and increasing the granularity of parallelism.
This research was supported by the Center for Research on Parallel Computation, a NSF Science and Technology Center. Use of the Sequent Symmetry S81 was provided under NSF Cooperative Agreement No. CDA-8619393.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
W. Abu-Sufah. Improving the Performance of Virtual Memory Computers. PhD thesis, Dept. of Computer Science, University of Illinois at Urbana-Champaign, 1979.
F. Allen and J. Cocke. A catalogue of optimizing transformations. In J. Rustin, editor, Design and Optimization of Compilers. Prentice-Hall, 1972.
J. R. Allen, D. Callahan, and K. Kennedy. Automatic decomposition of scientific programs for parallel execution. In Proceedings of the Fourteenth Annual ACM Symposium on the Principles of Programming Languages, Munich, Germany, Jan. 1987.
J. R. Allen and K. Kennedy. Automatic translation of Fortran programs to vector form. ACM Transactions on Programming Languages and Systems, 9(4):491–542, Oct. 1987.
A. J. Bernstein. Analysis of programs for parallel processing. IEEE Transactions on Electronic Computers, 15(5):757–763, Oct. 1966.
D. Callahan. A Global Approach to Detection of Parallelism. PhD thesis, Dept. of Computer Science, Rice University, Mar. 1987.
D. Callahan, S. Carr, and K. Kennedy. Improving register allocation for sub-scripted variables. In Proceedings of the SIGPLAN '90 Conference on Program Language Design and Implementation, White Plains, NY, June 1990.
S. Carr, K. Kennedy, K. S. McKinley, and C. Tseng. Compiler optimizations for improving data locality. Technical Report TR92-195, Dept. of Computer Science, Rice University, Nov. 1992.
G. Cybenko, L. Kipp, L. Pointer, and D. Kuck. Supercomputer performance evaluation and the Perfect benchmarks. In Proceedings of the 1990 ACM International Conference on Supercomputing, Amsterdam, The Netherlands, June 1990.
R. Cytron, J. Ferrante, and V. Sarkar. Experiences using control dependence in PTRAN. In D. Gelernter, A. Nicolau, and D. Padua, editors, Languages and Compilers for Parallel Computing. The MIT Press, 1990.
E. Dahlhaus, D. S. Johnson, C. H. Papadimitriou, P. D. Seymour, and M. Yannakakis. The complexity of multiway cuts. In Proceedings of the 24th Annual ACM Symposium on the Theory of Computing, May 1992.
J. Ferrante, K. Ottenstein, and J. Warren. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems, 9(3):319–349, July 1987.
Ford, Jr., L. R. and D. R. Fulkerson. Flows in Networks. Princeton University Press, Princeton, NJ, 1962.
G. Gao, R. Olsen, V. Sarkar, and R. Thekkath. Collective loop fusion for array contraction. In Proceedings of the Fifth Workshop on Languages and Compilers for Parallel Computing, New Haven, CT, Aug. 1992.
A. Goldberg and R. Paige. Stream processing. In Conference Record of the 1984 ACM Symposium on Lisp and Functional Programming, pages 228–234, Aug. 1984.
A. V. Goldberg and R. E. Tarjan. A new approach to the maximum-flow problem. Journal of the Association for Computing Machinery, 35(4):921–940, Oct. 1988.
M. W. Hall, K. Kennedy, and K. S. McKinley. Interprocedural transformations for parallel code generation. In Proceedings of Supercomputing '91, Albuquerque, NM, Nov. 1991.
K. Kennedy and K. S. McKinley. Loop distribution with arbitrary control flow. In Proceedings of Supercomputing '90, New York, NY, Nov. 1990.
K. Kennedy and K. S. McKinley. Optimizing for parallelism and data locality. In Proceedings of the 199S ACM International Conference on Supercomputing, Washington, DC, July 1992.
K. Kennedy and K. S. McKinley. Typed fusion with applications to parallel and sequential code generation. Technical Report TR93-208, Dept. of Computer Science, Rice University, Aug. 1993.
K. Kennedy, K. S. McKinley, and C. Tseng. Analysis and transformation in an interactive parallel programming tool. Concurrency: Practice & Experience, to appear 1993.
K. S. McKinley. Automatic and Interactive Parallelization. PhD thesis, Dept. of Computer Science, Rice University, Apr. 1992.
A. Porterfield. Software Methods for Improvement of Cache Performance. PhD thesis, Dept. of Computer Science, Rice University, May 1989.
V. Sarkar and G. Gao. Optimization of array accesses by collective loop transformations. In Proceedings of the 1991 ACM International Conference on Supercomputing, Cologne, Germany, June 1991.
J. Warren. A hierachical basis for reordering transformations. In Conference Record of the Eleventh Annual ACM Symposium on the Principles of Programming Languages, Salt Lake City, UT, Jan. 1984.
M. Yannakakis, P. C. Kanellakis, S. C. Cosmadakis, and C. H. Papadimitriou. Cutting and partitioning a graph after a fixed pattern. Automata, Languages, and Programming — Lecture Notes in Computer Science, 154:712–722, 1983.
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kennedy, K., McKinley, K.S. (1994). Maximizing loop parallelism and improving data locality via loop fusion and distribution. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1993. Lecture Notes in Computer Science, vol 768. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57659-2_18
Download citation
DOI: https://doi.org/10.1007/3-540-57659-2_18
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57659-4
Online ISBN: 978-3-540-48308-3
eBook Packages: Springer Book Archive