Abstract
We demonstrate approaches to the static parallelization of loops and recursions on the example of the polynomial product. Phrased as a loop nest, the polynomial product can be parallelized automatically by applying a space-time mapping technique based on linear algebra and linear programming. One can choose a parallel program that is optimal with respect to some objective function like the number of execution steps, processors, channels, etc. However,at best,linear execution time complexity can be atained. Through phrasing the polynomial product as a divide-and-conquer recursion, one can obtain a parallel program with sublinear execution time. In this case, the target program is not derived by an automatic search but given as a program skeleton, which can be deduced by a sequence of equational program transformations. We discuss the use of such skeletons, compare and assess the models in which loops and divide-and-conquer resursions are parallelized and comment on the performance properties of the resulting parallel implementations.
Similar content being viewed by others
References
A.V. Aho, J.E. Hopcroft, and J.D. Ullman. The Design and Analysis of Computer Algorithms. Series in Computer Science and Information Processing. Addison-Wesley, Reading, MA, 1974.
C. Ancourt, F. Coelho, B. Creusillet, F. Irigoin, P. Jouvelot, and R. Keryell. PIPS: A framework for building interprocedural compilers, parallelizers and optimizers. Technical Report A/289. Centre de Recherche en Informatique, Ecole des Mines de Paris, April, 1996. WWW: http://www.cri.ensmp.fr/~pips/.
D.G. Baltus and J. Allen. Efficient exploration of nonuniform space-time transformations for optimal systolic array synthesis. In L. Dadda and B. Wah, eds, Proc. Int. Conf. Application Specific Array Processors (ASAP'93), pp. 428–441. IEEE Computer Society Press. Los Alamitos, CA, 1993.
U. Banerjee. Loop Transformations for Restructuring Compilers: The Foundations. Series on Loop Transformations for Restructuring Compilers, Kluwer Academic Publishers, 1993.
U. Banerjee. Loop Parallelization. Series on Loop Transformations for Restructuring Compilers, Kluwer Academic Publishers, 1994.
U. Banerjee. Dependence Analysis. Series on Loop Transformations for Restructuring Compilers, Kluwer Academic Publishers, 1997.
R.S. Bird. Lectures on constructive functional programming. In M. Broy, ed., Constructive Methods in Computing Science, NATO ASI Series F: Computer and Systems Sciences, Vol. 55, pp. 151–216, Springer-Verlag, 1988.
G. Blelloch. Scans as primitive parallel operations. IEEE Trans. on Computers, 38:1526–1538, 1989.
P. Boulet, M. Dijon, E. Lequiniou, and T. Risset. Reference manual of the Bouclettes parallelizer. Technical Report 94-04, Laboratoire de l'Informatique du Parallélisme. Ecole Normale Supérieure de Lyon, 1994. WWW: http://www.prism.uvsq.fr/public/bop/base/bclt/ bouclettes.html.
M.I. Cole. Algorithmic Skeletons: Structured Management of Parallel Computation, Research Monographs in Parallel and Distributed Computing, Pitman, 1989.
M.I. Cole. Parallel programming with list homomorphisms. Parallel Processing Letters, 5:191–204, 1995.
M.I. Cole, S. Gorlatch, C. Lengauer, and D.B. Skillicorn, eds. Theory and practice of higher-order parallel programming. Technical Report 169. Schloβ Dagstuhl, 1997.
J.-F. Collard. Automatic parallelization of while-loops using speculative execution. Int. J. Parallel Programming, 23:191–219, 1995.
A. Darte. Regular partitioning for synthesizing fixed-size systolic arrays. INTEGRATION, 12:293–304, 1991.
E.W. Dijkstra and C.S. Scholten. Predicate Calculus and Program Semantics. Texts and Monographs in Computer Science, Springer-Verlag, 1990.
P. Feautrier. Array expansion. In Proc. Int. Conf. on Supercomputing, pp. 429–441, ACM Press, 1988.
P. Feautrier. Automatic parallelization in the polytope model. In G.-R. Perrin and A. Darte, eds., The Data Parallel Programming Model. Lecture Notes in Computer Science 1132, pp. 79–103, Springer-Verlag, 1996.
J. Gibbons. Upwards and downwards accumulations on trees. In R. Bird, C. Morgan, and J. Woodcock, eds., Mathematics of Program Construction, Lecture Notes in Computer Science 669, pp. 122–138, Springer-Verlag, 1992.
S. Gorlatch. From transformations to methodology in parallel program development: a case study. Microprocessing and Microprogramming, 41:571–588, 1996.
S. Gorlatch. Systematic efficient parallelization of scan and other list homomorphisms. In L. Bougé, P. Fraigniaud, A. Mignotte, and Y. Robert, eds., Euro-Par'96, Lecture Notes in Computer Science 1124, pp. 401–408, Springer-Verlag, 1996.
S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. In H. Kuchen and D. Swierstra, eds., Programming Languages: Implementation, Logics and Programs, Lecture Notes in Computer Science 1140, pp. 274–288, Springer-Verlag, 1996c.
S. Gorlatch and H. Bischof. Formal derivation of divide-and-conquer programs: A case study in the multidimensional FFT's. In D. Mery, ed., Proc. 2nd Int. Workshop on Formal Methods for Parallel Programming: Theory and Applications (FMPPTA'97), pp. 80–94, IEEE Computer Society, 1997.
M. Griebl. The Mechanical Parallelization of Loop Nests Containing while Loops. PhD thesis, Fakultät für Mathematik und Informatik, Universität Passau. Technical Report MIP-9701, 1996.
M. Griebl and C. Lengauer. Classifying loops for space-time mapping. In L. Bougé, P. Fraigniaud, A. Mignotte, and Y. Robert, eds., Euro-Par'96, Vol. I, Lecture Notes in Computer Science 1123, pp. 467–474, Springer-Verlag, 1996.
M. Griebl and C. Lengauer. The loop parallelizer LooPo. In M. Gerndt, ed., Proc. Sixth Workshop on Compilers for Parallel Computers, volume 21 of Konferenzen des Forschungszentrums Jülich, pp. 311–320, Forschungszentrum Jülich, 1996. WWW: http://www.uni-passau.de/~loopo/.
C.A. Herrmann and C. Lengauer. On the space-time mapping of a class of divide-and-conquer recursions. Parallel Processing Letters, 6:525–537, 1996.
C.A. Herrmann and C. Lengauer. Transformation of divide -- conquer to nested parallel loops. In H. Glaser, P. Hartel, and H. Kuchen, eds., Programming Languages: Implementation, Logics, and Programs (PLILP'97), Lecture Notes in Computer Science 1292, pp. 95–109, Springer-Verlag, 1997.
V. Kumar, A. Grama, A. Gupta, and G. Karypis. Introduction to Parallel Computing: Design and Analysis of Algorithms, Benjamin/Cummings, 1994.
C. Lengauer. Aview of systolic design. In N.N. Mirenkov, ed., Parallel Computing Technologies (PaCT-91), pp. 32–46, World Scientific, 1991.
C. Lengauer. Loop parallelization in the polytope model. In E. Best, ed., CONCUR'93, Lecture Notes in Computer Science 715, pp. 398–416, Springer-Verlag, 1993.
C. Lengauer and M. Griebl. On the parallelization of loop nests containing while loops. In N.N. Mirenkov, Q.-P. Gu, S. Peng, and S. Sedukhin, eds., Proc. 1st Aizu Int. Symp. on Parallel Algorithm/Architecture Synthesis (pAs'95), pp. 10–18, IEEE Computer Society Press, 1995.
C. Lengauer, L. Thiele, M. Wolfe, and H. Zima, eds. Loop parallelization. Technical Report 142, Schloβ Dagstuhl, 1996.
V. Loechner and C. Mongenet. OPERA:Atoolbox for loop parallelization. In I. Jelly, I. Gorton, and P. Croll, eds., Proc. 1st Int. Workshop on Software Engineering for Parallel and Distributed Systems, pp. 134–145. Chapman -- Hall, 1996. WWW: http://icps.u-strasbg.fr/opera/.
R. Miller. A Constructive Theory of Multidimensional Arrays. PhD thesis, Programming Research Group, Oxford University, 1993.
Z.G. Mou and P. Hudak. An algebraic model for divide-and-conquer algorithms and its parallelism. Journal of Supercomputing, 2:257–278, 1988.
M.J. Quinn. Parallel Computing, McGraw-Hill, 1994.
P. Quinton and Y. Robert. Systolic Algorithms and Architectures, Prentice-Hall, 1990.
R. Sedgewick. Algorithms, Addison-Wesley, 1988.
D.B. Skillicorn. Foundations of Parallel Programming, Cambridge International Series on Parallel Computation, Cambridge University Press, 1994.
D. Smith. Applications of a strategy for designing divide-and-conquer algorithms. Science of Computer Programming, 8:213–229, 1987.
J. Teich and L. Thiele. Control generation in the design of processor arrays. J. VLSI Signal Processing, 3:77–92, 1991.
S. Thompson. Haskell --The Craft of Functional Programming, International Computer Science Series, Addison-Wesley, 1996.
C. Wedler and C. Lengauer. Parallel implementations of combinations of broadcast, reduction and scan. In Proc. 2nd Int. Workshop on Software Engineering for Parallel and Distributed Systems (PDSE'97), pp. 108–119, IEEE Computer Society Press, 1997.
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Lengauer, C., Gorlatch, S. & Herrmann, C. The Static Parallelization of Loops and Recursions. The Journal of Supercomputing 11, 333–353 (1997). https://doi.org/10.1023/A:1007904422322
Issue Date:
DOI: https://doi.org/10.1023/A:1007904422322