[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1629395.1629409acmconferencesArticle/Chapter ViewAbstractPublication PagesesweekConference Proceedingsconference-collections
research-article

Slicing based code parallelization for minimizing inter-processor communication

Published: 11 October 2009 Publication History

Abstract

One of the critical problems in distributed memory multi-core architectures is scalable parallelization that minimizes inter-processor communication. Using the concept of iteration space slicing, this paper presents a new code parallelization scheme for data-intensive applications. This scheme targets distributed memory multi-core architectures, and formulates the problem of data-computation distribution (partitioning) across parallel processors using slicing such that, starting with the partitioning of the output arrays, it iteratively determines the partitions of other arrays as well as iteration spaces of the loop nests in the application code. The goal is to minimize inter-processor data communications. Based on this iteration space slicing based formulation of the problem, we also propose a solution scheme. The proposed data-computation scheme is evaluated using six data-intensive benchmark programs. In our experimental evaluation, we also compare this scheme against three alternate data-computation distribution schemes. The results obtained are very encouraging, indicating around 10% better speedup, with 16 processors, over the next-best scheme when averaged over all benchmark codes we tested.

References

[1]
H. Agrawal and J. R. Horgan. Dynamic program slicing. In Proceedings of the ACM Conference on Programming Language Design and Implementation, 1990.
[2]
S. P. Amarasinghe and M. S. Lam. Communication optimization and code generation for distributed memory machines. In ACM SIGPLAN Notices, 1993.
[3]
J. M. Anderson. Automatic computation and data decomposition for multiprocessors. Ph.D Thesis, Stanford University, March 1997.
[4]
P. Banerjee, J. A. Chandy, M. Gupta, J. G. Holm, A. Lain, D. J. Palermo, S. Ramaswamy, and E. Su. The paradigm compiler for distributed-memory multicomputers. In IEEE Computer, 1995.
[5]
A. Basumallik, S.-J. Min, and R. Eigenmann. Programming distributed systems using Openmp. In Proceedings of HIPS, 2007.
[6]
R. Brightwell, R. Riesen, K. D. Underwood. Analyzing the impact of overlap, offload, and independent progress for message passing interface applications. In International Journal of High Performance Computing Applications, May 2005.
[7]
S. Chakrabarti, M. Gupta, and J. deok Choi. Global communication analysis and optimization. In Proceedings of the ACM Conference on Programming Language Design and Implementation, 1996.
[8]
S. Chatterjee, J. R. Gilbert, F. J. E. Long, R. Schreiber, S. Hua Teng, D. John, and R. Gilbert. Generating local addresses and communication sets for data-parallel programs. In Journal of Parallel and Distributed Computing, 1995.
[9]
A. Danalis, K.-Y. Kim, L. Pollock, M. Swany. Transformations to parallel codes for communication-computation overlap. In Proceedings of the ACM/IEEE Conference on Supercomputing, 2005.
[10]
P. Feautrier, J.F. Collard, C. Bastoul. Solving systems of affine (in)equalities. Technical Report, PRiSM,Versailles University, France, 2002.
[11]
K. B. Gallagher and J. R. Lyle. Using program slicing in software maintenance. In IEEE Transactions on Software Engineering, 17:751--761, 1991.
[12]
C. Gong, R. Gupta, and R. Melhem. Compilation techniques for optimizing communication on distributed-memory systems. In In Proc. International Conference on Parallel Processing, Volume II, 1993.
[13]
M. Gupta and P. Banerjee. Demonstration of automatic data partitioning techniques for parallelizing compilers on multicomputers. IEEE Transactions on Parallel and Distributed Systems, 3:179--193, 1992.
[14]
M. W. Hall, M. W. Hall, S. Hiranandani, S. Hiranandani, K. Kennedy, K. Kennedy, C. wen Tseng, and C. wen Tseng. Compiling Fortran D for MIMD distributed--memory machines. Communications of the ACM, 35:66--80, 1992.
[15]
M. J. Harrold and N. Ci. Reuse-driven interprocedural slicing. In In Proceedings of the 20th International Conference on Software Engineering, 1998.
[16]
High Performance Fortran. http://www.netlib.org/hpf/
[17]
S. Horwitz, T. Reps, and D. Binkley. Interprocedural slicing using dependence graphs. In ACM Transactions on Programming Languages and Systems, 12:26--60, 1990.
[18]
K. Kennedy and K. Kennedy. Combining dependence and data-flow analyses to optimize communication. In Proceedings of the 9th International Parallel Processing Symposium, 1995.
[19]
K. Kennedy and A. Sethi. Resource-based communication placement analysis. In Proceedings of the 9th Workshop on Language and Compilers for Parallel Computing, 1996.
[20]
R. Komondoor and S. Horwitz. Using slicing to identify duplication in source code. In Proceedings of the 8th International Symposium on Static Analysis, 2001.
[21]
D. Liang and M. J. Harrold. Reuse-driven interprocedural slicing in the presence of pointers and recursion. In Proceedings of International Conference on Software Maintenance, 1999.
[22]
MPI Standard. http://www-unix.mcs.anl.gov/mpi/
[23]
The Omega Project. http://www.cs.umd.edu/projects/omega/
[24]
L.-N. Pouchet, C. Bastoul, A. Cohen, and N. Vasilache. Iterative optimization in the polyhedral model: Part I, one-dimensional time. In Proceedings of CGO, pp. 144--156, 2007.
[25]
E. Rosser and W. Pugh. Iteration space slicing and its application to communication optimization. In Proceedings of the International Conference on Supercomputing,1997.
[26]
T. Reps and W. Yang. The semantics of program slicing. Technical Report, University of Wisconsin, 1988.
[27]
http://www.virtutech.com/
[28]
http://suif.stanford.edu/
[29]
F. Tip. A survey of program slicing techniques. Journal of Programming Languages, 3:121--189, 1995.
[30]
S. Verdoolaege, K. Beyls, M. Bruynooghe, and F. Catthoor. Experiences with Enumeration of Integer Projections of Parametric Polytopes. In Proceedings of the Compiler Construction Symposium, pp. 91--105, 2005.
[31]
M. Weiser. Program slicing. In Proceedings of the 5th international conference on Software engineering, 1981.
[32]
X. Zhang and R. Gupta. Cost effective program slicing. In Proceedings of the Programming Languages Design and Implementation, 2004.
[33]
J. Anderson and M. Lam. Global optimizations for parallelism and locality on scalable parallel machines. In PLDI '93: Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation, pp.112--125, 1993.
[34]
M. Wolf and M. Lam. A loop transformation theory and an algorithm to maximize parallelism. IEEE Trans. Parallel Distrib. Syst. 2, 4:452--471,1991.
[35]
K. Bondalapati. Parallelizing DSP nested loops on reconfigurable architectures using data context switching. In Proc. of the 38th Design Automation Conference, pp.273--276, 2001.
[36]
G. Goumas, N. Drosinos, M. Athanasaki and N. Koziris. Automatic parallel code generation for tiled nested loops. In Proc. of the ACM Symposium on Applied Computing, pp. 1412--1419, 2004.
[37]
B. Mei, S. Vernalde, D. Verkest, H. Man and R. Lauwereins. Exploiting loop-level parallelism on coarse-grained reconfigurable architectures using modulo scheduling. In Proc. of the Conference on Design, Automation and Test in Europe, pp. 10296--10301, 2003.
[38]
K. Hogstedt, L. Carter and J. Ferrante. On the parallel execution time of tiled loops. IEEE Transactions on Parallel Distributed Systems 14, 3:307--321, 2003.
[39]
A. Navarro, E. Zapata and D. Padua. Compiler techniques for the distribution of data and computation. IEEE Transactions on Parallel Distributed Systems 14, 6:545--562, 2003.

Index Terms

  1. Slicing based code parallelization for minimizing inter-processor communication

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    CASES '09: Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
    October 2009
    298 pages
    ISBN:9781605586267
    DOI:10.1145/1629395
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 October 2009

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. automatic code parallelization
    2. code analysis and optimization
    3. iteration space slicing
    4. parallelizing compilers

    Qualifiers

    • Research-article

    Conference

    ESWeek '09
    ESWeek '09: Fifth Embedded Systems Week
    October 11 - 16, 2009
    Grenoble, France

    Acceptance Rates

    Overall Acceptance Rate 52 of 230 submissions, 23%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 241
      Total Downloads
    • Downloads (Last 12 months)1
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 28 Jan 2025

    Other Metrics

    Citations

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media