More Web Proxy on the site http://driver.im/

research-article

Slicing based code parallelization for minimizing inter-processor communication

Authors:

Mahmut Kandemir,

Sai Prasanth Muralidhara,

Sri Hari Krishna NarayananAuthors Info & Claims

CASES '09: Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems

Pages 87 - 96

https://doi.org/10.1145/1629395.1629409

Published: 11 October 2009 Publication History

Abstract

One of the critical problems in distributed memory multi-core architectures is scalable parallelization that minimizes inter-processor communication. Using the concept of iteration space slicing, this paper presents a new code parallelization scheme for data-intensive applications. This scheme targets distributed memory multi-core architectures, and formulates the problem of data-computation distribution (partitioning) across parallel processors using slicing such that, starting with the partitioning of the output arrays, it iteratively determines the partitions of other arrays as well as iteration spaces of the loop nests in the application code. The goal is to minimize inter-processor data communications. Based on this iteration space slicing based formulation of the problem, we also propose a solution scheme. The proposed data-computation scheme is evaluated using six data-intensive benchmark programs. In our experimental evaluation, we also compare this scheme against three alternate data-computation distribution schemes. The results obtained are very encouraging, indicating around 10% better speedup, with 16 processors, over the next-best scheme when averaged over all benchmark codes we tested.

References

[1]

H. Agrawal and J. R. Horgan. Dynamic program slicing. In Proceedings of the ACM Conference on Programming Language Design and Implementation, 1990.

Digital Library

[2]

S. P. Amarasinghe and M. S. Lam. Communication optimization and code generation for distributed memory machines. In ACM SIGPLAN Notices, 1993.

Digital Library

[3]

J. M. Anderson. Automatic computation and data decomposition for multiprocessors. Ph.D Thesis, Stanford University, March 1997.

Digital Library

[4]

P. Banerjee, J. A. Chandy, M. Gupta, J. G. Holm, A. Lain, D. J. Palermo, S. Ramaswamy, and E. Su. The paradigm compiler for distributed-memory multicomputers. In IEEE Computer, 1995.

Digital Library

[5]

A. Basumallik, S.-J. Min, and R. Eigenmann. Programming distributed systems using Openmp. In Proceedings of HIPS, 2007.

[6]

R. Brightwell, R. Riesen, K. D. Underwood. Analyzing the impact of overlap, offload, and independent progress for message passing interface applications. In International Journal of High Performance Computing Applications, May 2005.

Digital Library

[7]

S. Chakrabarti, M. Gupta, and J. deok Choi. Global communication analysis and optimization. In Proceedings of the ACM Conference on Programming Language Design and Implementation, 1996.

Digital Library

[8]

S. Chatterjee, J. R. Gilbert, F. J. E. Long, R. Schreiber, S. Hua Teng, D. John, and R. Gilbert. Generating local addresses and communication sets for data-parallel programs. In Journal of Parallel and Distributed Computing, 1995.

Digital Library

[9]

A. Danalis, K.-Y. Kim, L. Pollock, M. Swany. Transformations to parallel codes for communication-computation overlap. In Proceedings of the ACM/IEEE Conference on Supercomputing, 2005.

Digital Library

[10]

P. Feautrier, J.F. Collard, C. Bastoul. Solving systems of affine (in)equalities. Technical Report, PRiSM,Versailles University, France, 2002.

[11]

K. B. Gallagher and J. R. Lyle. Using program slicing in software maintenance. In IEEE Transactions on Software Engineering, 17:751--761, 1991.

Digital Library

[12]

C. Gong, R. Gupta, and R. Melhem. Compilation techniques for optimizing communication on distributed-memory systems. In In Proc. International Conference on Parallel Processing, Volume II, 1993.

Digital Library

[13]

M. Gupta and P. Banerjee. Demonstration of automatic data partitioning techniques for parallelizing compilers on multicomputers. IEEE Transactions on Parallel and Distributed Systems, 3:179--193, 1992.

Digital Library

[14]

M. W. Hall, M. W. Hall, S. Hiranandani, S. Hiranandani, K. Kennedy, K. Kennedy, C. wen Tseng, and C. wen Tseng. Compiling Fortran D for MIMD distributed--memory machines. Communications of the ACM, 35:66--80, 1992.

Digital Library

[15]

M. J. Harrold and N. Ci. Reuse-driven interprocedural slicing. In In Proceedings of the 20th International Conference on Software Engineering, 1998.

Digital Library

[16]

High Performance Fortran. http://www.netlib.org/hpf/

[17]

S. Horwitz, T. Reps, and D. Binkley. Interprocedural slicing using dependence graphs. In ACM Transactions on Programming Languages and Systems, 12:26--60, 1990.

Digital Library

[18]

K. Kennedy and K. Kennedy. Combining dependence and data-flow analyses to optimize communication. In Proceedings of the 9th International Parallel Processing Symposium, 1995.

Digital Library

[19]

K. Kennedy and A. Sethi. Resource-based communication placement analysis. In Proceedings of the 9th Workshop on Language and Compilers for Parallel Computing, 1996.

Digital Library

[20]

R. Komondoor and S. Horwitz. Using slicing to identify duplication in source code. In Proceedings of the 8th International Symposium on Static Analysis, 2001.

Digital Library

[21]

D. Liang and M. J. Harrold. Reuse-driven interprocedural slicing in the presence of pointers and recursion. In Proceedings of International Conference on Software Maintenance, 1999.

Digital Library

[22]

MPI Standard. http://www-unix.mcs.anl.gov/mpi/

[23]

The Omega Project. http://www.cs.umd.edu/projects/omega/

[24]

L.-N. Pouchet, C. Bastoul, A. Cohen, and N. Vasilache. Iterative optimization in the polyhedral model: Part I, one-dimensional time. In Proceedings of CGO, pp. 144--156, 2007.

Digital Library

[25]

E. Rosser and W. Pugh. Iteration space slicing and its application to communication optimization. In Proceedings of the International Conference on Supercomputing,1997.

Digital Library

[26]

T. Reps and W. Yang. The semantics of program slicing. Technical Report, University of Wisconsin, 1988.

[27]

http://www.virtutech.com/

[28]

http://suif.stanford.edu/

[29]

F. Tip. A survey of program slicing techniques. Journal of Programming Languages, 3:121--189, 1995.

[30]

S. Verdoolaege, K. Beyls, M. Bruynooghe, and F. Catthoor. Experiences with Enumeration of Integer Projections of Parametric Polytopes. In Proceedings of the Compiler Construction Symposium, pp. 91--105, 2005.

Digital Library

[31]

M. Weiser. Program slicing. In Proceedings of the 5th international conference on Software engineering, 1981.

Digital Library

[32]

X. Zhang and R. Gupta. Cost effective program slicing. In Proceedings of the Programming Languages Design and Implementation, 2004.

[33]

J. Anderson and M. Lam. Global optimizations for parallelism and locality on scalable parallel machines. In PLDI '93: Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation, pp.112--125, 1993.

Digital Library

[34]

M. Wolf and M. Lam. A loop transformation theory and an algorithm to maximize parallelism. IEEE Trans. Parallel Distrib. Syst. 2, 4:452--471,1991.

Digital Library

[35]

K. Bondalapati. Parallelizing DSP nested loops on reconfigurable architectures using data context switching. In Proc. of the 38th Design Automation Conference, pp.273--276, 2001.

Digital Library

[36]

G. Goumas, N. Drosinos, M. Athanasaki and N. Koziris. Automatic parallel code generation for tiled nested loops. In Proc. of the ACM Symposium on Applied Computing, pp. 1412--1419, 2004.

Digital Library

[37]

B. Mei, S. Vernalde, D. Verkest, H. Man and R. Lauwereins. Exploiting loop-level parallelism on coarse-grained reconfigurable architectures using modulo scheduling. In Proc. of the Conference on Design, Automation and Test in Europe, pp. 10296--10301, 2003.

Digital Library

[38]

K. Hogstedt, L. Carter and J. Ferrante. On the parallel execution time of tiled loops. IEEE Transactions on Parallel Distributed Systems 14, 3:307--321, 2003.

Digital Library

[39]

A. Navarro, E. Zapata and D. Padua. Compiler techniques for the distribution of data and computation. IEEE Transactions on Parallel Distributed Systems 14, 6:545--562, 2003.

Digital Library

Index Terms

Slicing based code parallelization for minimizing inter-processor communication
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Recommendations

A comparison of automatic parallelization tools/compilers on the SGI origin 2000
SC '98: Proceedings of the 1998 ACM/IEEE conference on Supercomputing

Porting applications to new high performance parallel and distributed computing platforms is a challenging task. Since writing parallel code by hand is time consuming and costly, porting codes would ideally be automated by using some parallelization ...
Coarse-grained loop parallelization: Iteration Space Slicing vs affine transformations

Automatic coarse-grained parallelization of program loops is of great importance for parallel computing systems. This paper presents the theory of Iteration Space Slicing aimed at extracting synchronization-free parallelism available in arbitrarily ...
Multi-processor performance on the Tera MTA
SC '98: Proceedings of the 1998 ACM/IEEE conference on Supercomputing

The Tera MTA is a revolutionary commercial computer based on a multithreaded processor architecture. In contrast to many other parallel architectures, the Tera MTA can effectively use high amounts of parallelism on a single processor. By running ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

CASES '09: Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems

October 2009

298 pages

ISBN:9781605586267

DOI:10.1145/1629395

Program Chairs:
Joerg Henkel
Karlsruhe University (TH), Germany
,
Sri Parameswaran
University of New South Wales, Australia

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 October 2009

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

ESWeek '09

Sponsor:

ESWeek '09: Fifth Embedded Systems Week

October 11 - 16, 2009

Grenoble, France

Acceptance Rates

Overall Acceptance Rate 52 of 230 submissions, 23%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
241
Total Downloads

Downloads (Last 12 months)1
Downloads (Last 6 weeks)0

Reflects downloads up to 28 Jan 2025

Other Metrics

View Author Metrics

Citations

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten