research-article

An empirical study of data decomposition for software parallelization

Published: 01 March 2017

Abstract

Highlights

  • Multi-core programming is becoming increasingly important.
  • Data decomposition is a key challenge when parallelizing for multi-core CPUs.
  • We conduct a multi-method study to better understand data decomposition.
  • We derive a set of ten key requirements for tools that support parallelization.
  • State-of-the-art tooling does not meet these requirements.

Context: Multi-core architectures are becoming increasingly ubiquitous, and software professionals are seeking to leverage the capabilities of distributed-memory architectures. Parallelizing software applications can be very tedious and error-prone, in particular the task of data decomposition, yet empirical studies investigating the complexity of data decomposition and communication are lacking.

Objective: Our objective is threefold: (i) to gain an empirically based understanding of data decomposition as part of parallelizing software applications; (ii) to identify key requirements for tools that assist developers in this task; and (iii) to assess the current state of the art.

Methods: Our empirical investigation employed a multi-method approach: an interview study, a participant-observer case study, a focus group study, and a sample survey. It involved collaborations with three industry partners: IBM's High Performance Computing Center, the Irish Centre for High-End Computing (ICHEC), and JBA Consulting.

Results: This article presents data decomposition as one of the most prevalent tasks in parallelizing applications for multi-core architectures. Based on our studies, we identify ten key requirements for tool support to help HPC developers in this area. Our evaluation of the state of the art shows that no extant tool implements all ten requirements.

Conclusion: While there is a considerable body of research on HPC, few empirical studies explicitly focus on the challenges practitioners face in this area; this research aims to address that gap. The empirical studies in this article provide insights that may help researchers and tool vendors better understand the needs of parallel programmers.
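As a hedged illustration (not taken from the article itself), the core of the data-decomposition task the study examines can be sketched as a one-dimensional block decomposition with ghost cells; the function names here are hypothetical, and real HPC codes would typically do this over MPI ranks and multiple dimensions:

```python
def block_ranges(n, p):
    """Split n items across p workers into contiguous blocks.

    The first n % p workers receive one extra item, a common
    convention for block decomposition in MPI-style codes.
    """
    base, extra = divmod(n, p)
    ranges, start = [], 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

def with_ghost_cells(ranges, n):
    """Extend each block by one ghost cell on each interior
    boundary, so neighbouring data can be exchanged before a
    stencil update (the 'ghost cell pattern')."""
    return [(max(lo - 1, 0), min(hi + 1, n)) for lo, hi in ranges]

# Example: 10 grid points over 3 workers gives blocks of size 4, 3, 3.
print(block_ranges(10, 3))
print(with_ghost_cells(block_ranges(10, 3), 10))
```

Even in this toy form, the error-prone parts the article highlights are visible: getting the remainder distribution right, and clamping ghost regions at the domain boundary.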


Cited By

  • (2021) Sylkan: Towards a Vulkan Compute Target Platform for SYCL. Proceedings of the 9th International Workshop on OpenCL, 1-12. DOI: 10.1145/3456669.3456683. Online publication date: 27-Apr-2021.
  • (2020) Programming languages for data-intensive HPC applications. Parallel Computing, 91:C. DOI: 10.1016/j.parco.2019.102584. Online publication date: 1-Mar-2020.
  • (2020) Parallel multi-objective artificial bee colony algorithm for software requirement optimization. Requirements Engineering, 25:3, 363-380. DOI: 10.1007/s00766-020-00328-y. Online publication date: 27-Jan-2020.
  • (2019) Celerity: High-Level C++ for Accelerator Clusters. Euro-Par 2019: Parallel Processing, 291-303. DOI: 10.1007/978-3-030-29400-7_21. Online publication date: 26-Aug-2019.
  • (2018) An Architecture for Translating Sequential Code to Parallel. Proceedings of the 2nd International Conference on Information System and Data Mining, 88-92. DOI: 10.1145/3206098.3206104. Online publication date: 9-Apr-2018.
    Published In

    Journal of Systems and Software  Volume 125, Issue C
    March 2017
    408 pages

    Publisher

    Elsevier Science Inc.

    United States


    Author Tags

    1. Data decomposition
    2. Empirical studies
    3. Parallelization

    Qualifiers

    • Research-article

