Swift: A language for distributed parallel scripting

Published: 01 September 2011

Abstract

Scientists, engineers, and statisticians must execute domain-specific application programs many times on large collections of file-based data. This activity requires complex orchestration and data management as data is passed to, from, and among application invocations. Distributed and parallel computing resources can accelerate such processing, but their use further increases programming complexity. The Swift parallel scripting language reduces these complexities by making file system structures accessible via language constructs and by allowing ordinary application programs to be composed into powerful parallel scripts that can efficiently utilize parallel and distributed resources. We present Swift's implicitly parallel and deterministic programming model, which applies external applications to file collections using a functional style that abstracts and simplifies distributed parallel execution.

Published In

Parallel Computing, Volume 37, Issue 9
September 2011
155 pages

Publisher

Elsevier Science Publishers B. V.
Netherlands

Author Tags

  1. Dataflow
  2. Parallel programming
  3. Scripting
  4. Swift

Qualifiers

  • Article
