Abstract
Provenance has been proposed as a means to achieve scientific workflow reproducibility, allowing workflow processes and results to be verified. Cloud computing offers a new paradigm for workflow execution: a dynamic, scalable environment with on-demand resource provisioning. Without information about the Cloud infrastructure used, however, reproducing a workflow execution on the Cloud is a challenge. This paper presents a framework, named ReCAP, that captures Cloud infrastructure information and interlinks it with workflow provenance to establish Cloud-Aware Provenance (CAP). It identifies different scenarios of using the Cloud for workflow execution and presents different mapping approaches. A workflow execution is reproduced by re-provisioning similar Cloud resources using CAP and re-executing the workflow, and the reproduction is checked by comparing the workflow outputs. Finally, the paper evaluates ReCAP in terms of captured provenance, workflow execution time and workflow output comparison.
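The idea of interlinking infrastructure information with workflow provenance can be illustrated with a minimal sketch. It is not ReCAP's implementation: the record fields, the matching of jobs to virtual machines by host IP address, and the use of SHA-256 digests for output comparison are all assumptions made for illustration, and every name below (`VMConfig`, `JobRecord`, `build_cap`, `provisioning_plan`, `outputs_match`) is hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class VMConfig:
    """Cloud resource description (hypothetical field names)."""
    ip: str
    flavor: str
    ram_mb: int
    vcpus: int
    image: str


@dataclass(frozen=True)
class JobRecord:
    """One job entry taken from workflow provenance (hypothetical schema)."""
    job_id: str
    host_ip: str
    output: bytes


def build_cap(jobs, vms):
    """Interlink each job with the VM it ran on, matching on host IP (assumed key)."""
    by_ip = {vm.ip: vm for vm in vms}
    return {job.job_id: by_ip[job.host_ip] for job in jobs if job.host_ip in by_ip}


def provisioning_plan(cap):
    """Distinct resource configurations to re-provision; IPs are ignored
    because a re-provisioned machine need not get the same address."""
    return sorted({(vm.flavor, vm.ram_mb, vm.vcpus, vm.image) for vm in cap.values()})


def outputs_match(original, reproduced):
    """Compare two outputs by their SHA-256 digests (assumed comparison method)."""
    return hashlib.sha256(original).digest() == hashlib.sha256(reproduced).digest()


# Example data, invented for illustration only.
vms = [
    VMConfig("10.0.0.5", "m1.small", 2048, 1, "img-a"),
    VMConfig("10.0.0.6", "m1.small", 2048, 1, "img-a"),
]
jobs = [
    JobRecord("job-1", "10.0.0.5", b"result-1"),
    JobRecord("job-2", "10.0.0.6", b"result-2"),
]

cap = build_cap(jobs, vms)
plan = provisioning_plan(cap)
```

In this sketch the two jobs ran on machines with identical configurations, so the plan contains a single entry describing the kind of resource to request again; the reproduced outputs would then be checked against the originals with `outputs_match`.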
Acknowledgements
This research was funded by the European Union FP7 project N4U (neuGrid4Users), grant agreement no. 283562, 2011-2014. The support of OSDC, which provided a free Cloud infrastructure of 20 cores, is also gratefully acknowledged.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Hasham, K., Munir, K., McClatchey, R., Shamdasani, J. (2016). Re-provisioning of Cloud-Based Execution Infrastructure Using the Cloud-Aware Provenance to Facilitate Scientific Workflow Execution Reproducibility. In: Helfert, M., Méndez Muñoz, V., Ferguson, D. (eds) Cloud Computing and Services Science. CLOSER 2015. Communications in Computer and Information Science, vol 581. Springer, Cham. https://doi.org/10.1007/978-3-319-29582-4_5
DOI: https://doi.org/10.1007/978-3-319-29582-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29581-7
Online ISBN: 978-3-319-29582-4
eBook Packages: Computer Science, Computer Science (R0)