[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

Re-provisioning of Cloud-Based Execution Infrastructure Using the Cloud-Aware Provenance to Facilitate Scientific Workflow Execution Reproducibility

  • Conference paper
  • First Online:
Cloud Computing and Services Science (CLOSER 2015)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 581))

Included in the following conference series:

Abstract

Provenance has been considered as a means to achieve scientific workflow reproducibility to verify the workflow processes and results. Cloud computing provides a new computing paradigm for the workflow execution by offering a dynamic and scalable environment with on-demand resource provisioning. In the absence of Cloud infrastructure information, achieving workflow reproducibility on the Cloud becomes a challenge. This paper presents a framework, named ReCAP, to capture the Cloud infrastructure information and to interlink it with the workflow provenance to establish the Cloud-Aware Provenance (CAP). This paper identifies different scenarios of using the Cloud for workflow execution and presents different mapping approaches. The reproducibility of the workflow execution is performed by re-provisioning the similar Cloud resources using CAP and re-executing the workflow; and by comparing the outputs of workflows. Finally, this paper also presents the evaluation of ReCAP in terms of captured provenance, workflow execution time and workflow output comparison.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
GBP 19.95
Price includes VAT (United Kingdom)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
GBP 35.99
Price includes VAT (United Kingdom)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
GBP 44.99
Price includes VAT (United Kingdom)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    http://lhc.web.cern.ch.

  2. 2.

    http://www.phys.utb.edu/griphyn/.

  3. 3.

    http://libcloud.apache.org.

References

  1. Mehmood, Y., Habib, I., Bloodsworth, P., Anjum, A., Lansdale, T., McClatchey, R.: A middleware agnostic infrastructure for neuro-imaging analysis. In: 22nd IEEE International Symposium on Computer-Based Medical Systems, CBMS 2009, pp. 1–4, August 2009

    Google Scholar 

  2. Munir, K., Kiani, S.L., Hasham, K., McClatchey, R., Branson, A., Shamdasani, J.: Provision of an integrated data analysis platform for computational neuroscience experiments. J. Syst. Inf. Technol. 16(3), 150–169 (2014)

    Article  Google Scholar 

  3. Deelman, E., Gannon, D., Shields, M., Taylor, I.: Workflows and e-science: An overview of workflow system features and capabilities. Future Gener. Comput. Syst. 25(5), 528–540 (2009)

    Article  Google Scholar 

  4. Foster, I., Kesselman, C. (eds.): The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers Inc., San Francisco (1999)

    Google Scholar 

  5. Mell, P. M., Grance, T.: Sp 800–145. The nist definition of cloud computing. Technical report, Gaithersburg, MD, United States (2011)

    Google Scholar 

  6. Deelman, E., Singh, G., Livny, M., Berriman, B., Good, J.: The cost of doing science on the cloud: the montage example. In: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, SC 2008. pp. 50:1–50:12. IEEE Press, USA (2008)

    Google Scholar 

  7. Juve, G., Deelman, E.: Scientific workflows and clouds. Crossroads 16(3), 14–18 (2010)

    Article  Google Scholar 

  8. Simmhan, Y.L., Plale, B., Gannon, D.: A survey of data provenance in e-science. SIGMOD Rec. 34(3), 31–36 (2005)

    Article  Google Scholar 

  9. Azarnoosh, S., Rynge, M., Juve, G., Deelman, E., Niec, M., Malawski, M., da Silva, R.: Introducing PRECIP: an API for managing repeatable experiments in the cloud. In: 2013 IEEE 5th International Conference on Cloud Computing Technology and Science (CloudCom), vol. 2, pp. 19–26, December 2013

    Google Scholar 

  10. Belhajjame, K., Roos, M., Garcia-Cuesta, E., Klyne, G., Zhao, J., De Roure, D., Goble, C., Gomez-Perez, J.M., Hettne, K., Garrido, A.: Why workflows break - understanding and combating decay in taverna workflows. In: Proceedings of the 2012 IEEE 8th International Conference on E-Science (e-Science), E-SCIENCE 2012, pp. 1–9. IEEE Computer Society, USA (2012)

    Google Scholar 

  11. Vouk, M.: Cloud computing - issues, research and implementations. In: 30th International Conference on Information Technology Interfaces, ITI 2008, pp. 31–40, June 2008

    Google Scholar 

  12. Zhao, Y., Fei, X., Raicu, I., Lu, S.: Opportunities and challenges in running scientific workflows on the cloud. In: 2011 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), pp. 455–462, October 2011

    Google Scholar 

  13. Shamdasani, J., Branson, A., McClatchey, R.: Towards semantic provenance in cristal. In: Third International Workshop on the Role of Semantic Web in Provenance Management (SWPM 2012) (2012)

    Google Scholar 

  14. Stevens, R.D., Robinson, A.J., Goble, C.A.: myGrid: personalised bioinformatics on the information grid. Bioinformatics 19, i302–i304 (2003)

    Article  Google Scholar 

  15. de Oliveira, D., Ogasawara, E., Baiao, F., Mattoso, M.: Scicumulus: a lightweight cloud middleware to explore many task computing paradigm in scientific workflows. In: 2010 IEEE 3rd International Conference on Cloud Computing (CLOUD), pp. 378–385, July 2010

    Google Scholar 

  16. Ko, R.K.L., Lee, B.S., Pearson, S.: Towards achieving accountability, auditability and trust in cloud computing. In: Abraham, A., Mauri, J.L., Buford, J.F., Suzuki, J., Thampi, S.M. (eds.) ACC 2011, Part IV. CCIS, vol. 193, pp. 432–444. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  17. Foster, I., Vöckler, J., Wilde, M., Zhao, Y.: Chimera: a virtual data system for representing, querying, and automating data derivation. In: Proceedings of the 14th International Conference on Scientific and Statistical Database Management, pp. 37–46 (2002)

    Google Scholar 

  18. Scheidegger, C., Koop, D., Santos, E., Vo, H., Callahan, S., Freire, J., Silva, C.: Tackling the provenance challenge one layer at a time. Concurr. Comput.: Pract. Exper. 20(5), 473–483 (2008)

    Article  Google Scholar 

  19. Kim, J., Deelman, E., Gil, Y., Mehta, G., Ratnakar, V.: Provenance trails in the wings-pegasus system. Concurr. Comput.: Pract. Exper. 20(5), 587–597 (2008)

    Article  Google Scholar 

  20. Zhang, O.Q., Kirchberg, M., Ko, R.K., Lee, B.S.: How to track your data: the case for cloud computing provenance. In: 2011 IEEE Third International Conference on Cloud Computing Technology and Science (CloudCom), pp. 446–453. IEEE (2011)

    Google Scholar 

  21. Tan, Y.S., Ko, R.K., Jagadpramana, P., Suen, C.H., Kirchberg, M., Lim, T.H., Lee, B.S., Singla, A., Mermoud, K., Keller, D., Duc, H.: Tracking of data leaving the cloud. In: 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, pp. 137–144 (2012)

    Google Scholar 

  22. Macko, P., Chiarini, M., Seltzer, M.: Collecting provenance via the xen hypervisor. In: 3rd USENIX Workshop on the Theory and Practice of Provenance (TAPP) (2011)

    Google Scholar 

  23. Chirigati, F., Shasha, D., Freire, J.: Reprozip: using provenance to support computational reproducibility. In: Proceedings of the 5th USENIX Workshop on the Theory and Practice of Provenance, TaPP 2013, pp. 1:1–1:4. USENIX Association, Berkeley (2013)

    Google Scholar 

  24. Janin, Y., Vincent, C., Duraffort, R.: Care, the comprehensive archiver for reproducible execution. In: Proceedings of the 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering, TRUST 2014, pp. 1:1–1:7. ACM, New York (2014)

    Google Scholar 

  25. Santana-Perez, I., Ferreira da Silva, R., Rynge, M., Deelman, E., Pérez-Hernández, M.S., Corcho, O.: A semantic-based approach to attain reproducibility of computational environments in scientific workflows: a case study. In: Lopes, L., et al. (eds.) Euro-Par 2014, Part I. LNCS, vol. 8805, pp. 452–463. Springer, Heidelberg (2014)

    Google Scholar 

  26. Sandve, G.K., Nekrutenko, A., Taylor, J., Hovig, E.: Ten simple rules for reproducible computational research. PLoS Comput. Biol. 9(10), e1003285 (2013)

    Article  Google Scholar 

  27. Stodden, V.C.: Reproducible research: addressing the need for data and code sharing in computational science. Comput. Sci. Eng. 12, 8–12 (2010)

    Google Scholar 

  28. Santana-Perez, I., Ferreira da Silva, R., Rynge, M., Deelman, E., Perez-Hernandez, M.S., Corcho, O.: Leveraging semantics to improve reproducibility in scientific workflows. In: The Reproducibility at XSEDE Workshop (2014)

    Google Scholar 

  29. Vöckler, J.S., Juve, G., Deelman, E., Rynge, M., Berriman, B.: Experiences using cloud computing for a scientific workflow application. In: Proceedings of the 2nd International Workshop on Scientific Cloud Computing, ScienceCloud 2011, pp. 15–24. ACM, USA (2011)

    Google Scholar 

  30. Howe, B.: Virtual appliances, cloud computing, and reproducible research. Comput. Sci. Eng. 14(4), 36–41 (2012)

    Article  MathSciNet  Google Scholar 

  31. Zhao, Y., Li, Y., Raicu, I., Lu, S., Tian, W., Liu, H.: Enabling scalable scientific workflow management in the cloud. Future Gener. Comput. Syst. 46, 3–16 (2014)

    Article  Google Scholar 

  32. Lifschitz, S., Gomes, L., Rehen, S. K.: Dealing with reusability and reproducibility for scientific workflows. In: 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), pp. 625–632. IEEE (2011)

    Google Scholar 

  33. Missier, P., Woodman, S., Hiden, H., Watson, P.: Provenance and data differencing for workflow reproducibility analysis. Concurr. Comput.: Pract. Exp. (2013)

    Google Scholar 

  34. Abrishami, S., Naghibzadeh, M., Epema, D.H.: Deadline-constrained workflow scheduling algorithms for infrastructure as a service clouds. Future Gener. Comput. Syst. 29(1), 158–169 (2013). Including Special section: AIRCC-NetCoM 2009 and Special section: Clouds and Service-Oriented Architectures

    Article  Google Scholar 

  35. Malawski, M., Juve, G., Deelman, E., Nabrzyski, J.: Algorithms for cost- and deadline-constrained provisioning for scientific workflow ensembles in iaas clouds. Future Gener. Comput. Syst. 48, 1–18 (2015). Special Section, Business and Industry Specific Cloud

    Article  Google Scholar 

  36. Woodman, S., Hiden, H., Watson, P., Missier, P.: Achieving reproducibility by combining provenance with service and workflow versioning. In: Proceedings of the 6th Workshop on Workflows in Support of Large-scale Science, WORKS 2011, pp. 127–136. ACM, USA (2011)

    Google Scholar 

  37. Groth, P., Deelman, E., Juve, G., Mehta, G., Berriman, B.: Pipeline-centric provenance model. In: Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science, WORKS 2009, pp. 4:1–4:8. ACM, USA (2009)

    Google Scholar 

  38. Horta, F., Silva, V., Costa, F., de Oliveira, D., Ocaña, K., Ogasawara, E., Dias, J., Mattoso, M.: Provenance traces from chiron parallel workflow engine. In: Proceedings of the Joint EDBT/ICDT 2013 Workshops, EDBT 2013, pp. 337–338. ACM, New York (2013)

    Google Scholar 

  39. Tannenbaum, T., Wright, D., Miller, K., Livny, M.: Beowulf Cluster Computing with Linux, pp. 307–350. MIT Press, Cambridge (2002)

    Google Scholar 

  40. Latchoumy, P., Khader, P.S.A.: Survey on fault tolerance in grid computing. Int. J. Comput. Sci. & Eng. Surv. (IJCSES) 2 (2011)

    Google Scholar 

  41. Stallings, W.: Cryptography and Network Security: Principles and Practice, 5th edn. Prentice Hall Press, Upper Saddle River (2010)

    Google Scholar 

  42. Ramakrishnan, L., Plale, B.: A multi-dimensional classification model for scientific workflow characteristics. In: Proceedings of the 1st International Workshop on Workflow Approaches to New Data-Centric Science, Wands 2010, pp. 4:1–4:12. ACM, USA (2010)

    Google Scholar 

Download references

Acknowledgements

This research work has been funded by a European Union FP-7 project, N4U neuGrid4Users (grant agreement n. 283562, 2011-2014). Besides this, the support provided by OSDC by offering a free Cloud infrastructure of 20 cores is highly appreciated.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Khawar Hasham .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Hasham, K., Munir, K., McClatchey, R., Shamdasani, J. (2016). Re-provisioning of Cloud-Based Execution Infrastructure Using the Cloud-Aware Provenance to Facilitate Scientific Workflow Execution Reproducibility. In: Helfert, M., Méndez Muñoz, V., Ferguson, D. (eds) Cloud Computing and Services Science. CLOSER 2015. Communications in Computer and Information Science, vol 581. Springer, Cham. https://doi.org/10.1007/978-3-319-29582-4_5

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-29582-4_5

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-29581-7

  • Online ISBN: 978-3-319-29582-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics