research-article

Risk-aware intermediate dataset backup strategy in cloud-based data intensive workflows

Published: 01 February 2016

Abstract

Data-intensive workflows are generally both compute- and data-intensive, with a large volume of data generated during their execution. Therefore, some of the data should be saved to avoid the expensive re-execution of tasks in case of exceptions. However, cloud-based data storage services come at some expense. In this paper, we introduce a risk evaluation model tailored to workflow structure to measure and achieve the trade-off between the overhead of backup storage and the cost of data regeneration after failure, making service selection and execution more efficient and robust. The proposed method computes and compares the potential loss with and without data backup to achieve the trade-off between the overhead of intermediate dataset backup and task re-execution after exceptions. We also design a utility function based on the model and apply a genetic algorithm to find the optimized schedule. The results show that the robustness of the schedule is increased while the possible risk of failure is minimized, especially when the volume of generated data is not large in comparison with the input.

  • Introduce the risk evaluation model for workflow to measure potential loss.
  • Propose the intermediate dataset backup strategy.
  • Achieve a trade-off between the overhead of backup and re-execution after exceptions.
  • Apply a genetic algorithm to find a reliable and cost-effective selection of services.
  • Compare the potential loss with and without our data backup strategy.
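
The comparison of potential loss with and without backup can be illustrated with a minimal sketch. This is not the paper's risk evaluation model, which the abstract does not specify; it assumes a simple per-task expected-loss rule, and every name and parameter here (Task, failure_prob, regen_cost, output_gb, storage_price_per_gb, choose_backups) is hypothetical. The utility function and genetic algorithm mentioned in the abstract are not covered.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """One workflow task; every field is a hypothetical illustration parameter."""
    name: str
    failure_prob: float  # assumed probability that an exception loses this task's output
    regen_cost: float    # assumed cost of re-executing the task to regenerate the output
    output_gb: float     # assumed size of the intermediate dataset the task produces


def loss_without_backup(task: Task) -> float:
    """Expected loss if the output is not stored: a failure forces re-execution."""
    return task.failure_prob * task.regen_cost


def loss_with_backup(task: Task, storage_price_per_gb: float) -> float:
    """Loss if the output is stored: the storage fee is paid whether or not a failure occurs."""
    return task.output_gb * storage_price_per_gb


def choose_backups(tasks: List[Task], storage_price_per_gb: float) -> List[str]:
    """Return the names of tasks whose output is cheaper to back up than to risk losing."""
    return [
        t.name
        for t in tasks
        if loss_with_backup(t, storage_price_per_gb) < loss_without_backup(t)
    ]


if __name__ == "__main__":
    workflow = [
        Task("align", failure_prob=0.2, regen_cost=50.0, output_gb=10.0),
        Task("filter", failure_prob=0.05, regen_cost=4.0, output_gb=40.0),
    ]
    print(choose_backups(workflow, storage_price_per_gb=0.5))
```

Under these assumed numbers, a task with a small output and an expensive re-execution is worth backing up, while a task with a large output and a cheap re-execution is not, which matches the abstract's remark that the strategy pays off most when the generated data is not large.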

Information & Contributors

Information

Published In

Future Generation Computer Systems, Volume 55, Issue C
February 2016
547 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Checkpoint
  2. Data-intensive workflow
  3. Intermediate dataset
  4. Risk evaluation
  5. Robustness

Qualifiers

  • Research-article
