Abstract
In data-intensive workflows, which often involve files, data transfers between tasks are typically performed as fast as the network links allow, and once transferred, the files are buffered/stored at their destination. Where a task needs multiple files (produced by different preceding tasks) to execute, it must remain idle until all of them are available. Hence, network bandwidth and buffer/storage within a workflow are often not used effectively. In this paper, we quantitatively measure the impact that an intelligent data movement policy can have on buffer/storage use in comparison with existing approaches. Our main objective is to propose a metric that considers both the workflow structure, expressed as a Directed Acyclic Graph (DAG), and performance information collected from past executions of that workflow. This metric is intended for use at the design stage, to compare different DAG structures and evaluate their potential for optimisation (of network bandwidth and buffer use).
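To make the idle-buffer effect described above concrete, the following Python sketch computes, for a small DAG, how long each transferred file sits in its destination buffer under an eager (as-fast-as-possible) transfer policy. It is a hypothetical illustration, not the metric proposed in this paper. The function name `idle_buffer_occupancy`, the use of mean execution and transfer times as the "historical" performance data, and the file size multiplied by idle time as a measure of wasted buffer are all assumptions. The sketch also ignores link contention and assumes a task starts as soon as all of its input files have arrived.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def idle_buffer_occupancy(exec_time, edges):
    """Hypothetical illustration, not the paper's metric.

    exec_time: dict task -> mean execution time in s (assumed to come from past runs).
    edges: dict (src, dst) -> (file_size_mb, transfer_time_s) under eager transfer.
    Returns (start times, per-edge idle time in s, total MB*s of buffer held while waiting).
    """
    # Map each task to its predecessors, as graphlib.TopologicalSorter expects.
    preds = {t: set() for t in exec_time}
    for src, dst in edges:
        preds[dst].add(src)

    start, finish = {}, {}
    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    for task in TopologicalSorter(preds).static_order():
        # Eager policy: each input file arrives as soon as the link allows.
        arrivals = [finish[p] + edges[(p, task)][1] for p in preds[task]]
        # The task stays idle until its last input file has arrived.
        start[task] = max(arrivals, default=0.0)
        finish[task] = start[task] + exec_time[task]

    idle, occupancy = {}, 0.0
    for (src, dst), (size_mb, transfer_s) in edges.items():
        arrival = finish[src] + transfer_s
        # Time the file sits buffered at dst before dst can start.
        idle[(src, dst)] = start[dst] - arrival
        occupancy += size_mb * idle[(src, dst)]
    return start, idle, occupancy


if __name__ == "__main__":
    exec_time = {"A": 10.0, "B": 30.0, "C": 5.0}  # hypothetical mean runtimes (s)
    edges = {("A", "C"): (100.0, 2.0),            # (file size in MB, transfer time in s)
             ("B", "C"): (50.0, 4.0)}
    start, idle, occupancy = idle_buffer_occupancy(exec_time, edges)
    print(start["C"], idle[("A", "C")], occupancy)  # 34.0 22.0 2200.0
```

In this toy DAG, task C cannot start until B's file arrives at t = 34 s, so A's 100 MB file waits 22 s in C's buffer (2200 MB·s). Under these assumptions, a throttled policy that delayed or slowed the A to C transfer so that it arrived just in time would free that buffer without delaying C.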
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rodríguez, R.J., Tolosana-Calasanz, R., Rana, O.F. (2012). Measuring the Effectiveness of Throttled Data Transfers on Data-Intensive Workflows. In: Jezic, G., Kusek, M., Nguyen, NT., Howlett, R.J., Jain, L.C. (eds) Agent and Multi-Agent Systems. Technologies and Applications. KES-AMSTA 2012. Lecture Notes in Computer Science, vol 7327. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30947-2_18
DOI: https://doi.org/10.1007/978-3-642-30947-2_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30946-5
Online ISBN: 978-3-642-30947-2
eBook Packages: Computer Science, Computer Science (R0)