Abstract
Massive data transmission between distributed data centers is the major efficiency bottleneck of geospatial workflows. Although many data placement methods have been proposed to overcome this problem, few studies have considered the impact of workflow structure. In this paper, we define the problem of data placement for data-intensive geospatial workflows, with the aim of minimizing data transfer time. We propose an algorithm called ant colony optimization based data placement of data-intensive geospatial workflow (ACO-DPDGW) to solve this problem. The traditional workflow model is represented as a node vector; the ants visit the nodes in random order and place each dataset or task in an appropriate data center according to a combination of pheromone information and heuristic information. To prevent premature convergence, a variable neighborhood search operation is embedded into ACO-DPDGW. The experiments show that our algorithm can reduce data transfer volume and data transfer time even as the numbers of datasets, tasks, and data centers increase.
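The placement step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the parameters (alpha, beta, rho), the cost model (total size of datasets read by a task in a different data center), the heuristic, and the idea of pinning some datasets to fixed data centers are all assumptions made for illustration. The variable neighborhood search step is omitted.

```python
import random


def transfer_volume(placement, sizes, task_inputs):
    """Total size of datasets that a task must fetch from another data center."""
    return sum(
        sizes[d]
        for task, inputs in task_inputs.items()
        for d in inputs
        if placement[d] != placement[task]
    )


def aco_place(sizes, task_inputs, fixed, n_centers, n_ants=10, n_iters=50,
              alpha=1.0, beta=2.0, rho=0.1, seed=0):
    """Hypothetical ACO placement loop; all parameters are illustrative."""
    rng = random.Random(seed)
    # Node vector: datasets and tasks that are free to be placed.
    nodes = [n for n in list(sizes) + list(task_inputs) if n not in fixed]
    # neighbours[n] maps each related node to the data size linking the two.
    neighbours = {n: {} for n in list(sizes) + list(task_inputs)}
    for task, inputs in task_inputs.items():
        for d in inputs:
            neighbours[task][d] = sizes[d]
            neighbours[d][task] = sizes[d]
    # Pheromone: one value per (node, data center) pair.
    tau = {n: [1.0] * n_centers for n in nodes}
    best, best_cost = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            order = nodes[:]
            rng.shuffle(order)          # each ant visits the nodes in random order
            placement = dict(fixed)     # pinned datasets stay where they are
            for n in order:
                # Heuristic: data volume that stays local if n goes to center c.
                eta = [1.0] * n_centers
                for m, size in neighbours[n].items():
                    if m in placement:
                        eta[placement[m]] += size
                weights = [tau[n][c] ** alpha * eta[c] ** beta
                           for c in range(n_centers)]
                placement[n] = rng.choices(range(n_centers), weights=weights)[0]
            cost = transfer_volume(placement, sizes, task_inputs)
            if cost < best_cost:
                best, best_cost = placement, cost
        # Evaporate all pheromone, then reinforce the best-so-far placement.
        for n in nodes:
            tau[n] = [(1.0 - rho) * t for t in tau[n]]
            tau[n][best[n]] += 1.0 / (1.0 + best_cost)
    return best, best_cost


# Toy workflow: two tasks share dataset d2; d1 and d3 are pinned to different centers.
sizes = {"d1": 10, "d2": 20, "d3": 5}
task_inputs = {"t1": ["d1", "d2"], "t2": ["d2", "d3"]}
fixed = {"d1": 0, "d3": 1}
best, best_cost = aco_place(sizes, task_inputs, fixed, n_centers=2)
print(best, best_cost)
```

In this toy case the pinned datasets prevent the trivial solution of placing everything in one data center, so the search has to trade off moving d1, d2, or d3. A fuller model would also need storage capacities and bandwidth-dependent transfer times, which this sketch leaves out.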
Acknowledgments
This research was supported by the Key Science and Technology Plan Projects of Fujian Province (2015H0015), the Education and Technology Plan Projects of Fujian Province (JAT160088), and the Foundation of China Scholarship Council (201706655035).
Author information
Contributions
Xiaozhu Wu and Ying Liu conceived, designed and performed the experiments. All of the authors analyzed the data. Xiaozhu Wu wrote the paper. Xiaozhu Wu and Ying Liu revised the paper.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Wu, X., Liu, Y. & Chen, C. ACO-DPDGW: an ant colony optimization algorithm for data placement of data-intensive geospatial workflow. Earth Sci Inform 12, 641–658 (2019). https://doi.org/10.1007/s12145-019-00401-3
DOI: https://doi.org/10.1007/s12145-019-00401-3