Abstract
Cloud computing has become one of the most popular frameworks for big data processing and analysis. To have better performance, the data chunks and their corresponding process nodes must be near each other, which is known as data locality. Moreover, to meet the deadline constraints, the available processing power next to the data locations is crucial. Considering the time and resource constraints, designing an appropriate job scheduler is essential for the effective management of memory and processing capacity. These concerns coalesce into the problem of joint data placement and job scheduling with data locality and deadline constraints. In this paper, two kinds of job sets are determined, one with equal deadlines and the other one with various deadlines. The problem of joint data placement and job scheduling with a single deadline is formulated as the problem of bin packing with splittable items and cardinality constraints, which is strongly NP-hard. Moreover, this paper proposes the polynomial-time Meddal algorithm to find a feasible solution with a few processing nodes. Experiment results have shown that the proposed method is effective, with the number of required nodes reduced by \(33\%\) when compared to the algorithms Cred and First Fit.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Data Availability
The datasets generated and analyzed during the current study are available in the IUST cloud repository, https://drive.iust.ac.ir/index.php/s/7MCtZXgnpcnbQ5CMeddalData.
References
Puthal D, Sahoo B, Mishra S, Swain S (2015) Cloud computing features, issues, and challenges: a big picture. IEEE, pp 116–123
Josep AD et al (2010) A view of cloud computing. Commun ACM 53(4):50
Mell P, Grance T (2011) The NIST definition of cloud computing. Technical Reports 800-145, National Institute of Standards and Technology (NIST). http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
Xu M, Alamro S, Lan T, Subramaniam S (2017) CRED: cloud right-sizing with execution deadlines and data locality. IEEE Trans Parallel Distrib Syst 28(12):3389–3400
Shi S, Wu C, Li Z (2015) Cost-minimizing online VM purchasing for application service providers with arbitrary demands. pp 146–154
Tarafdar A, Debnath M, Khatua S, Das RK (2021) Energy and makespan aware scheduling of deadline sensitive tasks in the cloud environment. J Grid Comput 19(2):1–25
Singh S, Chana I (2016) A survey on resource scheduling in cloud computing: issues and challenges. J Grid Comput 14(2):217–264
Zhan ZH et al (2015) Cloud computing resource scheduling and a survey of its evolutionary approaches. ACM Comput Surv 47(4):63:1-63:33
Magoulès F, Pan J, Teng F (2016) Cloud computing: lata-intensive computing and scheduling. Chapman and Hall/CRC
Ma X, Fan X, Liu J, Jiang H, Peng K (2017) vLocality: revisiting data locality for MapReduce in virtualized clouds. IEEE Netw 31(1):28–35
Zou X, et al (2021) The dilemma between deduplication and locality: can both be achieved? pp 171–185
Shabeera T, Madhu Kumar S (2015) Optimising virtual machine allocation in MapReduce cloud for improved data locality. Int J Big Data Intell 2(1):2–8
Lee S, Jo J, Kim Y (2019) Survey of data locality in apache hadoop. IEEE, pp 46–53
Palanisamy B, Singh A, Liu L, Jain B, Lathrop SA, Costa J, Kramer W (2011) Purlieus: locality-aware resource allocation for MapReduce in a cloud. In: Lathrop SA, Costa J, Kramer W (eds.), Proceedings of conference on high performance computing networking, storage and analysis, ACM, Seattle, November 12–18 2011, pp 58:1–58:11
Tang S, Lee B, He B (2014) DynamicMR: a dynamic slot allocation optimization framework for MapReduce clusters. IEEE Trans Cloud Comput 2(3):333–347
Naik NS, Negi A, Br TB, Anitha R (2019) A data locality based scheduler to enhance MapReduce performance in heterogeneous environments. Future Gener Comput Syst 90:423–434
Ru J, Yang Y, Grundy J, Keung J, Hao L (2019) A highly efficient data locality aware task scheduler for cloud-based systems. IEEE, pp 496–498
Jalalian Z, Sharifi M (2021) A hierarchical multi-objective task scheduling approach for fast big data processing. J Supercomput 78:1–30
Bittencourt L et al (2018) The internet of things, fog and cloud continuum: integration and challenges. Internet Things 3:134
Li S, Lan T, Ra M-R, Panta R (2018) Joint scheduling and source selection for background traffic in erasure-coded storage. IEEE Trans Parallel Distrib Syst 29(12):2826–2837
Epstein L, van Stee R (2011) Improved results for a memory allocation problem. Theory Comput Syst 48(1):79–92
Garey MR, Johnson DS (1978) “Strong’’ NP-completeness results: motivation, examples, and implications. J ACM 25(3):499–508
Li C, Cai Q, Lou Y (2022) Optimal data placement strategy considering capacity limitation and load balancing in geographically distributed cloud. Future Gener Comput Syst 127:142–159
Ru J, Yang Y, Grundy J, Keung J, Hao L (2021) An efficient deadline constrained and data locality aware dynamic scheduling framework for multitenancy clouds. Concurr Comput Pract Exp 33(5):e6037
Swain CK, Gupta B, Sahu A (2020) Constraint aware profit maximization scheduling of tasks in heterogeneous datacenters. Computing 102(10):2229–2255
Li C et al (2019) Data locality optimization based on data migration and hotspots prediction in geo-distributed cloud environment. Knowl Based Syst 165:321–334
Xu M, Alamro S, Lan T, Subramaniam S (2018) Chronos: a unifying optimization framework for speculative execution of deadline-critical MapReduce jobs. IEEE, pp 718–729
Li C, Liu J, Wang M, Luo Y (2022) Fault-tolerant scheduling and data placement for scientific workflow processing in geo-distributed clouds. J Syst Softw 187:111227
Kang Y, Pan L, Liu S (2022) Job scheduling for big data analytical applications in clouds: a taxonomy study. Future Gener Comput Syst 135:129
Albers S, Quedenfeld J (2018) Optimal algorithms for right-sizing data centers
Alamro S, Xu M, Lan T, Subramaniam S (2020) Shed+: optimal dynamic speculation to meet application deadlines in cloud. IEEE Trans Netw Serv Manag 17(3):1515–1526
Toosi AN, Sinnott RO, Buyya R (2018) Resource provisioning for data-intensive applications with deadline constraints on hybrid clouds using Aneka. Future Gener Comput Syst 79:765–775
Yuan H et al (2017) TTSA: an effective scheduling approach for delay bounded tasks in hybrid clouds. IEEE Trans Cybern 47(11):3658–3668
Kaur K, Kumar N, Garg S, Rodrigues JJPC (2018) EnLoc: data locality-aware energy-efficient scheduling scheme for cloud data centers. pp 1–6
Lee JH, Jang H, Kim HJ (2020) Iterative job splitting algorithms for parallel machine scheduling with job splitting and setup resource constraints. J Oper Res Soc 72:1–20
Epstein L, Levin A, van Stee R (2012) Approximation schemes for packing splittable items with cardinality constraints. Algorithmica 62(1–2):102–129
LeCun B, Mautor T, Quessette F, Weisser M-A (2015) Bin packing with fragmentable items: presentation and approximations. Theor Comput Sci 602:50–59
Grégoire JC, Hamel AM (2014) On scheduling live media streaming in the cloud—a study. IEEE, pp 1–6
Casazza M, Ceselli A (2016) Exactly solving packing problems with fragmentation. Comput Oper Res 75:202–213
Beaumont O, Eyraud-Dubois L, Caro CT, Rejeb H (2013) Heterogeneous resource allocation under degree constraints. IEEE Trans Parallel Distrib Syst 24(5):926–937
Jaykrishnan G, Levin A (2022) EPTAS for the dual of splittable bin packing with cardinality constraint. arXiv preprint arXiv:2204.04685
Fleszar K (2022) A MILP model and two heuristics for the bin packing problem with conflicts and item fragmentation. Eur J Oper Res 303:37
Ekici A (2022) Variable-sized bin packing problem with conflicts and item fragmentation. Comput Ind Eng 163:107844
Ekici A (2021) Bin packing problem with conflicts and item fragmentation. Comput Oper Res 126:105113
Jalaparti V et al (2015) Network-aware scheduling for data-parallel jobs: plan when you can. Comput Commun Rev 45(5):407–420
Reiss C, Wilkes J, Hellerstein JL (2011) Google cluster-usage traces: format + schema. Technical Report, Google Inc., Mountain View, CA, USA. Revised 2014-11-17 for version 2.1. Posted at https://github.com/google/cluster-data
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Malekimajd, M. Meddal: meeting deadlines and data locality via bin packing in cloud environment. Computing 105, 249–273 (2023). https://doi.org/10.1007/s00607-022-01122-0
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00607-022-01122-0