Abstract
Parallel computing is the fundamental base for MapReduce framework in Hadoop. A big data is split into small data chunks, where Map task is referred to processing a data chunk. Each data chunk is replicated over three servers in Hadoop for increasing the availability of data and decreasing the probability of data loss. As a result, the three servers that have the Map task stored on their disk are the fastest servers to process them, which are called local servers. All the servers in the same rack as local servers are called rack-local servers that are slower than local servers in processing Map tasks since the data chunk associated with the Map task should be fetched through the top of the rack switch. All the other servers are called remote servers that are the slowest servers for processing a Map task since they need to fetch data from a local server in another rack, so data should be transmitted through at least two top of the rack switches and a core switch. Note that the number of switches in the path of data transfer depends on the internal network structure of data centers. The First-In-First-Out (FIFO) and Hadoop Fair Scheduler (HFS) algorithms do not take the rack structure of data centers into account, so they are known to not be heavy-traffic delay optimal or even throughput optimal. The recent advances on scheduling for data centers considering the rack structure of them and the heterogeneity of servers resulted in the state-of-the-art Balanced-PANDAS algorithm that outperforms the classic MaxWeight algorithm and its derivation, JSQ-MaxWeight algorithm. In both Balanced-PANDAS and MaxWeight algorithms, the processing rate of local, rack-local, and remote servers are assumed to be known. However, with the change of traffic over time in addition to estimation errors of processing rates, it is not realistic to consider the processing rates to be known. In this work, we study the robustness of Balanced-PANDAS and MaxWeight algorithms in terms of inaccurate estimations of processing rates. We observe that Balanced-PANDAS is not as sensitive as MaxWeight on the accuracy of processing rates, making it more appealing to use in data centers.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
This manuscript is archived in https://arxiv.org/abs/2004.06254.
References
Alinia, B., Sadegh Talebi, M., Hajiesmaili, M.H., Yekkehkhany, A., Crespi, N.: Competitive online scheduling algorithms with applications in deadline-constrained EV charging. In: 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), pp. 1–10. IEEE (2018)
Almalki, M.M., Chehardeh, M.I., Hatziadoniu, C.J.: Capacitor bank switching transient analysis using frequency dependent network equivalents. In: North American Power Symposium (NAPS), 2015. IEEE (2015)
Bell, S., Williams, R., et al.: Dynamic scheduling of a parallel server system in heavy traffic with complete resource pooling: asymptotic optimality of a threshold policy. Electr. J. Probab. 10, 1044–1115 (2005)
Bell, S.L., Williams, R.J., et al.: Dynamic scheduling of a system with two parallel servers in heavy traffic with resource pooling: asymptotic optimality of a threshold policy. Ann. Appl. Probab. 11(3), 608–649 (2001)
Broder, A.: A taxonomy of web search. In: ACM SIGIR Forum, vol. 36, pp. 3–10. ACM (2002)
Byers, J.W., Considine, J., Mitzenmacher, M.: Geometric generalizations of the power of two choices. In: Proceedings of the Sixteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 54–63. ACM (2004)
Chehardeh, M.I., Almalki, M.M., Hatziadoniu, C.J.: Remote feeder transfer between out-of-phase sources using STS. In: Power and Energy Conference at Illinois (PECI), 2016 IEEE. IEEE (2016)
Chehardeh, M.I., Hatziadoniu, C.J.: A systematic method for reliability index evaluation of distribution networks in the presence of distributed generators. In: 2018 North American Power Symposium (NAPS). IEEE (2018)
Clower, R., Leijonhufvud, A.: The coordination of economic activities: a Keynesian perspective. Am. Econ. Rev. 65(2), 182–188 (1975)
Cooper, C., Elsässer, R., Radzik, T.: The power of two choices in distributed voting. In: International Colloquium on Automata, Languages, and Programming, pp. 435–446. Springer, Berlin (2014)
Daghighi, A.: Application of an artificial neural network as a third-party database auditing system (2019)
Daghighi, A., Kavousi, M.: Scheduling for data centers with multi-level data locality. In: 2017 Iranian Conference on Electrical Engineering (ICEE), pp. 927–936. IEEE (2017)
Dahlgaard, S., Knudsen, M.B.T., Rotenberg, E., Thorup, M.: The power of two choices with simple tabulation. In: Proceedings of the twenty-seventh annual ACM-SIAM symposium on Discrete algorithms, pp. 1631–1642. SIAM (2016)
Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
Deilami, S., Masoum, A.S., Moses, P.S., Masoum, M.A.S: Real-time coordination of plug-in electric vehicle charging in smart grids to minimize power losses and improve voltage profile. IEEE Trans. Smart Grid 2(3), 456–467 (2011)
Doerr, B., Goldberg, L.A., Minder, L., Sauerwald, T., Scheideler, C.: Stabilizing consensus with the power of two choices. In: Proceedings of the Twenty-Third Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 149–158. ACM (2011)
Eisenhauer, E.: In poor health: supermarket redlining and urban nutrition. GeoJournal 53(2), 125–133 (2001)
Gan, L., Topcu, U., Low, S.H.: Optimal decentralized protocol for electric vehicle charging. IEEE Trans. Power Syst. 28(2), 940–951 (2013)
Gardner, K., Harchol-Balter, M., Scheller-Wolf, A., Velednitsky, M., Zbarsky, S.: Redundancy-d: the power of d choices for redundancy. Oper. Res. 65(4), 1078–1094 (2017)
Gast, N.: The power of two choices on graphs: the pair-approximation is accurate? ACM SIGMETRICS Perform. Eval. Rev. 43(2), 69–71 (2015)
Harrison, J.M.: Heavy traffic analysis of a system with parallel servers: asymptotic optimality of discrete-review policies. Ann. Appl. Probab. 822–848 (1998)
Harrison, J.M., López, M.J.: Heavy traffic resource pooling in parallel-server systems. Queueing Syst. 33(4), 339–368 (1999)
He, C., Lu, Y., Swanson, D.: Matchmaking: a new MapReduce scheduling technique. In: 2011 IEEE Third International Conference on Cloud Computing Technology and Science (CloudCom), pp. 40–47. IEEE (2011)
Hosseini, M., Jiang, Y., Yekkehkhany, A., Berlin, R.R., Sha, L.: A mobile geo-communication dataset for physiology-aware dash in rural ambulance transport. In: Proceedings of the 8th ACM on Multimedia Systems Conference. ACM (2017)
Isard, M., Budiu, M., Yu, Y., Birrell, A., Fetterly, D.: Dryad: distributed data-parallel programs from sequential building blocks. In: ACM SIGOPS Operating Systems Review, vol. 41, pp. 59–72. ACM (2007)
Isard, M., Prabhakaran, V., Currey, J., Wieder, U., Talwar, K., Goldberg, A.: Quincy: fair scheduling for distributed computing clusters. In: Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, pp. 261–276. ACM (2009)
Jin, J., Luo, J., Song, A., Dong, F., Xiong, R.: Bar: an efficient data locality driven task scheduling algorithm for cloud computing. In: Proceedings of the 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 295–304. IEEE Computer Society (2011)
Kavousi, M.: Affinity scheduling and the applications on data center scheduling with data locality. arXiv preprint arXiv:1705.03125 (2017)
Kreimer, J.: Real-time system with homogeneous servers and nonidentical channels in steady-state. Comput. Oper. Res. 29(11), 1465–1473 (2002)
Krishna, K.H., Rani, K.A.: Reducing the energy consumption energy-efficient query processing node in web search engines (2018)
Livny, M., Melman, M.: Load balancing in homogeneous broadcast distributed systems. In: ACM SIGMETRICS Performance Evaluation Review, vol. 11, pp. 47–55. ACM (1982)
Lowery, J.C., Collins, M.A., Schroeder, B.: Systems and methods for provisioning homogeneous servers. US Patent App. 11/562,921, 22 May 2008
Luczak, M.J., McDiarmid, C., et al.: On the power of two choices: balls and bins in continuous time. Ann. Appl. Probab. 15(3), 1733–1764 (2005)
Lumetta, S., Mitzenmacher, M.: Using the power of two choices to improve bloom filters. Internet Math. 4(1), 17–33 (2007)
Mandelbaum, A., Stolyar, A.L.: Scheduling flexible servers with convex delay costs: heavy-traffic optimality of the generalized c\(\mu \)-rule. Oper. Res. 52(6), 836–855 (2004)
Meyn, S.: Stability and asymptotic optimality of generalized MaxWeight policies. SIAM J. Control Optim. 47(6), 3259–3294 (2009)
Mitzenmacher, M.: The power of two choices in randomized load balancing. IEEE Trans. Parallel Distrib. Syst. 12(10), 1094–1104 (2001)
Moaddeli, A., Ahmadi, I.N., Abhar, N.: The power of d choices in scheduling for data centers with heterogeneous servers. arXiv preprint arXiv:1904.00447 (2019)
Musavi, N.: A game theoretical framework for the evaluation of unmanned aircraft systems airspace integration concepts. arXiv preprint arXiv:1904.08477 (2019)
Musavi, N., Onural, D., Gunes, K., Yildiz, Y.: Unmanned aircraft systems airspace integration: a game theoretical framework for concept evaluations. J. Guidance Control Dyn. 40, 96–109 (2016)
Musavi, N., Tekelioğlu, K.B., Yildiz, Y., Gunes, K., Onural, D.: A game theoretical modeling and simulation framework for the integration of unmanned aircraft systems in to the national airspace. In: AIAA Infotech@ Aerospace (2016)
Polo, J., et al.: Resource-aware adaptive scheduling for MapReduce clusters. In: ACM/IFIP/USENIX International Conference on Distributed Systems Platforms and Open Distributed Processing, pp. 187–207. Springer, Berlin (2011)
Richa, A.W., Mitzenmacher, M., Sitaraman, R.: The power of two random choices: a survey of techniques and results. Comb. Optim. 9, 255–304 (2001)
Saadatmand, S., Azizi, S., Kavousi, M., Wunsch, D.: Autonomous control of a line follower robot using a q-learning controller. In: 2020 10th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0556–0561. IEEE (2020)
Saadatmand, S., Kavousi, M., Azizi, S.: The voltage regulation of boost converters using dual heuristic programming. In: 2020 10th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0531–0536. IEEE (2020)
Saadatmand, S., Sanjarinia, M.S., Shamsi, P., Ferdowsi, M.: Dual heuristic dynamic programing control of grid-connected synchronverters. In: North American Power Symposium (NAPS), 2019, pp. 1–6 (2019)
Saadatmand, S., Sanjarinia, M.S., Shamsi, P., Ferdowsi, M., Wunsch, D.C.: Heuristic dynamic programming for adaptive virtual synchronous generators. In: North American Power Symposium (NAPS), 2019, pp. 1–6 (2019)
Sadiq, B., De Veciana, G.: Throughput optimality of delay-driven MaxWeight scheduler for a wireless system with flow dynamics. In: 47th Annual Allerton Conference on Communication, Control, and Computing, Allerton, pp. 1097–1102. IEEE (2009)
Salehi, M.: Optimal physiology-aware scheduling of clinical states in rural ambulance transport. In: 2017 International Conference on Inventive Computing and Informatics (ICICI), pp. 247–252. IEEE (2017)
Salehi, S., Du, J.T., Ashman, H.: Use of web search engines and personalisation in information searching for educational purposes. Inf. Res. Int. Electr. J. 23(2), n2 (2018)
Schwartz, C.: Web search engines. J. Am. Soc. Inf. Sci. 49(11), 973–982 (1998)
Singh, V.P.: Two-server Markovian queues with balking: heterogeneous vs. homogeneous servers. Oper. Res. 18(1), 145–159 (1970)
Stolyar, A.L., et al.: MaxWeight scheduling in a generalized switch: state space collapse and workload minimization in heavy traffic. Ann. Appl. Probab. 14(1), 1–53 (2004)
van de Ven, P., Borst, S., Shneer, S.: Instability of MaxWeight scheduling algorithms. In: INFOCOM 2009, pp. 1701–1709. IEEE (2009)
Van Mieghem J.A.: Dynamic scheduling with convex delay costs: the generalized c| mu rule. Ann. Appl. Probab. 809–833 (1995)
Wang, C.-S., Stielau, O.H., Covic, G.A.: Design considerations for a contactless electric vehicle battery charger. IEEE Trans. Ind. Electr. 52(5), 1308–1314 (2005)
Wang, W., Zhu, K., Ying, L., Tan, J., Zhang, L.: A throughput optimal algorithm for map task scheduling in MapReduce with data locality. ACM SIGMETRICS Perform. Eval. Rev. 40(4), 33–42 (2013)
Wang, W., Zhu, K., Ying, L., Tan, J., Zhang, L.: Maptask scheduling in MapReduce with data locality: throughput and heavy-traffic optimality. IEEE/ACM Trans. Netw. (TON) 24(1), 190–203 (2016)
Tom White: Hadoop: the definitive guide, yahoo (2010)
White, T.: Hadoop: The Definitive Guide. O’Reilly Media Inc., Newton (2012)
Winkler, F.: Consumerism in health care: beyond the supermarket model. Policy Politics 15(1), 1–8 (1987)
Xie, Q., Lu, Y.: Priority algorithm for near-data scheduling: throughput and heavy-traffic optimality. In: 2015 IEEE Conference on Computer Communications (INFOCOM), pp. 963–972. IEEE (2015)
Xie, Q., Yekkehkhany, A., Lu, Y.: Scheduling with multi-level data locality: throughput and heavy-traffic optimality. In: INFOCOM 2016-The 35th Annual IEEE International Conference on Computer Communications. IEEE (2016)
Xie, X., Liu, Y., de Rijke, M., He, J., Zhang, M., Ma, S.: Why people search for images using web search engines. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 655–663. ACM (2018)
Yekkehkhany, A.: Near data scheduling for data centers with multi levels of data locality. Dissertation, University of Illinois at Urbana-Champaign (2017)
Yekkehkhany, A.: Risk-averse multi-armed bandits and game theory. Dissertation, University of Illinois at Urbana-Champaign (2020)
Yekkehkhany, A., Arian, E., Hajiesmaili, M., Nagi, R.: Risk-averse explore-then-commit algorithms for finite-time bandits. In: 2019 IEEE 58th Conference on Decision and Control (CDC) (2019)
Yekkehkhany, A., Arian, E., Nagi, R., Shomorony, I.: A cost-based analysis for risk-averse explore-then-commit finite-time bandits (2020)
Yekkehkhany, A., Hojjati, A., Hajiesmaili, M.H.: GB-PANDAS: throughput and heavy-traffic optimality analysis for affinity scheduling. ACM SIGMETRICS Perform. Eval. Rev. 45(2), 2–14 (2018)
Yekkehkhany, A., Nagi, R.: Blind GB-PANDAS: a blind throughput-optimal load balancing algorithm for affinity scheduling. IEEE/ACM Trans. Netw. 28(3), 1199–1212 (2020)
Yildiz, Y., Agogino, A., Brat, G.: Predicting pilot behavior in medium scale scenarios using game theory and reinforcement learning. In: AIAA Modeling and Simulation Technologies (MST) Conference, p. 4908 (2013)
Zaharia, M., Borthakur, D., Sarma, J.S., Elmeleegy, K., Shenker, S., Stoica, I.: Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. In: Proceedings of the 5th European Conference on Computer Systems, pp. 265–278. ACM (2010)
Zaharia, M., Konwinski, A., Joseph, A.D., Katz, R.H., Stoica, I.: Improving mapreduce performance in heterogeneous environments. In: Osdi, vol. 8, p. 7 (2008)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Daghighi, A., Chen, J.Q. (2022). Robustness Comparison of Scheduling Algorithms in MapReduce Framework. In: Arai, K. (eds) Intelligent Computing. Lecture Notes in Networks and Systems, vol 283. Springer, Cham. https://doi.org/10.1007/978-3-030-80119-9_30
Download citation
DOI: https://doi.org/10.1007/978-3-030-80119-9_30
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80118-2
Online ISBN: 978-3-030-80119-9
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)