Abstract
Modern datacenter networks face various challenges, e.g., highly dynamic workloads, congestion, and topology asymmetry. ECMP, the traditional load balancing mechanism widely used in today's datacenters, can balance load poorly and lead to congestion. A variety of load balancing schemes have been proposed to address the problems of ECMP. However, these schemes usually make load balancing decisions based only on a snapshot of network state or on knowledge from a short window of recent history. In this paper, we propose a Reinforcement Learning (RL) based approach to load balancing in datacenter networks, called RILNET (ReInforcement Learning NETworking). RILNET employs RL to learn the behavior of a network and controls the network based on the learned experience. To achieve finer-grained control, RILNET routes flowlets rather than whole flows. Moreover, RILNET makes routing decisions for aggregation flows rather than for individual flows, where an aggregation flow is the set of all flows traveling from the same source edge switch to the same destination edge switch. To evaluate RILNET, we develop a flow-level simulation and a packet-level simulation, and both sets of results show that RILNET balances traffic load much more effectively than ECMP and another load balancing solution, DRILL. In particular, RILNET outperforms DRILL in data loss and maximal link delay: the maximal link data loss and the maximal link delay of RILNET are 44.4% and 25.4% smaller than those of DRILL, respectively.
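The two units of control named in the abstract, aggregation flows and flowlets, can be made concrete with a small sketch. It is an illustration under stated assumptions, not RILNET's implementation. Flowlets are detected here with the common inactivity-gap rule (a new flowlet starts when a flow has been idle for longer than a threshold, the idea behind Kandula et al.'s reordering-free dynamic load balancing). The 500 µs threshold, the `Packet` record, the helper names, and the host-to-edge-switch map `edge_of` are all hypothetical. The RL policy itself is not modeled; the sketch only shows which units such a policy would make routing decisions for.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical flowlet inactivity threshold (seconds); RILNET's actual value
# is not stated in the abstract and is assumed here for illustration.
FLOWLET_GAP = 500e-6


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet record holding only the fields this sketch needs."""
    src_host: str
    dst_host: str
    src_port: int
    dst_port: int
    timestamp: float  # arrival time in seconds


def flow_key(pkt):
    """Identify a flow by host pair and ports (protocol omitted for brevity)."""
    return (pkt.src_host, pkt.dst_host, pkt.src_port, pkt.dst_port)


def aggregation_key(pkt, edge_of):
    """An aggregation flow groups all flows from the same source edge switch
    to the same destination edge switch; `edge_of` maps host -> edge switch."""
    return (edge_of[pkt.src_host], edge_of[pkt.dst_host])


def group_by_aggregation(packets, edge_of):
    """Return {(src_edge, dst_edge): set of flow keys} for the given packets."""
    groups = defaultdict(set)
    for pkt in packets:
        groups[aggregation_key(pkt, edge_of)].add(flow_key(pkt))
    return groups


def split_flowlets(packets, gap=FLOWLET_GAP):
    """Label each packet, in arrival order, with (flow key, flowlet index).

    A flow's flowlet index increments whenever the idle time since that
    flow's previous packet exceeds `gap`.
    """
    last_seen = {}
    flowlet_index = defaultdict(int)
    labels = []
    for pkt in sorted(packets, key=lambda p: p.timestamp):  # stable sort
        flow = flow_key(pkt)
        if flow in last_seen and pkt.timestamp - last_seen[flow] > gap:
            flowlet_index[flow] += 1
        last_seen[flow] = pkt.timestamp
        labels.append((flow, flowlet_index[flow]))
    return labels
```

For example, with hosts `h1` and `h2` under edge switch `e1` and host `h3` under `e2`, every flow from `h1` or `h2` to `h3` falls into the single aggregation flow `("e1", "e2")`, so one routing decision covers all of them. Packets of an `h1` flow arriving at 0 ms, 0.1 ms, and 2 ms receive flowlet indices 0, 0, and 1; provided the idle gap exceeds the delay difference between candidate paths, the new flowlet can be moved to another path without reordering earlier packets, which ECMP's per-flow hashing cannot do.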
Notes
- 1. Note that RILNET can be used for multiple purposes, including load balancing, reducing data loss, and reducing flow completion time. In this paper, we focus on load balancing and leave the other purposes to future work.
References
Al-Fares, M., Loukissas, A., Vahdat, A.: A scalable, commodity data center network architecture. In: ACM SIGCOMM Computer Communication Review, vol. 38, pp. 63–74. ACM (2008)
Al-Fares, M., Radhakrishnan, S., Raghavan, B., Huang, N., Vahdat, A.: Hedera: dynamic flow scheduling for data center networks. In: NSDI, vol. 10, p. 19 (2010)
Alizadeh, M., et al.: CONGA: distributed congestion-aware load balancing for datacenters. In: ACM SIGCOMM Computer Communication Review, vol. 44, pp. 503–514. ACM (2014)
Benson, T., Akella, A., Maltz, D.A.: Network traffic characteristics of data centers in the wild. In: Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, pp. 267–280. ACM (2010)
Chavula, J., Densmore, M., Suleman, H.: Using SDN and reinforcement learning for traffic engineering in UbuntuNet alliance. In: 2016 International Conference on Advances in Computing and Communication Engineering (ICACCE), pp. 349–355. IEEE (2016)
Ghorbani, S., Godfrey, B., Ganjali, Y., Firoozshahian, A.: Micro load balancing in data centers with DRILL. In: Proceedings of the 14th ACM Workshop on Hot Topics in Networks, p. 17. ACM (2015)
Gill, P., Jain, N., Nagappan, N.: Understanding network failures in data centers: measurement, analysis, and implications. In: ACM SIGCOMM Computer Communication Review, vol. 41, pp. 350–361. ACM (2011)
Grondman, I., Busoniu, L., Lopes, G.A., Babuska, R.: A survey of actor-critic reinforcement learning: standard and natural policy gradients. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 42(6), 1291–1307 (2012)
Guo, C., et al.: BCube: a high performance, server-centric network architecture for modular data centers. ACM SIGCOMM Comput. Commun. Rev. 39(4), 63–74 (2009)
Guo, C., et al.: Pingmesh: a large-scale system for data center network latency measurement and analysis. ACM SIGCOMM Comput. Commun. Rev. 45(4), 139–152 (2015)
He, K., Rozner, E., Agarwal, K., Felter, W., Carter, J., Akella, A.: Presto: edge-based load balancing for fast datacenter networks. ACM SIGCOMM Comput. Commun. Rev. 45(4), 465–478 (2015)
Kandula, S., Katabi, D., Sinha, S., Berger, A.: Dynamic load balancing without packet reordering. ACM SIGCOMM Comput. Commun. Rev. 37(2), 51–62 (2007)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015)
Lin, S.C., Akyildiz, I.F., Wang, P., Luo, M.: QoS-aware adaptive routing in multi-layer hierarchical software defined networks: a reinforcement learning approach. In: 2016 IEEE International Conference on Services Computing (SCC), pp. 25–33. IEEE (2016)
Popa, L., Kumar, G., Chowdhury, M., Krishnamurthy, A., Ratnasamy, S., Stoica, I.: FairCloud: sharing the network in cloud computing. In: Proceedings of the ACM SIGCOMM 2012 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, pp. 187–198. ACM (2012)
Rasley, J., et al.: Planck: millisecond-scale monitoring and control for commodity networks. In: ACM SIGCOMM Computer Communication Review, vol. 44, pp. 407–418. ACM (2014)
The MAWI Working Group: MAWI working group traffic archive. http://mawi.wide.ad.jp/mawi/. Accessed 21 June 2018
Varga, A.: OMNeT++ user manual version 4.6. OpenSim Ltd (2014)
Zhang, H., Zhang, J., Bai, W., Chen, K., Chowdhury, M.: Resilient datacenter load balancing in the wild. In: Proceedings of the Conference of the ACM Special Interest Group on Data Communication, pp. 253–266. ACM (2017)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Lin, Q., Gong, Z., Wang, Q., Li, J. (2019). RILNET: A Reinforcement Learning Based Load Balancing Approach for Datacenter Networks. In: Renault, É., Mühlethaler, P., Boumerdassi, S. (eds) Machine Learning for Networking. MLN 2018. Lecture Notes in Computer Science, vol. 11407. Springer, Cham. https://doi.org/10.1007/978-3-030-19945-6_4
DOI: https://doi.org/10.1007/978-3-030-19945-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19944-9
Online ISBN: 978-3-030-19945-6
eBook Packages: Computer Science; Computer Science (R0)