Abstract
General and adaptive strategies are a long-pursued goal of the optimization community, since achieving high-quality solutions usually requires a domain-dependent set of configurations (operators and parameters). This work investigates a Deep Q-Network (DQN) selection strategy within an online selection Hyper-Heuristic algorithm and compares it with two state-of-the-art Multi-Armed Bandit (MAB) approaches. We conducted the experiments on all six problem domains from the HyFlex Framework. With our definition of state representation and reward scheme, the DQN was able to quickly identify the good and bad operators, which resulted in better performance than the MAB strategies on the problem instances where a more exploitative behavior proved advantageous.
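As a concrete illustration of the selection loop described in the abstract, the sketch below shows a minimal epsilon-greedy DQN choosing among low-level heuristic operators inside a hyper-heuristic iteration. This is not the implementation used in the paper: the state features (a moving average of per-operator improvement), the reward definition (fitness improvement), the toy fitness dynamics, and all hyperparameters are illustrative assumptions.

```python
# Illustrative sketch (not the authors' implementation): an epsilon-greedy DQN
# that picks one of several low-level heuristic operators at each step of a
# hyper-heuristic loop. State features, reward scheme, and the toy "problem"
# below are placeholders chosen for brevity.
import random
import torch
import torch.nn as nn

N_OPERATORS = 5          # number of low-level heuristics to choose from
STATE_DIM = N_OPERATORS  # here: recent average improvement per operator
EPSILON = 0.1            # exploration rate (assumed value)
GAMMA = 0.9              # discount factor (assumed value)

# Small feed-forward Q-network: state -> one Q-value per operator.
q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 32), nn.ReLU(),
    nn.Linear(32, N_OPERATORS),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def apply_operator(op: int, fitness: float) -> float:
    """Stand-in for applying a low-level heuristic; returns a new fitness."""
    return fitness + random.gauss(-0.1 * (op + 1), 1.0)  # toy dynamics

state = torch.zeros(STATE_DIM)
fitness = 100.0
for step in range(1000):
    # Epsilon-greedy operator selection from the current Q-estimates.
    if random.random() < EPSILON:
        op = random.randrange(N_OPERATORS)
    else:
        with torch.no_grad():
            op = int(q_net(state).argmax())

    new_fitness = apply_operator(op, fitness)
    reward = fitness - new_fitness       # positive if fitness improved (minimization)
    fitness = min(fitness, new_fitness)  # accept only improvements (placeholder move-acceptance)

    # Next state: exponential moving average of per-operator improvement.
    next_state = state.clone()
    next_state[op] = 0.9 * state[op] + 0.1 * reward

    # One-step Q-learning update (no replay buffer or target network, for brevity).
    with torch.no_grad():
        target = reward + GAMMA * q_net(next_state).max()
    pred = q_net(state)[op]
    loss = loss_fn(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    state = next_state
```

In the actual hyper-heuristic setting, apply_operator would invoke one of the domain's low-level heuristics on the incumbent solution, and the move-acceptance rule would replace the simple improvement-only acceptance used here.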
Acknowledgements
This work was financially supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Dantas, A., Pozo, A. (2021). Online Selection of Heuristic Operators with Deep Q-Network: A Study on the HyFlex Framework. In: Britto, A., Valdivia Delgado, K. (eds) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science, vol 13073. Springer, Cham. https://doi.org/10.1007/978-3-030-91702-9_19
DOI: https://doi.org/10.1007/978-3-030-91702-9_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91701-2
Online ISBN: 978-3-030-91702-9
eBook Packages: Computer Science, Computer Science (R0)