Abstract
Reinforcement learning (RL) methods operate in discrete time. To apply RL to inherently continuous problems such as robotic control, a specific time discretization must be chosen. This choice trades off sparse time control, which may be easier to train, against fine time control, which may allow for better ultimate performance. In this work, we propose SusACER, an off-policy RL algorithm that combines the advantages of different time discretization settings: it initially operates with a sparse time discretization and gradually switches to a fine one. We analyze the effects of the changing time discretization in the robotic control environments Ant, HalfCheetah, Hopper, and Walker2D. In all cases, our proposed algorithm outperforms the state of the art.
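The coarse-to-fine switching idea can be sketched in a few lines of code. The following is a minimal, hypothetical illustration (not the authors' implementation): the agent picks one action per macro-step, the simulator repeats it for `n_sustain` low-level steps, and `n_sustain` is annealed from a coarse initial value down to 1 over training. The environment, schedule, and all constants are assumptions made for illustration only.

```python
# Hypothetical sketch of coarse-to-fine time discretization via sustained actions.
# Names, schedule, and constants are illustrative assumptions, not the paper's code.
import gymnasium as gym


def sustain_schedule(step, start=8, end=1, anneal_steps=100_000):
    """Number of simulator steps an action is sustained for, annealed from coarse to fine."""
    frac = min(step / anneal_steps, 1.0)
    return int(round(start + frac * (end - start)))


def rollout(env, policy, total_steps=10_000):
    obs, _ = env.reset()
    step = 0
    while step < total_steps:
        action = policy(obs)                       # one decision per macro-step
        n_sustain = sustain_schedule(step)
        macro_reward, done = 0.0, False
        for _ in range(n_sustain):                 # hold the same action in the simulator
            obs, reward, terminated, truncated, _ = env.step(action)
            macro_reward += reward
            step += 1
            done = terminated or truncated
            if done:
                break
        # A real off-policy agent would store (obs, action, macro_reward, n_sustain)
        # in a replay buffer here and update the actor and critic from it.
        if done:
            obs, _ = env.reset()


if __name__ == "__main__":
    env = gym.make("Pendulum-v1")                  # stand-in continuous-control task
    rollout(env, policy=lambda obs: env.action_space.sample())
```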
Notes
- 1.
Defined as the number of failures before the first success.
Ethics declarations
Ethical Statement
This work does not involve processing personal data. The solutions presented in this paper cannot be directly used to collect, process, or infer personal information. We also believe that reinforcement learning methods, including SusACER, are currently not viable for control processes used in policing or the military. This work does not have any ethical implications.
Hyperparameters
In this section we provide the hyperparameters used to obtain the results reported in Sect. 6. Table 3 contains parameters common to the off-policy algorithms, namely SusACER, ACER, and SAC. Table 4 contains parameters shared by SusACER and ACER. Tables 5 and 6 contain the hyperparameters of SAC and PPO, respectively. Table 7 contains the environment-specific reward scaling parameter values for SAC.
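As a rough illustration of how these tables can map onto an experiment configuration, the sketch below groups hyperparameters into common off-policy settings, settings shared by SusACER and ACER, and per-environment overrides. All field names and values are placeholders chosen for illustration; they are not the values reported in Tables 3–7.

```python
# Illustrative configuration layout mirroring the structure of Tables 3-7.
# All names and values are placeholders, not the paper's reported settings.
from dataclasses import dataclass


@dataclass
class CommonOffPolicyConfig:                     # common settings (cf. Table 3)
    replay_buffer_size: int = 1_000_000
    batch_size: int = 256
    discount_gamma: float = 0.99
    adam_learning_rate: float = 3e-4


@dataclass
class AcerFamilyConfig(CommonOffPolicyConfig):   # shared by SusACER and ACER (cf. Table 4)
    actor_hidden_sizes: tuple = (256, 256)
    critic_hidden_sizes: tuple = (256, 256)


@dataclass
class SusAcerConfig(AcerFamilyConfig):
    initial_sustain_steps: int = 8               # coarse time discretization at the start of training
    final_sustain_steps: int = 1                 # fine time discretization at the end


# Environment-specific reward scaling for SAC (cf. Table 7); values are placeholders.
SAC_REWARD_SCALE = {"Ant": 1.0, "HalfCheetah": 1.0, "Hopper": 1.0, "Walker2D": 1.0}
```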
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Łyskawa, J., Wawrzyński, P. (2024). Actor-Critic with Variable Time Discretization via Sustained Actions. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14447. Springer, Singapore. https://doi.org/10.1007/978-981-99-8079-6_37
DOI: https://doi.org/10.1007/978-981-99-8079-6_37
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8078-9
Online ISBN: 978-981-99-8079-6
eBook Packages: Computer Science, Computer Science (R0)