
Actor-Critic with Variable Time Discretization via Sustained Actions

  • Conference paper
Neural Information Processing (ICONIP 2023)

Abstract

Reinforcement learning (RL) methods work in discrete time. To apply RL to inherently continuous problems such as robotic control, a specific time discretization needs to be defined. This is a choice between sparse time control, which may be easier to train, and finer time control, which may allow for better ultimate performance. In this work, we propose SusACER, an off-policy RL algorithm that combines the advantages of different time discretization settings: initially it operates with sparse time discretization and gradually switches to a fine one. We analyze the effects of the changing time discretization in the robotic control environments Ant, HalfCheetah, Hopper, and Walker2D. In all cases, our proposed algorithm outperforms the state of the art.
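The core mechanism, starting with coarse (sparse) time discretization and gradually moving to a fine one, can be pictured as sustaining each selected action for several base environment steps and shrinking that sustain count over training. The sketch below is a minimal illustration only, assuming a Gymnasium-style environment API; the wrapper name, linear schedule, and all parameters are hypothetical and do not reproduce the authors' implementation (see the repository linked in the Notes).

import gymnasium as gym

class SustainedActionWrapper(gym.Wrapper):
    """Repeat (sustain) each agent action for n_sustain base time steps."""

    def __init__(self, env, n_sustain=8):
        super().__init__(env)
        self.n_sustain = n_sustain

    def step(self, action):
        # Accumulate reward over the sustained steps; stop early on episode end.
        total_reward = 0.0
        for _ in range(max(1, self.n_sustain)):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward
            if terminated or truncated:
                break
        return obs, total_reward, terminated, truncated, info

def sustain_schedule(env_step, start=8, end=1, anneal_steps=100_000):
    """Linearly anneal the sustain count from a coarse to a fine setting."""
    frac = min(env_step / anneal_steps, 1.0)
    return max(end, round(start - frac * (start - end)))

During training one would periodically set env.n_sustain = sustain_schedule(total_env_steps), so early interaction happens on a coarse time grid and later interaction on the environment's base time step.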


Notes

  1. Defined as the number of failures before the first success (a worked form follows this list).

  2. https://github.com/lychanl/acer-release/releases/tag/SusACER.
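
For illustration only: if a quantity K is defined as the number of failures before the first success in independent trials with success probability p (the trials and p are an assumed reading here, not values given in the paper), then K follows a geometric distribution with

\[
P(K = k) = (1 - p)^k \, p, \quad k = 0, 1, 2, \ldots, \qquad \mathbb{E}[K] = \frac{1 - p}{p}.
\]

For example, p = 0.5 yields an expected value of 1.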

Author information

Corresponding author

Correspondence to Jakub Łyskawa.


Ethics declarations

Ethical Statement

This work does not involve processing personal data. The novel solutions presented in this paper cannot be directly used to collect, process, or infer personal information. We also believe that reinforcement learning methods, including SusACER, are currently not viable solutions for control processes used in policing or the military. This work does not have any ethical implications.

Hyperparameters

In this section we provide the hyperparameters used to obtain the results in Sect. 6. Table 3 contains the parameters common to the off-policy algorithms, namely SusACER, ACER, and SAC. Table 4 contains the parameters shared by SusACER and ACER. Tables 5 and 6 contain the hyperparameters for SAC and PPO, respectively. Table 7 contains the environment-specific reward scaling values for SAC.

Table 3. Common parameters for the off-policy algorithms (SusACER, ACER, SAC).
Table 4. SusACER and ACER hyperparameters.
Table 5. SAC general hyperparameters. For environment-specific hyperparameters, see Table 7.
Table 6. PPO hyperparameters.
Table 7. SAC reward scaling.
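
The table bodies are not reproduced in this extract, but the layering described above (settings common to the off-policy methods, settings shared by the ACER family, and per-algorithm or per-environment overrides such as SAC reward scaling) can be mirrored in configuration code. The sketch below is illustrative only; all field names and values are placeholders, not the values reported in Tables 3, 4, 5, 6, and 7.

from dataclasses import dataclass, field

@dataclass
class CommonOffPolicyConfig:
    # Placeholder analogue of Table 3: shared by SusACER, ACER, and SAC.
    discount_gamma: float = 0.99
    replay_buffer_size: int = 1_000_000
    batch_size: int = 256

@dataclass
class AcerFamilyConfig(CommonOffPolicyConfig):
    # Placeholder analogue of Table 4: shared by SusACER and ACER.
    actor_learning_rate: float = 1e-4
    critic_learning_rate: float = 1e-4

@dataclass
class SacConfig(CommonOffPolicyConfig):
    # Placeholder analogue of Tables 5 and 7: SAC settings plus
    # environment-specific reward scaling.
    learning_rate: float = 3e-4
    reward_scale: dict = field(default_factory=lambda: {
        "Ant": 1.0, "HalfCheetah": 1.0, "Hopper": 1.0, "Walker2D": 1.0,
    })

PPO, being on-policy, would take a separate configuration corresponding to Table 6.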


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Łyskawa, J., Wawrzyński, P. (2024). Actor-Critic with Variable Time Discretization via Sustained Actions. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14447. Springer, Singapore. https://doi.org/10.1007/978-981-99-8079-6_37

  • DOI: https://doi.org/10.1007/978-981-99-8079-6_37

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8078-9

  • Online ISBN: 978-981-99-8079-6

  • eBook Packages: Computer Science, Computer Science (R0)
