Critic Guided Segmentation of Rewarding Objects in First-Person Views

Andrew Melnik, Augustin Harter, Christian Limberg, Krishan Rana, Niko Sünderhauf & Helge Ritter

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12873)

Included in the following conference series:

German Conference on Artificial Intelligence (Künstliche Intelligenz)

Abstract

This work presents a learning approach for masking rewarding objects in images using sparse reward signals from an imitation learning dataset. To this end, we train an Hourglass network using only feedback from a critic model. The Hourglass network learns to produce a mask that, when the masked areas are swapped between a high-score image and a low-score image, decreases the critic’s score of the high-score image and increases the critic’s score of the low-score image. We trained the model on an imitation learning dataset from the NeurIPS 2020 MineRL Competition Track, where it learned to mask rewarding objects in a complex interactive 3D environment with a sparse reward signal. This approach was part of the 1st place winning solution in this competition. Video demonstration and code: https://rebrand.ly/critic-guided-segmentation.
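
The mask-swap idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the names `swap_masked_regions`, `toy_critic`, and `swap_objective` are hypothetical, the critic is a stand-in that scores an image by its mean green intensity, the mask is fixed by hand rather than predicted by an Hourglass network, and the objective (swapped high-score minus swapped low-score) is only one plausible way to express "decrease one score, increase the other".

```python
import numpy as np


def swap_masked_regions(high_img, low_img, mask):
    """Exchange the masked pixels of two images.

    mask has shape (H, W, 1) with values in [0, 1] and broadcasts
    over the colour channels of the (H, W, 3) images.
    """
    high_swapped = (1.0 - mask) * high_img + mask * low_img
    low_swapped = (1.0 - mask) * low_img + mask * high_img
    return high_swapped, low_swapped


def toy_critic(img):
    """Hypothetical stand-in critic: mean green-channel intensity."""
    return float(img[..., 1].mean())


def swap_objective(critic, high_img, low_img, mask):
    """Assumed objective: lower is better.

    A useful mask lowers the critic score of the high-score image
    and raises the critic score of the low-score image after the swap.
    """
    high_swapped, low_swapped = swap_masked_regions(high_img, low_img, mask)
    return critic(high_swapped) - critic(low_swapped)


# Toy data: the "rewarding object" is a 2x2 green patch.
high_img = np.zeros((8, 8, 3))
high_img[2:4, 2:4, 1] = 1.0
low_img = np.zeros((8, 8, 3))

# A mask that covers the object, and an empty mask for comparison.
good_mask = np.zeros((8, 8, 1))
good_mask[2:4, 2:4, 0] = 1.0
empty_mask = np.zeros((8, 8, 1))

loss_good = swap_objective(toy_critic, high_img, low_img, good_mask)
loss_empty = swap_objective(toy_critic, high_img, low_img, empty_mask)
print("objective with object mask:", loss_good)
print("objective with empty mask:", loss_empty)
```

In this toy setting the mask that covers the green patch yields a lower objective than the empty mask, which is the signal a mask-producing network could be trained on. A practical version would presumably also need a term that discourages masking the whole image, since a full swap trivially exchanges the two scores.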

A. Melnik and A. Harter—Shared first authorship.

Author information

Authors and Affiliations

CITEC, Bielefeld University, Bielefeld, Germany
Andrew Melnik, Augustin Harter, Christian Limberg & Helge Ritter
Centre for Robotics, Queensland University of Technology (QUT), Brisbane, Australia
Krishan Rana & Niko Sünderhauf

Editor information

Editors and Affiliations

Czech Technical University in Prague, Prague, Czech Republic
Stefan Edelkamp
University of Lübeck, Lübeck, Germany
Ralf Möller
University of Leoben, Leoben, Austria
Elmar Rueckert

About this paper

Cite this paper

Melnik, A., Harter, A., Limberg, C., Rana, K., Sünderhauf, N., Ritter, H. (2021). Critic Guided Segmentation of Rewarding Objects in First-Person Views. In: Edelkamp, S., Möller, R., Rueckert, E. (eds) KI 2021: Advances in Artificial Intelligence. KI 2021. Lecture Notes in Computer Science, vol 12873. Springer, Cham. https://doi.org/10.1007/978-3-030-87626-5_25

DOI: https://doi.org/10.1007/978-3-030-87626-5_25
Published: 30 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87625-8
Online ISBN: 978-3-030-87626-5
eBook Packages: Computer Science, Computer Science (R0)
