Abstract
This work presents a learning approach for masking rewarding objects in images using sparse reward signals from an imitation learning dataset. To this end, we train an Hourglass network using only feedback from a critic model. The Hourglass network learns to produce a mask such that swapping the masked areas between a high-score image and a low-score image decreases the critic's score for the high-score image and increases it for the low-score image. We trained the model on an imitation learning dataset from the NeurIPS 2020 MineRL Competition Track, where it learned to mask rewarding objects in a complex, interactive 3D environment with a sparse reward signal. This approach was part of the 1st place winning solution in this competition. Video demonstration and code: https://rebrand.ly/critic-guided-segmentation.
A. Melnik and A. Harter: shared first authorship.
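The mask-swapping idea described in the abstract can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: `toy_critic` is a hypothetical stand-in for the trained critic, the mask is hand-written rather than produced by an Hourglass network, and `swap_objective` is one plausible way to express "lower the high-score image, raise the low-score image", not necessarily the loss used in the paper.

```python
import numpy as np


def swap_masked_regions(high_img, low_img, mask):
    """Exchange the masked pixels of two images.

    Images have shape (H, W, C); mask has shape (H, W) with values in [0, 1].
    """
    m = mask[..., None]  # add a channel axis so the mask broadcasts over C
    high_swapped = (1.0 - m) * high_img + m * low_img
    low_swapped = (1.0 - m) * low_img + m * high_img
    return high_swapped, low_swapped


def toy_critic(img):
    """Hypothetical critic: scores an image by its mean red-channel intensity."""
    return float(img[..., 0].mean())


def swap_objective(high_img, low_img, mask, critic=toy_critic):
    """Illustrative objective (an assumption, not the paper's loss).

    Lower values mean the swap pushed the high-score image's score down
    and the low-score image's score up.
    """
    high_swapped, low_swapped = swap_masked_regions(high_img, low_img, mask)
    return critic(high_swapped) - critic(low_swapped)


# Toy data: the "rewarding object" is a red 2x2 patch in the high-score image.
high_img = np.zeros((4, 4, 3))
high_img[1:3, 1:3, 0] = 1.0
low_img = np.zeros((4, 4, 3))

good_mask = np.zeros((4, 4))
good_mask[1:3, 1:3] = 1.0      # covers exactly the rewarding patch
empty_mask = np.zeros((4, 4))  # masks nothing, so nothing is swapped

good_value = swap_objective(high_img, low_img, good_mask)
empty_value = swap_objective(high_img, low_img, empty_mask)
print(good_value, empty_value)
```

In this toy setting, a mask that covers the rewarding patch yields a lower objective than an empty mask, which is the kind of signal a mask-producing network could be trained on using only critic feedback.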
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Melnik, A., Harter, A., Limberg, C., Rana, K., Sünderhauf, N., Ritter, H. (2021). Critic Guided Segmentation of Rewarding Objects in First-Person Views. In: Edelkamp, S., Möller, R., Rueckert, E. (eds) KI 2021: Advances in Artificial Intelligence. KI 2021. Lecture Notes in Computer Science, vol 12873. Springer, Cham. https://doi.org/10.1007/978-3-030-87626-5_25
DOI: https://doi.org/10.1007/978-3-030-87626-5_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87625-8
Online ISBN: 978-3-030-87626-5
eBook Packages: Computer Science, Computer Science (R0)