Abstract
In this paper, we study the task of embodied interactive learning for object detection. Given a set of environments (and some labeling budget), our goal is to learn an object detector by having an agent select which data to obtain labels for. How should an exploration policy decide which trajectories are worth labeling? One possibility is to use a trained object detector's failure cases as an external reward. However, this would require labeling the millions of frames needed to train RL policies, which is infeasible. Instead, we explore a self-supervised approach for training our exploration policy by introducing a notion of semantic curiosity. Our semantic curiosity policy is based on a simple observation: detection outputs should be consistent across views of the same object. Therefore, semantic curiosity rewards trajectories with inconsistent labeling behavior and encourages the exploration policy to visit such areas. The exploration policy trained via semantic curiosity generalizes to novel scenes and helps train an object detector that outperforms detectors trained with alternative exploration strategies such as random exploration, prediction-error curiosity, and coverage-maximizing exploration.
D. S. Chaplot and H. Jiang—Equal Contribution.
Webpage: https://devendrachaplot.github.io/projects/SemanticCuriosity.
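To make the reward concrete, here is a minimal Python sketch of the idea, not the paper's implementation: the paper aggregates detector outputs into a top-down semantic map and rewards label inconsistency per map cell, while the sketch below uses a simpler distinct-label count as a stand-in and assumes detections have already been projected into a shared top-down frame. The function name, cell size, and input format are illustrative assumptions.

```python
from collections import defaultdict

def semantic_curiosity_reward(detections_per_frame, cell_size=0.5):
    """Reward label inconsistency accumulated over a trajectory.

    `detections_per_frame` is a list of frames; each frame is a list of
    ((x, y), class_id) pairs, where (x, y) is the detection projected into
    a shared top-down coordinate frame (the projection step is omitted).
    """
    labels_per_cell = defaultdict(set)
    for frame in detections_per_frame:
        for (x, y), class_id in frame:
            cell = (int(x // cell_size), int(y // cell_size))
            labels_per_cell[cell].add(class_id)
    # A cell observed with k distinct labels contributes k - 1 units of
    # inconsistency; consistently labeled cells contribute nothing.
    return sum(len(labels) - 1 for labels in labels_per_cell.values())

# Example: the same spot is labeled "chair" (56) in one frame and
# "couch" (57) in another, so the trajectory earns a positive reward.
trajectory = [
    [((1.2, 3.4), 56)],
    [((1.4, 3.3), 57)],
    [((5.0, 5.0), 62)],
]
print(semantic_curiosity_reward(trajectory))  # -> 1
```

A trajectory that keeps the detector's labels consistent scores zero, so a policy maximizing this quantity is driven toward regions where the detector is uncertain, which are exactly the frames worth spending the labeling budget on.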
Notes
1. Note that the curiosity-based policy has the lowest mAP because of the outlier toilet category.
Acknowledgements
This work was supported by IARPA DIVA D17PC00340, ONR MURI, ONR Grant N000141812861, ONR Young Investigator, DARPA MCS, and NSF Graduate Research Fellowship. We would also like to thank NVIDIA for GPU support.
Licenses for referenced datasets:
Gibson: http://svl.stanford.edu/gibson2/assets/GDS_agreement.pdf
Matterport3D: http://kaldir.vc.in.tum.de/matterport/MP_TOS.pdf
Replica: https://raw.githubusercontent.com/facebookresearch/Replica-Dataset/master/LICENSE
Electronic supplementary material
Below are the links to the electronic supplementary material.
Supplementary material 1 (mp4 5714 KB)
Supplementary material 2 (mp4 9164 KB)
Supplementary material 3 (mp4 5119 KB)
Supplementary material 4 (mp4 9828 KB)
Supplementary material 5 (mp4 1906 KB)
Supplementary material 6 (mp4 8418 KB)
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Chaplot, D.S., Jiang, H., Gupta, S., Gupta, A. (2020). Semantic Curiosity for Active Visual Learning. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12351. Springer, Cham. https://doi.org/10.1007/978-3-030-58539-6_19
DOI: https://doi.org/10.1007/978-3-030-58539-6_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58538-9
Online ISBN: 978-3-030-58539-6