Abstract
We present a system architecture for domestic robots that allows them to learn object categories after initially learning a single sample object. We explore the situation in which a human teaches a robot a novel object, and the robot enhances that learning by using a large amount of image data from the Internet. The main goal of this research is to give a robot the capability to enhance its own learning while minimizing the time and effort a human needs to train it. Our active learning approach consists of learning the object name through a speech interface and creating a visual object model with a depth-based attention model adapted to the robot’s personal space. Given the object’s name (keyword), a large number of object-related images are collected from two main image sources (Google Images and the LabelMe website). We address the problem of separating good training samples from noisy images in two steps: (1) similar image selection using a Simile Selector Classifier, and (2) non-real image filtering using a variant of Gaussian Discriminant Analysis. After web image selection, object category classifiers are trained and then tested on different objects of the same category. Our experiments demonstrate the effectiveness of our robot learning approach.
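The two-step web image selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the Simile Selector Classifier or the Gaussian Discriminant Analysis variant, so step 1 is replaced here by a plain cosine-similarity threshold against the robot-learned feature vector, and step 2 by textbook two-class GDA with a shared covariance matrix. All function names, the feature vectors, and the threshold value are hypothetical, and the same features are assumed to serve both steps.

```python
import numpy as np

def select_similar(reference, candidates, threshold=0.8):
    """Step 1 stand-in (assumption): keep candidates whose cosine similarity
    to the robot-learned reference feature vector is at least `threshold`."""
    ref = reference / np.linalg.norm(reference)
    unit = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return np.flatnonzero(unit @ ref >= threshold)

def fit_gda(X, y):
    """Fit textbook two-class GDA with a shared covariance matrix.
    Labels (assumption): 1 = real photograph, 0 = non-real image."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    centered = np.vstack([X0 - mu0, X1 - mu1])
    # A small ridge term keeps the covariance matrix invertible.
    sigma = centered.T @ centered / len(X) + 1e-6 * np.eye(X.shape[1])
    return {"phi": float(y.mean()), "mu0": mu0, "mu1": mu1,
            "sigma_inv": np.linalg.inv(sigma)}

def predict_gda(model, X):
    """Return 1 where class 1 has the higher log-posterior, else 0."""
    def score(mu, prior):
        d = X - mu
        maha = np.einsum("ij,jk,ik->i", d, model["sigma_inv"], d)
        return -0.5 * maha + np.log(prior)
    s1 = score(model["mu1"], model["phi"])
    s0 = score(model["mu0"], 1.0 - model["phi"])
    return (s1 > s0).astype(int)

def filter_web_images(reference, candidates, model, threshold=0.8):
    """Run both steps and return the indices of the surviving candidates."""
    keep = select_similar(reference, candidates, threshold)
    real = predict_gda(model, candidates[keep]) == 1
    return keep[real]
```

In this sketch, `filter_web_images` returns indices into the candidate array, so the surviving web images can be passed on as training samples for the category classifier.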
Acknowledgments
This work was supported in part by Grant-in-Aid for Scientific Research (C) 23500242.
Cite this article
Penaloza, C.I., Mae, Y., Ohara, K. et al. Web-enhanced object category learning for domestic robots. Intel Serv Robotics 6, 53–67 (2013). https://doi.org/10.1007/s11370-012-0126-y
DOI: https://doi.org/10.1007/s11370-012-0126-y