Abstract
Discriminative methods for visual object category recognition are typically non-probabilistic, predicting class labels but not directly providing an estimate of uncertainty. Gaussian Processes (GPs) provide a framework for deriving regression techniques with explicit uncertainty models; we show here how Gaussian Processes with covariance functions defined based on a Pyramid Match Kernel (PMK) can be used for probabilistic object category recognition. Our probabilistic formulation provides a principled way to learn hyperparameters, which we utilize to learn an optimal combination of multiple covariance functions. It also offers confidence estimates at test points, and naturally allows for an active learning paradigm in which points are optimally selected for interactive labeling. We show that with an appropriate combination of kernels a significant boost in classification performance is possible. Further, our experiments indicate the utility of active learning with probabilistic predictive models, especially when the amount of training data labels that may be sought for a category is ultimately very small.
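As an illustration of the pipeline the abstract describes, the following minimal sketch combines several covariance functions, computes a GP predictive mean and variance, and picks the next point to label. It is not the authors' implementation, and it rests on several assumptions: a Gaussian RBF kernel stands in for the Pyramid Match Kernel, the kernel weights are fixed by hand rather than learned, the labels are treated as ±1 regression targets, and the selection rule (smallest |mean| / sqrt(variance)) is one plausible uncertainty criterion, not one stated in the abstract. All function names and the toy data are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, length_scale):
    """Stand-in covariance function (assumption: NOT the Pyramid Match Kernel)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def combined_kernel(A, B, weights, length_scales):
    """Non-negative weighted sum of base kernels (weights fixed here, not learned)."""
    return sum(w * rbf_kernel(A, B, s) for w, s in zip(weights, length_scales))

def gp_predict(K_train, K_cross, k_test_diag, y, noise_var):
    """Standard GP regression: predictive mean and variance at the test points.

    K_train: (n, n) train covariance; K_cross: (m, n) test-vs-train covariance;
    k_test_diag: (m,) prior variance at each test point.
    """
    L = np.linalg.cholesky(K_train + noise_var * np.eye(len(y)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_cross @ alpha
    v = np.linalg.solve(L, K_cross.T)
    var = k_test_diag - np.sum(v**2, axis=0) + noise_var
    return mean, var

def select_query(mean, var):
    """Assumed active-learning rule: query the point closest to the decision
    boundary relative to its predictive uncertainty."""
    return int(np.argmin(np.abs(mean) / np.sqrt(var)))

# Toy data: two well-separated 2-D clusters labeled -1 and +1.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(-2.0, 0.5, (5, 2)), rng.normal(2.0, 0.5, (5, 2))])
y_train = np.concatenate([-np.ones(5), np.ones(5)])
X_pool = rng.normal(0.0, 2.0, (20, 2))  # unlabeled pool

weights, length_scales, noise_var = [0.5, 0.5], [1.0, 3.0], 0.1
K_train = combined_kernel(X_train, X_train, weights, length_scales)
K_cross = combined_kernel(X_pool, X_train, weights, length_scales)
k_diag = np.diag(combined_kernel(X_pool, X_pool, weights, length_scales))

mean, var = gp_predict(K_train, K_cross, k_diag, y_train, noise_var)
query_index = select_query(mean, var)
print("next point to label:", query_index)
```

In a probabilistic setting the kernel weights and noise variance would typically be chosen by optimizing a model-selection criterion such as the marginal likelihood, which this sketch omits for brevity.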
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Kapoor, A., Grauman, K., Urtasun, R. et al. Gaussian Processes for Object Categorization. Int J Comput Vis 88, 169–188 (2010). https://doi.org/10.1007/s11263-009-0268-3
DOI: https://doi.org/10.1007/s11263-009-0268-3