Abstract
Given a set of images of scenes containing multiple object categories (e.g. grass, roads, buildings), our objective is to discover these objects in each image in an unsupervised manner and to use the resulting object distribution to perform scene classification. We achieve this discovery using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature, here applied to a bag of visual words representation of each image. Scene classification on the object distribution is then carried out by a k-nearest neighbour classifier.
We investigate the classification performance under changes in the visual vocabulary and in the number of latent topics learnt, and develop a novel vocabulary using colour SIFT descriptors. Classification performance is compared to the supervised approaches of Vogel & Schiele [19] and Oliva & Torralba [11], and the semi-supervised approach of Fei-Fei & Perona [3], using their own datasets and testing protocols. In all cases the combination of (unsupervised) pLSA followed by (supervised) nearest neighbour classification achieves superior results. We show applications of this method to image retrieval with relevance feedback and to scene classification in videos.
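The two-stage pipeline described in the abstract (unsupervised pLSA on visual-word counts, then a supervised k-nearest neighbour classifier on the resulting topic distributions) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic word histograms stand in for quantised descriptors, the names `plsa_fit`, `knn_predict` and `sample_images` are hypothetical, and the vocabulary size, topic count, Euclidean distance and value of k are arbitrary. Test images are handled by re-running EM with the topic-word distributions held fixed, which is one common way to apply pLSA to unseen documents and may differ from the authors' exact procedure.

```python
import math
import random
from collections import Counter


def normalize(row):
    """Scale a non-negative list so it sums to 1 (uniform if it is all zeros)."""
    total = sum(row)
    return [v / total for v in row] if total > 0 else [1.0 / len(row)] * len(row)


def plsa_fit(counts, n_topics, n_iter=50, seed=0, p_w_z=None):
    """Fit pLSA by EM on a list of per-image visual-word count vectors.

    Returns (P(w|z), P(z|d)). If p_w_z is supplied it is kept fixed, so only
    the topic mixture P(z|d) of the given images is estimated.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    learn_topics = p_w_z is None
    if learn_topics:
        p_w_z = [normalize([rng.random() for _ in range(n_words)])
                 for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() for _ in range(n_topics)])
             for _ in range(n_docs)]
    for _ in range(n_iter):
        topic_word = [[0.0] * n_words for _ in range(n_topics)]
        doc_topic = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d).
                post = normalize([p_w_z[z][w] * p_z_d[d][z]
                                  for z in range(n_topics)])
                for z in range(n_topics):
                    mass = counts[d][w] * post[z]
                    topic_word[z][w] += mass
                    doc_topic[d][z] += mass
        # M-step: renormalise the expected counts.
        if learn_topics:
            p_w_z = [normalize(row) for row in topic_word]
        p_z_d = [normalize(row) for row in doc_topic]
    return p_w_z, p_z_d


def knn_predict(train_x, train_y, test_x, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    predictions = []
    for x in test_x:
        ranked = sorted(range(len(train_x)),
                        key=lambda i: math.dist(x, train_x[i]))
        votes = Counter(train_y[i] for i in ranked[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Toy data (hypothetical): two scene classes over a 20-word vocabulary.
rng = random.Random(1)
class_word_probs = [[0.09] * 10 + [0.01] * 10, [0.01] * 10 + [0.09] * 10]


def sample_images(n_per_class, words_per_image=200):
    counts, labels = [], []
    for label, probs in enumerate(class_word_probs):
        for _ in range(n_per_class):
            drawn = Counter(rng.choices(range(len(probs)), weights=probs,
                                        k=words_per_image))
            counts.append([drawn[w] for w in range(len(probs))])
            labels.append(label)
    return counts, labels


train_counts, train_labels = sample_images(20)
test_counts, test_labels = sample_images(5)

# Unsupervised step: learn topics from training images (labels are not used).
topics, train_mix = plsa_fit(train_counts, n_topics=2)
# Estimate topic mixtures of test images with the learnt topics held fixed.
_, test_mix = plsa_fit(test_counts, n_topics=2, p_w_z=topics)
# Supervised step: classify each test image from its topic mixture.
predicted = knn_predict(train_mix, train_labels, test_mix, k=3)
accuracy = sum(p == t for p, t in zip(predicted, test_labels)) / len(test_labels)
print("toy accuracy:", accuracy)
```

In this sketch the class labels enter only at the nearest-neighbour step, mirroring the split between the unsupervised and supervised stages stated in the abstract. The number of topics and the vocabulary are the quantities the abstract says are varied experimentally, so here they are exposed as plain parameters.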
References
Csurka, G., Bray, C., Dance, C., Fan, L.: Visual categorization with bags of keypoints. In: SLCV Workshop, ECCV, pp. 1–22 (2004)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR, San Diego, California (2005)
Fei-Fei, L., Perona, P.: A bayesian hierarchical model for learning natural scene categories. In: CVPR, Washington, DC, USA, pp. 524–531 (2005)
Goedemé, T., Tuytelaars, T., Vanacker, G., Nuttin, M., Van Gool, L.: Omnidirectional sparse visual path following with occlusion-robust feature tracking. In: OMNIVIS Workshop, ICCV (2005)
Hofmann, T.: Probabilistic latent semantic indexing. ACM SIGIR (1998)
Hofmann, T.: Unsupervised learning by probabilistic latent semantic analysis. Machine Learning 41, 177–196 (2001)
Lazebnik, S., Schmid, C., Ponce, J.: A sparse texture representation using affine-invariant regions. In: CVPR, vol. 2, pp. 319–324 (2003)
Leung, T., Malik, J.: Representing and recognizing the visual appearance of materials using three-dimensional textons. IJCV 43, 29–44 (2001)
Lowe, D.: Distinctive image features from scale invariant keypoints. IJCV 60, 91–110 (2004)
Mikolajczyk, K., Schmid, C.: Scale and affine invariant interest point detectors. IJCV 60, 63–86 (2004)
Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. IJCV 42, 145–175 (2001)
Quelhas, P., Monay, F., Odobez, J., Gatica-Perez, D., Tuytelaars, T., Van Gool, L.: Modeling scenes with local descriptors and latent aspects. In: ICCV, Beijing, China (2005)
Rocchio, J.: Relevance feedback in information retrieval. In: The SMART Retrieval System - Experiments in Automatic Document Processing. Prentice Hall, Englewood Cliffs, NJ (1971)
Sivic, J., Zisserman, A.: Video Google: A text retrieval approach to object matching in videos. In: ICCV (2003)
Sivic, J., Russell, B., Efros, A., Zisserman, A., Freeman, W.T.: Discovering objects and their locations in images. In: ICCV, Beijing, China (2005)
Szummer, M., Picard, R.W.: Indoor-outdoor image classification. In: ICCV, Bombay, India, pp. 42–50 (1998)
Vailaya, A., Figueiredo, A., Jain, A., Zhang, H.: Image classification for content-based indexing. T-IP 10 (2001)
Varma, M., Zisserman, A.: Texture classification: Are filter banks necessary? In: CVPR, Madison, Wisconsin, vol. 2, pp. 691–698 (2003)
Vogel, J., Schiele, B.: Natural scene retrieval based on a semantic modeling step. In: Enser, P.G.B., Kompatsiaris, Y., O’Connor, N.E., Smeaton, A.F., Smeulders, A.W.M. (eds.) CIVR 2004. LNCS, vol. 3115, pp. 207–215. Springer, Heidelberg (2004)
Zhang, R., Zhang, Z.: Hidden semantic concept discovery in region based image retrieval. In: CVPR, Washington, DC, USA (2004)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bosch, A., Zisserman, A., Muñoz, X. (2006). Scene Classification Via pLSA. In: Leonardis, A., Bischof, H., Pinz, A. (eds) Computer Vision – ECCV 2006. ECCV 2006. Lecture Notes in Computer Science, vol 3954. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11744085_40
DOI: https://doi.org/10.1007/11744085_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33838-3
Online ISBN: 978-3-540-33839-0
eBook Packages: Computer Science, Computer Science (R0)