Abstract
Learning the semantics of images automatically from visual features alone, with the aim of bridging the semantic gap, has been an area of active research. Recently, visual keywords extracted from images have been shown to provide a useful intermediate representation for image characterization and retrieval. A challenging problem is to find effective ways of extracting, representing and using the context of visual keywords for learning image semantics. In this paper, we present a number of kernel and spectral methods that our research group has developed for learning the semantics of images, and that can be applied to a variety of image annotation, categorization and retrieval tasks. To capture the context of visual keywords, we propose two contextual kernels: the spatial Markov kernel and the spatial mismatch kernel. The first kernel is defined based on Markov models, while the second is motivated by the concept of string kernels and is derived without the use of any generative model. The experimental results show that the context captured by our kernels is very effective for learning the semantics of images. Moreover, to learn a semantically compact (or high-level) vocabulary, we further propose a spectral embedding method that captures the local intrinsic geometric (i.e., manifold) structure of the original, abundant visual keywords. This spectral method can also be applied to manifold learning on textual keywords for image annotation refinement. The experimental results show that our spectral methods lead to significant improvement in performance by capturing the manifold structure of visual or textual keywords.
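To make the string-kernel idea behind the second kernel concrete, the following is a minimal sketch of a generic (k, m)-mismatch kernel applied to sequences of visual keyword indices. It is not the chapter's spatial mismatch kernel: the abstract does not say how keywords are ordered spatially or how the kernel is normalized, so the sketch assumes each image has already been reduced to a one-dimensional sequence of integer keyword indices. The function names and the parameters `vocab_size`, `k` and `m` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def kmers(seq, k):
    """Return all contiguous length-k subsequences of seq as tuples."""
    return [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def hamming(a, b):
    """Number of positions at which two equal-length tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_features(seq, vocab_size, k, m):
    """Map a keyword sequence to its (k, m)-mismatch feature vector.

    The feature indexed by k-mer beta counts the k-mers of seq that
    differ from beta in at most m positions. All vocab_size**k k-mers
    are enumerated, so this is only practical for toy inputs.
    """
    observed = Counter(kmers(seq, k))
    features = {}
    for beta in product(range(vocab_size), repeat=k):
        count = sum(c for alpha, c in observed.items()
                    if hamming(alpha, beta) <= m)
        if count:
            features[beta] = count
    return features


def mismatch_kernel(seq_x, seq_y, vocab_size, k=2, m=1):
    """Inner product of the two mismatch feature vectors."""
    fx = mismatch_features(seq_x, vocab_size, k, m)
    fy = mismatch_features(seq_y, vocab_size, k, m)
    return sum(v * fy.get(beta, 0) for beta, v in fx.items())


# Toy example: two keyword sequences over an assumed vocabulary {0, 1}.
x = [0, 1, 0]
y = [0, 1, 1]
exact = mismatch_kernel(x, y, vocab_size=2, k=2, m=0)    # exact k-mer matches only
relaxed = mismatch_kernel(x, y, vocab_size=2, k=2, m=1)  # one mismatch tolerated
```

With `m=0` the sketch reduces to counting exactly shared k-mers; allowing mismatches lets similar but non-identical keyword patterns contribute to the similarity, and no generative model is fitted at any point. The resulting kernel values could be passed to any kernel classifier that accepts a precomputed Gram matrix.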
© 2011 Springer Science+Business Media B.V.
About this paper
Cite this paper
Ip, H.H.S. (2011). Kernel and Spectral Methods for Learning the Semantics of Images. In: Gelenbe, E., Lent, R., Sakellari, G., Sacan, A., Toroslu, H., Yazici, A. (eds) Computer and Information Sciences. Lecture Notes in Electrical Engineering, vol 62. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9794-1_60
DOI: https://doi.org/10.1007/978-90-481-9794-1_60
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-9793-4
Online ISBN: 978-90-481-9794-1
eBook Packages: Engineering, Engineering (R0)