Abstract
The visual vocabulary representation approach has been successfully applied to many multimedia and vision applications, including visual recognition, image retrieval, and scene modeling/categorization. The idea behind it is that an image can be represented by visual words, each of which stands for a collection of local image features. In this work, we develop a new scheme for constructing a visual vocabulary based on an analysis of visual word contents. By considering the content homogeneity of visual words, we design a visual vocabulary that contains both macro-sense and micro-sense visual words. The two types of visual words are then combined to describe an image effectively. We also apply the visual vocabulary to build image retrieval and image categorization systems. The performance evaluation of the two systems indicates that the proposed visual vocabulary achieves promising results.
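For readers unfamiliar with the general visual-word idea mentioned in the abstract, the following is a minimal sketch of a generic bag-of-visual-words pipeline, not the macro-sense/micro-sense scheme proposed in the paper. It assumes local descriptors are already available as NumPy arrays, uses plain k-means to form the vocabulary, and all function names (build_vocabulary, assign_words, bow_histogram) are hypothetical names chosen for illustration.

```python
import numpy as np

def assign_words(descriptors, centroids):
    """Return the index of the nearest centroid (visual word) for each descriptor."""
    # Squared Euclidean distance between every descriptor and every centroid.
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def build_vocabulary(descriptors, k, iters=10, seed=0):
    """Cluster local descriptors with plain k-means; the k centroids act as visual words."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct training descriptors (an assumption of this sketch).
    start = rng.choice(len(descriptors), size=k, replace=False)
    centroids = descriptors[start].astype(float)
    for _ in range(iters):
        labels = assign_words(descriptors, centroids)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the previous centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

def bow_histogram(descriptors, centroids):
    """Describe one image as an L1-normalised histogram of visual-word counts."""
    words = assign_words(descriptors, centroids)
    hist = np.bincount(words, minlength=len(centroids)).astype(float)
    return hist / hist.sum()
```

In such a pipeline, two images could then be compared by a distance between their histograms; how the paper combines its two types of visual words is not reproduced here.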
Acknowledgments
The authors would like to express their sincere thanks to the anonymous reviewers for their invaluable comments and suggestions. This work was supported by the National Science Council of R.O.C. under grant NSC 102-2221-E-214-040.
Cite this article
Kuo, CM., Hsieh, CH., Yang, NC. et al. Constructing a discriminative visual vocabulary with macro and micro sense of visual words. Multimed Tools Appl 75, 16983–17017 (2016). https://doi.org/10.1007/s11042-015-2970-1
DOI: https://doi.org/10.1007/s11042-015-2970-1