Abstract
Scene recognition is an important and challenging problem in computer vision. One of the most widely used approaches to scene recognition is the bag-of-visual-words model. Despite its interesting results, this approach does not capture the rich spatial information of the visual words in the image. In this paper, we propose a new method that describes the visual words using the fractal dimension. Our method estimates the fractal dimension of each visual word in the image using the box-counting method. The fractal dimension provides complex spatial information about the visual words in a simple and efficient way. We validate our method on three well-known scene and object datasets, and the experimental results show that it yields highly discriminative features of the visual words. In addition, the proposed method achieves competitive results compared with popular methods in scene classification.
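To make the box-counting step concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each visual word is represented by the (x, y) image positions of the keypoints assigned to it, and that the dimension is taken as the least-squares slope of log N(r) against log(1/r), where N(r) is the number of r-by-r boxes containing at least one point. The function names, the set of box sizes, and the choice of 0 for a word with no keypoints are illustrative assumptions.

```python
import numpy as np


def box_counting_dimension(points, box_sizes=(1, 2, 4, 8, 16, 32)):
    """Estimate the box-counting dimension of a set of 2D points.

    `box_sizes` is an assumed set of scales, not taken from the paper.
    """
    pts = np.asarray(points, dtype=float)
    sizes = np.asarray(box_sizes, dtype=float)
    counts = []
    for r in sizes:
        # Map every point to the integer index of the r-by-r box containing it.
        boxes = np.floor(pts / r).astype(np.int64)
        # N(r): number of distinct occupied boxes at this scale.
        counts.append(len(np.unique(boxes, axis=0)))
    # Least-squares line through (log(1/r), log N(r)); its slope is the estimate.
    slope, _intercept = np.polyfit(np.log(1.0 / sizes), np.log(counts), 1)
    return float(slope)


def describe_image(positions, word_ids, n_words):
    """Hypothetical per-image descriptor: one dimension estimate per visual word.

    positions: (n, 2) keypoint coordinates; word_ids: (n,) codebook index of
    each keypoint. Words with no keypoints get 0.0 (an assumption).
    """
    positions = np.asarray(positions, dtype=float)
    word_ids = np.asarray(word_ids)
    descriptor = np.zeros(n_words)
    for k in range(n_words):
        pts = positions[word_ids == k]
        if len(pts) > 0:
            descriptor[k] = box_counting_dimension(pts)
    return descriptor
```

Under these assumptions, keypoints that fill a square region densely give an estimate close to 2, and keypoints lying along a straight row give an estimate close to 1, so the value reflects how each visual word is spread over the image rather than only how often it occurs.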
Acknowledgements
This work was supported by the FUNDECT—State of Mato Grosso do Sul Foundation to Support Education, Science and Technology, CAPES—Brazilian Federal Agency for Support and Evaluation of Graduate Education, and CNPq—National Council for Scientific and Technological Development. The Titan X Pascal used for this research was donated by the NVIDIA Corporation. Lucas Correia Ribas gratefully acknowledges the financial support grant #2016/23763-8, São Paulo Research Foundation (FAPESP). Odemir M. Bruno thanks the financial support of CNPq (Grant # 307797/2014-7) and FAPESP (Grant #s 14/08026-1 and 16/18809-9).
Cite this article
Ribas, L.C., Gonçalves, D.N., de Andrade Silva, J. et al. Fractal dimension of bag-of-visual words. Pattern Anal Applic 22, 89–98 (2019). https://doi.org/10.1007/s10044-018-0736-x
DOI: https://doi.org/10.1007/s10044-018-0736-x