Abstract
Bag-of-Words methods can be robust to image scaling, translation, and occlusion. An important step in this methodology, as in other visual recognition systems such as Convolutional Neural Networks, is spatial pooling, where the descriptors of neighbouring elements are combined into a local or global feature vector. The combined vector must retain relevant information while discarding irrelevant and confusing details. Maximum and average are the most common aggregation functions used in the pooling step. In this work we study the cardinality of ordered average pooling, i.e. the number of ordered elements to aggregate so that the pooled vector keeps the relevant information without losing discriminative power for classification. We provide an extensive evaluation showing that, for several cardinality values, ordered average pooling obtains better results than both simple average pooling and maximum pooling when dealing with small dictionary sizes.
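To make the role of the cardinality concrete, the following is a minimal sketch, not the authors' implementation. It assumes that ordered average pooling sorts the activations of each codeword in descending order and averages the k largest, so that k = 1 reduces to maximum pooling and k = N (all descriptors) reduces to average pooling. The function name `ordered_average_pool`, the parameter `k`, and the toy values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def ordered_average_pool(codes: Sequence[Sequence[float]], k: int) -> List[float]:
    """Pool N coded descriptors (rows) into one vector, one value per codeword.

    For each codeword (column), the activations are sorted in descending
    order and the k largest are averaged. This top-k reading of
    "ordered average pooling" is an assumption, not the paper's definition.
    """
    if not codes:
        raise ValueError("codes must contain at least one descriptor")
    n = len(codes)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of descriptors")
    pooled = []
    for column in zip(*codes):  # one column per codeword
        top_k = sorted(column, reverse=True)[:k]
        pooled.append(sum(top_k) / k)
    return pooled


# Toy example: 4 descriptors coded over a 3-word dictionary (made-up values).
codes = [
    [0.5, 0.0, 0.25],
    [0.25, 1.0, 0.25],
    [0.0, 0.0, 0.5],
    [0.25, 0.0, 0.0],
]

max_pooled = ordered_average_pool(codes, k=1)           # behaves like max pooling
mid_pooled = ordered_average_pool(codes, k=2)           # intermediate cardinality
avg_pooled = ordered_average_pool(codes, k=len(codes))  # behaves like average pooling
```

Under this reading, the cardinality k interpolates between the two common aggregation functions mentioned in the abstract, which is why it can be tuned per dictionary size.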
Notes
1. Maximum expectation is not used with triangle assignment coding because its values are greater than 1.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Pagola, M., Forcen, J.I., Barrenechea, E., Fernández, J., Bustince, H. (2017). A Study on the Cardinality of Ordered Average Pooling in Visual Recognition. In: Alexandre, L., Salvador Sánchez, J., Rodrigues, J. (eds) Pattern Recognition and Image Analysis. IbPRIA 2017. Lecture Notes in Computer Science, vol 10255. Springer, Cham. https://doi.org/10.1007/978-3-319-58838-4_48
DOI: https://doi.org/10.1007/978-3-319-58838-4_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58837-7
Online ISBN: 978-3-319-58838-4
eBook Packages: Computer Science, Computer Science (R0)