Hierarchical Average Precision Training for Pertinent Image Retrieval

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13674)

Included in the following conference series: European Conference on Computer Vision

Abstract

Image retrieval is commonly evaluated with Average Precision (AP) or Recall@k. Yet those metrics are limited to binary labels and do not take into account the severity of errors. This paper introduces a new hierarchical AP training method for pertinent image retrieval (HAPPIER). HAPPIER is based on a new \(\mathcal{H}\text{-AP}\) metric, which leverages a concept hierarchy to refine AP by integrating the importance of errors and to better evaluate rankings. To train deep models with \(\mathcal{H}\text{-AP}\), we carefully study the problem’s structure and design a smooth lower bound surrogate combined with a clustering loss that ensures consistent ordering. Extensive experiments on 6 datasets show that HAPPIER significantly outperforms state-of-the-art methods for hierarchical retrieval, while being on par with the latest approaches when evaluating fine-grained ranking performance. Finally, we show that HAPPIER leads to a better organization of the embedding space and prevents the most severe failure cases of non-hierarchical methods. Our code is publicly available at https://github.com/elias-ramzi/HAPPIER.


Notes

1. For the sake of readability, our notations are given for a single query. During training, HAPPIER optimizes our hierarchical retrieval objective by averaging over several queries.
2. CSL’s scores in Table 2 are above those reported in [32]; personal discussions with the authors of [32] confirm that our results are valid for CSL (see supplementary B.5).

References

Bertinetto, L., Mueller, R., Tertikas, K., Samangooei, S., Lord, N.A.: Making better mistakes: leveraging class hierarchies with deep networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12506–12515 (2020)
Brown, A., Xie, W., Kalogeiton, V., Zisserman, A.: Smooth-AP: smoothing the path towards large-scale image retrieval. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12354, pp. 677–694. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58545-7_39
Bruch, S., Zoghi, M., Bendersky, M., Najork, M.: Revisiting approximate metric optimization in the age of deep neural networks. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1241–1244 (2019)
Burges, C., et al.: Learning to rank using gradient descent. In: Proceedings of the 22nd International Conference on Machine Learning, pp. 89–96. ICML 2005, Association for Computing Machinery, New York, NY, USA (2005). https://doi.org/10.1145/1102351.1102363
Burges, C., Ragno, R., Le, Q.: Learning to rank with nonsmooth cost functions. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems, vol. 19. MIT Press (2006). https://proceedings.neurips.cc/paper/2006/file/af44c4c56f385c43f2529f9b1b018f6a-Paper.pdf
Cakir, F., He, K., Xia, X., Kulis, B., Sclaroff, S.: Deep metric learning to rank. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1861–1870 (2019)
Chan, D.M., Rao, R., Huang, F., Canny, J.F.: GPU accelerated t-distributed stochastic neighbor embedding. J. Parallel Distrib. Comput. 131, 1–13 (2019)
Chang, D., Pang, K., Zheng, Y., Ma, Z., Song, Y.Z., Guo, J.: Your “flamingo” is my “bird”: fine-grained, or not. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11476–11485 (2021)
Chapelle, O., Chang, Y.: Yahoo! learning to rank challenge overview. In: Proceedings of the learning to rank challenge, pp. 1–24. PMLR (2011)
Croft, W.B., Metzler, D., Strohman, T.: Search engines: information retrieval in practice, vol. 520. Addison-Wesley Reading (2010)
Deng, J., Guo, J., Xue, N., Zafeiriou, S.: ArcFace: additive angular margin loss for deep face recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 4690–4699 (2019)
Dhall, A., Makarova, A., Ganea, O., Pavllo, D., Greeff, M., Krause, A.: Hierarchical image classification using entailment cone embeddings. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 836–837 (2020)
Dupret, G., Piwowarski, B.: A user behavior model for average precision and its generalization to graded judgments. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 531–538. SIGIR 2010, Association for Computing Machinery, New York, NY, USA (2010). https://doi.org/10.1145/1835449.1835538
Dupret, G., Piwowarski, B.: Model based comparison of discounted cumulative gain and average precision. J. Discrete Algorithms 18, 49–62 (2013). https://doi.org/10.1016/j.jda.2012.10.002. https://www.sciencedirect.com/science/article/pii/S1570866712001372 Selected papers from the 18th International Symposium on String Processing and Information Retrieval (SPIRE 2011)
Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), vol. 2, pp. 1735–1742. IEEE (2006)
Hjørland, B.: The foundation of the concept of relevance. J. Am. Soc. Inform. Sci. Technol. 61(2), 217–237 (2010)
Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. (TOIS) 20(4), 422–446 (2002)
Järvelin, K., Kekäläinen, J.: IR evaluation methods for retrieving highly relevant documents. In: ACM SIGIR Forum, vol. 51, pp. 243–250. ACM New York, NY, USA (2017)
Kekäläinen, J., Järvelin, K.: Using graded relevance assessments in ir evaluation. J. Am. Soc. Inf. Sci. Technol. 53(13), 1120–1129 (2002). https://doi.org/10.1002/asi.10137. https://onlinelibrary.wiley.com/doi/abs/10.1002/asi.10137
Law, M.T., Thome, N., Cord, M.: Learning a distance metric from relative comparisons between quadruplets of images. Int. J. Comput. Vision 121(1), 65–94 (2017)
Movshovitz-Attias, Y., Toshev, A., Leung, T.K., Ioffe, S., Singh, S.: No fuss distance metric learning using proxies. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 360–368 (2017)
Oh Song, H., Xiang, Y., Jegelka, S., Savarese, S.: Deep metric learning via lifted structured feature embedding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4004–4012 (2016)
Pogančić, M.V., Paulus, A., Musil, V., Martius, G., Rolínek, M.: Differentiation of blackbox combinatorial solvers. In: ICLR (2020)
Qin, T., Liu, T.: Introducing LETOR 4.0 datasets. arXiv preprint arXiv:1306.2597 (2013)
Qin, T., Liu, T.Y., Li, H.: A general approximation framework for direct optimization of information retrieval measures. Inf. Retrieval 13, 375–397 (2009)
Radenović, F., Tolias, G., Chum, O.: CNN image retrieval learns from BoW: unsupervised fine-tuning with hard examples. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 3–20. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_1
Ramzi, E., Thome, N., Rambour, C., Audebert, N., Bitot, X.: Robust and decomposable average precision for image retrieval. Advances in Neural Information Processing Systems 34 (2021)
Revaud, J., Almazán, J., Rezende, R.S., Souza, C.R.D.: Learning with average precision: Training image retrieval with a listwise loss. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5107–5116 (2019)
Robertson, S.E., Kanoulas, E., Yilmaz, E.: Extending average precision to graded relevance judgments. In: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, pp. 603–610 (2010)
Rolínek, M., Musil, V., Paulus, A., Vlastelica, M., Michaelis, C., Martius, G.: Optimizing rank-based metrics with blackbox differentiation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7620–7630 (2020)
Sohn, K.: Improved deep metric learning with multi-class n-pair loss objective. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc. (2016). https://proceedings.neurips.cc/paper/2016/file/6b180037abbebea991d8b1232f8a8ca9-Paper.pdf
Sun, Y., et al.: Dynamic metric learning: Towards a scalable metric space to accommodate multiple semantic scales. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5393–5402 (2021)
Taylor, M., Guiver, J., Robertson, S., Minka, T.: SoftRank: optimizing non-smooth rank metrics. In: Proceedings of the 2008 International Conference on Web Search and Data Mining, pp. 77–86. WSDM 2008, Association for Computing Machinery, New York, NY, USA (2008). https://doi.org/10.1145/1341531.1341544
Teh, E.W., DeVries, T., Taylor, G.W.: ProxyNCA++: revisiting and revitalizing proxy neighborhood component analysis. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12369, pp. 448–464. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58586-0_27
van der Maaten, L., Hinton, G.: Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)
Van Horn, G., et al.: The inaturalist species classification and detection dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8769–8778 (2018)
Wang, H., et al.: CosFace: large margin cosine loss for deep face recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5265–5274 (2018)
Wang, X., Han, X., Huang, W., Dong, D., Scott, M.R.: Multi-similarity loss with general pair weighting for deep metric learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5022–5030 (2019)
Wang, X., Zhang, H., Huang, W., Scott, M.R.: Cross-batch memory for embedding learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6388–6397 (2020)
Wu, C.Y., Manmatha, R., Smola, A.J., Krahenbuhl, P.: Sampling matters in deep embedding learning. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2840–2848 (2017)
Zhai, A., Wu, H.Y.: Classification is a strong baseline for deep metric learning. arXiv preprint arXiv:1811.12649 (2018)


Acknowledgement

This work was done under a grant from the AHEAD ANR program (ANR-20-THIA-0002). It was granted access to the HPC resources of IDRIS under the allocation 2021-AD011012645 made by GENCI.

Author information

Authors and Affiliations

CEDRIC, Conservatoire National des Arts et Métiers, Paris, France
Elias Ramzi, Nicolas Audebert, Nicolas Thome & Clément Rambour
Coexya, Paris, France
Elias Ramzi & Xavier Bitot
Sorbonne Université, CNRS, ISIR, 75005, Paris, France
Nicolas Thome

Authors

Elias Ramzi
Nicolas Audebert
Nicolas Thome
Clément Rambour
Xavier Bitot

Corresponding author

Correspondence to Elias Ramzi.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

About this paper

Cite this paper

Ramzi, E., Audebert, N., Thome, N., Rambour, C., Bitot, X. (2022). Hierarchical Average Precision Training for Pertinent Image Retrieval. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13674. Springer, Cham. https://doi.org/10.1007/978-3-031-19781-9_15

DOI: https://doi.org/10.1007/978-3-031-19781-9_15
Published: 23 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19780-2
Online ISBN: 978-3-031-19781-9
eBook Packages: Computer Science (R0)
