Continuous ODE-defined Image Features for Adaptive Retrieval

Published: 08 June 2020

Abstract

In recent years, content-based image retrieval has benefited greatly from representations extracted by deeper and more complex convolutional neural networks, which have become more effective but also more computationally demanding. Despite hardware acceleration, query processing times can easily be saturated by deep feature extraction in high-throughput or real-time embedded scenarios, so a trade-off between efficiency and effectiveness usually has to be accepted. In this work, we experiment with the recently proposed continuous neural networks defined by parametric ordinary differential equations, dubbed ODE-Nets, for adaptive extraction of image representations. Given the continuous evolution of the network's hidden state, we propose to approximate exact feature extraction by taking an earlier, "near-in-time" hidden state as the features, at reduced computational cost. To understand the potential and the limits of this approach, we also evaluate an ODE-only architecture in which we minimize the number of classical layers, delegating most of the representation learning process --- and thus the feature extraction process --- to the continuous part of the model. Preliminary experiments on standard benchmarks show that we can dynamically control the trade-off between efficiency and effectiveness of feature extraction at inference time by controlling the evolution of the continuous hidden state. Although ODE-only networks provide the finest-grained control over the effectiveness-efficiency trade-off, we observed that mixed architectures perform better than or comparably to standard residual networks in both the image classification and retrieval setups, while using fewer parameters and retaining the controllability of the trade-off.
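The core idea, treating depth as a continuous variable and truncating the integration early to trade accuracy for speed, can be illustrated in a few lines. The sketch below is a toy, pure-Python illustration, not the paper's implementation: the `dynamics` function, the random weights, and the fixed-step Euler solver (ODE-Nets typically use adaptive solvers) are all assumptions made for the sake of the example.

```python
import math
import random

def dynamics(h, W):
    # Toy dynamics f(h): one dense layer with tanh, a stand-in for the
    # convolutional ODE block that defines the continuous hidden state.
    return [math.tanh(sum(w * v for w, v in zip(row, h))) for row in W]

def extract_features(x, W, t_end=1.0, dt=0.05):
    """Fixed-step Euler integration of dh/dt = f(h) from t=0 to t_end.
    Stopping at t_end < 1.0 yields a cheaper, approximate "near-in-time"
    feature vector; the number of steps is the computational budget."""
    n_steps = int(round(t_end / dt))
    h = list(x)
    for _ in range(n_steps):
        f = dynamics(h, W)
        h = [hi + dt * fi for hi, fi in zip(h, f)]
    return h, n_steps

random.seed(0)
dim = 8
W = [[random.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
x = [random.gauss(0.0, 1.0) for _ in range(dim)]  # stand-in input embedding

exact, n_exact = extract_features(x, W, t_end=1.0)    # full trajectory: 20 steps
approx, n_approx = extract_features(x, W, t_end=0.5)  # half the cost: 10 steps
```

Here `extract_features(x, W, t_end=0.5)` visits half as many solver steps as the full integration, mirroring how an earlier hidden state of the continuous trajectory can serve as a cheaper approximate feature vector at inference time.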


Cited By

  • (2024) Visual Attention and ODE-inspired Fusion Network for image dehazing. Engineering Applications of Artificial Intelligence, 130 (Apr 2024), 107692. https://doi.org/10.1016/j.engappai.2023.107692
  • (2022) FedNKD: A Dependable Federated Learning Using Fine-tuned Random Noise and Knowledge Distillation. In Proceedings of the 2022 International Conference on Multimedia Retrieval (27 Jun 2022), 185-193. https://doi.org/10.1145/3512527.3531372


Published In
ICMR '20: Proceedings of the 2020 International Conference on Multimedia Retrieval
June 2020
605 pages
ISBN: 9781450370875
DOI: 10.1145/3372278

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. adaptive computation
  2. continuous neural networks
  3. feature extraction
  4. image retrieval
  5. ordinary differential equations

Qualifiers

  • Research-article

Conference

ICMR '20
Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%
