Learning to Compose Hypercolumns for Visual Correspondence

Juhong Min^12,13,
Jongmin Lee^12,13,
Jean Ponce^14,15 &
…
Minsu Cho^12,13

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12360))

Included in the following conference series:

European Conference on Computer Vision

3573 Accesses
39 Citations

Abstract

Feature representation plays a crucial role in visual correspondence, and recent methods for image matching resort to deeply stacked convolutional layers. These models, however, are both monolithic and static in the sense that they typically use a specific level of features, e.g., the output of the last layer, and adhere to it regardless of the images to match. In this work, we introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match. Inspired by both multi-layer feature composition in object detection and adaptive inference architectures in classification, the proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network. We demonstrate the effectiveness on the task of semantic correspondence, i.e., establishing correspondences between images depicting different instances of the same object or scene category. Experiments on standard benchmarks show that the proposed method greatly improves matching performance over the state of the art in an adaptive and efficient manner.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: GBP 19.95; Price includes VAT (United Kingdom)

eBook: GBP 71.50; Price includes VAT (United Kingdom)

Softcover Book: GBP 89.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Unsupervised Feature Learning for Dense Correspondences Across Scenes

Article 03 June 2015

Learning Contrastive Representation for Semantic Correspondence

Article 24 March 2022

Semantic-Aware Fine-Grained Correspondence

Notes

1.
For example, we can obtain keypoint annotations for free by forming a synthetic pair by applying random geometric transformation (e.g., affine or TPS [8]) on an image and then sampling some corresponding points between the original image and the warped image using the transformation applied.

References

Andreas, J., Rohrbach, M., Darrell, T., Klein, D.: Learning to compose neural networks for question answering. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL) (2016)
Google Scholar
Andreas, J., Rohrbach, M., Darrell, T., Klein, D.: Neural module networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Google Scholar
Bristow, H., Valmadre, J., Lucey, S.: Dense semantic correspondence where every pixel is a classifier. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015)
Google Scholar
Cho, M., Kwak, S., Schmid, C., Ponce, J.: Unsupervised object discovery and localization in the wild: part-based matching with bottom-up region proposals. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
Google Scholar
Choy, C.B., Gwak, J., Savarese, S., Chandraker, M.: Universal correspondence network. In: Proceedings of the Neural Information Processing Systems (NeurIPS) (2016)
Google Scholar
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2005)
Google Scholar
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009)
Google Scholar
Donato, G., Belongie, S.: Approximate thin plate spline mappings. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2352, pp. 21–31. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-47977-5_2
Chapter Google Scholar
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. (IJCV) 88, 303–338 (2010)
Article Google Scholar
Fathy, M.E., Tran, Q.-H., Zia, M.Z., Vernaza, P., Chandraker, M.: Hierarchical metric learning and matching for 2D and 3D geometric correspondences. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11219, pp. 832–850. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01267-0_49
Chapter Google Scholar
Figurnov, M., et al.: Spatially adaptive computation time for residual networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Google Scholar
Forsyth, D., Ponce, J.: Computer Vision: A Modern Approach, 2nd edn. Prentice Hall (2011)
Google Scholar
Gao, X., Zhao, Y., Dudziak, L., Mullins, R., Xu, C.Z.: Dynamic channel pruning: feature boosting and suppression. In: Proceedings of the International Conference on Learning Representations (ICLR) (2019)
Google Scholar
Gumbel, E.: Statistical theory of extreme values and some practical applications: a series of lectures. Applied mathematics series, U.S. Govt. Print. Office (1954)
Google Scholar
Ham, B., Cho, M., Schmid, C., Ponce, J.: Proposal flow. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Google Scholar
Ham, B., Cho, M., Schmid, C., Ponce, J.: Proposal flow: semantic correspondences from object proposals. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 40, 1711–1725 (2018)
Article Google Scholar
Han, K., et al.: SCNet: learning semantic correspondence. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2017)
Google Scholar
Hariharan, B., Arbeláez, P., Girshick, R., Malik, J.: Hypercolumns for object segmentation and fine-grained localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Google Scholar
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Google Scholar
Hua, W., De Sa, C., Zhang, Z., Suh, G.E.: Channel gating neural networks. arXiv preprint arXiv:1805.12549 (2018)
Huang*, G., Liu*, Z., van der Maaten, L., Weinberger, K.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Google Scholar
Huang, S., Wang, Q., Zhang, S., Yan, S., He, X.: Dynamic context correspondence network for semantic alignment. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2019)
Google Scholar
Jang, E., Gu, S., Poole, B.: Categorical reparameterization with Gumbel-Softmax. In: Proceedings of the International Conference on Learning Representations (ICLR) (2017)
Google Scholar
Jeon, S., Kim, S., Min, D., Sohn, K.: PARN: pyramidal affine regression networks for dense semantic correspondence. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11210, pp. 355–371. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01231-1_22
Chapter Google Scholar
Kanazawa, A., Jacobs, D.W., Chandraker, M.: WarpNet: weakly supervised matching for single-view reconstruction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Google Scholar
Kim, J., Liu, C., Sha, F., Grauman, K.: Deformable spatial pyramid matching for fast dense correspondences. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2013)
Google Scholar
Kim, S., Lin, S., Jeon, S.R., Min, D., Sohn, K.: Recurrent transformer networks for semantic correspondence. In: Proceedings of the Neural Information Processing Systems (NeurIPS) (2018)
Google Scholar
Kim, S., Min, D., Ham, B., Jeon, S., Lin, S., Sohn, K.: FCSS: fully convolutional self-similarity for dense semantic correspondence. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Google Scholar
Kim, S., Min, D., Lin, S., Sohn, K.: DCTM: discrete-continuous transformation matching for semantic flow. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2017)
Google Scholar
Kong, T., Yao, A., Chen, Y., Sun, F.: HyperNet: towards accurate region proposal generation and joint object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Google Scholar
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the Neural Information Processing Systems (NeurIPS) (2012)
Google Scholar
Lee, J., Kim, D., Ponce, J., Ham, B.: SFNet: learning object-aware semantic correspondence. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Google Scholar
Li, F.F., Fergus, R., Perona, P.: One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 28, 594–611 (2006)
Article Google Scholar
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Google Scholar
Liu, C., Yuen, J., Torralba, A.: Nonparametric scene parsing: label transfer via dense scene alignment. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009)
Google Scholar
Liu, C., Yuen, J., Torralba, A.: Sift flow: dense correspondence across scenes and its applications. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 33, 978–994 (2011)
Article Google Scholar
Liu, S., Huang, D., Wang, Y.: Receptive field block net for accurate and fast object detection. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 404–419. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6_24
Chapter Google Scholar
Long, J.L., Zhang, N., Darrell, T.: Do convnets learn correspondence? In: Proceedings of the Neural Information Processing Systems (NeurIPS) (2014)
Google Scholar
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. (IJCV) 60, 91–110 (2004)
Article Google Scholar
Maddison, C., Mnih, A., Whye Teh, Y.: The concrete distribution: a continuous relaxation of discrete random variables. In: Proceedings of the International Conference on Learning Representations (ICLR) (2017)
Google Scholar
Min, J., Lee, J., Ponce, J., Cho, M.: Hyperpixel flow: semantic correspondence with multi-layer neural features. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2019)
Google Scholar
Min, J., Lee, J., Ponce, J., Cho, M.: SPair-71k: a large-scale benchmark for semantic correspondence. arXiv prepreint arXiv:1908.10543 (2019)
Novotny, D., Larlus, D., Vedaldi, A.: AnchorNet: a weakly supervised network to learn geometry-sensitive features for semantic matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Google Scholar
Rocco, I., Arandjelovic, R., Sivic, J.: Convolutional neural network architecture for geometric matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Google Scholar
Rocco, I., Arandjelović, R., Sivic, J.: End-to-end weakly-supervised semantic alignment. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Google Scholar
Rocco, I., Cimpoi, M., Arandjelović, R., Torii, A., Pajdla, T., Sivic, J.: Neighbourhood consensus networks. In: Proceedings of the Neural Information Processing Systems (NeurIPS) (2018)
Google Scholar
Schonberger, J.L., Hardmeier, H., Sattler, T., Pollefeys, M.: Comparative evaluation of hand-crafted and learned local features. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Google Scholar
Seo, P.H., Lee, J., Jung, D., Han, B., Cho, M.: Attentive semantic alignment with offset-aware correlation kernels. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11208, pp. 367–383. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01225-0_22
Chapter Google Scholar
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Proceedings of the International Conference on Learning Representations (ICLR) (2015)
Google Scholar
Srivastava, R.K., Greff, K., Schmidhuber, J.: Highway networks. In: Proceedings of the International Conference on Machine Learning (ICML) (2015)
Google Scholar
Taniai, T., Sinha, S.N., Sato, Y.: Joint recovery of dense correspondence and cosegmentation in two images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Google Scholar
Ufer, N., Ommer, B.: Deep semantic feature matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Google Scholar
Veit, A., Belongie, S.: Convolutional networks with adaptive inference graphs. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 3–18. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_1
Chapter Google Scholar
Yang, F., Li, X., Cheng, H., Li, J., Chen, L.: Object-aware dense semantic correspondence. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Google Scholar
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10590-1_53
Chapter Google Scholar
Zhou, T., Krahenbuhl, P., Aubry, M., Huang, Q., Efros, A.A.: Learning dense correspondence via 3D-guided cycle consistency. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Google Scholar

Download references

Acknowledgements

This work is supported by Samsung Advanced Institute of Technology (SAIT) and also by Basic Science Research Program (NRF-2017R1E1A1A01077999) and Next-Generation Information Computing Development Program (NRF-2017M3C4A7069369) through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT, Korea. Jean Ponce was supported in part by the Louis Vuitton/ENS chair in artificial intelligence and the Inria/NYU collaboration and also by the French government under management of Agence Nationale de la Recherche as part of the “Investissements dâavenir” program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute).

Author information

Authors and Affiliations

POSTECH, Pohang University of Science and Technology, Pohang, Korea
Juhong Min, Jongmin Lee & Minsu Cho
NPRC, The Neural Processing Research Center, Seoul, Korea
Juhong Min, Jongmin Lee & Minsu Cho
Inria, Paris, France
Jean Ponce
ENS, École normale supérieure, CNRS, PSL Research University, 75005, Paris, France
Jean Ponce

Authors

Juhong Min
View author publications
You can also search for this author in PubMed Google Scholar
Jongmin Lee
View author publications
You can also search for this author in PubMed Google Scholar
Jean Ponce
View author publications
You can also search for this author in PubMed Google Scholar
Minsu Cho
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Minsu Cho .

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 21865 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Min, J., Lee, J., Ponce, J., Cho, M. (2020). Learning to Compose Hypercolumns for Visual Correspondence. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12360. Springer, Cham. https://doi.org/10.1007/978-3-030-58555-6_21

Download citation

DOI: https://doi.org/10.1007/978-3-030-58555-6_21
Published: 16 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58554-9
Online ISBN: 978-3-030-58555-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Learning to Compose Hypercolumns for Visual Correspondence

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Unsupervised Feature Learning for Dense Correspondences Across Scenes

Learning Contrastive Representation for Semantic Correspondence

Semantic-Aware Fine-Grained Correspondence

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

1 Electronic supplementary material

Supplementary material 1 (pdf 21865 KB)

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Learning to Compose Hypercolumns for Visual Correspondence

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Unsupervised Feature Learning for Dense Correspondences Across Scenes

Learning Contrastive Representation for Semantic Correspondence

Semantic-Aware Fine-Grained Correspondence

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

1 Electronic supplementary material

Supplementary material 1 (pdf 21865 KB)

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation