Abstract
Local feature matching between images is a challenging task. Current research tends to pursue higher matching accuracy at the cost of more time and computational resources, e.g., by using multilayer search strategies. Low-time-consumption methods, on the other hand, match less accurately: a coarse-to-fine strategy, for example, loses the information in many feature maps, which lowers matching accuracy. To address these problems, we propose a matching pipeline that balances matching accuracy and time consumption. The pipeline uses a triple search strategy that searches three feature maps for local feature matches, obtaining both higher matching accuracy than the coarse-to-fine method and lower computational complexity than the hierarchical strategy method. A pre-trained network serves as the backbone and generates feature maps from different layers. We first collect the coarse matches and geometric transformations from the coarse feature maps. Local feature maps centered on the matched points are then cropped from the middle feature maps for refinement matching. Finally, the refined middle-layer matches are localized with high accuracy on the fine-layer feature map. Extensive experiments on the HPatches, IMC2020, and Aachen Day–Night datasets demonstrate the effectiveness of the proposed pipeline, which is competitive with current state-of-the-art methods.
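As a rough illustration of the coarse, middle, and fine search described in the abstract, the following NumPy sketch matches two three-level feature pyramids. It is not the authors' implementation: the function names (`mutual_nn`, `refine`, `triple_search`), the dot-product similarity, the factor-of-two spacing between pyramid levels, and the window radius are all assumptions made for illustration, and the geometric transformation step is omitted.

```python
import numpy as np

def mutual_nn(desc_a, desc_b):
    """Index pairs (i, j) that are mutual nearest neighbours under dot-product similarity."""
    sim = desc_a @ desc_b.T
    nn_ab = sim.argmax(axis=1)          # best match in B for every descriptor in A
    nn_ba = sim.argmax(axis=0)          # best match in A for every descriptor in B
    idx_a = np.arange(desc_a.shape[0])
    keep = nn_ba[nn_ab] == idx_a        # keep only pairs that agree in both directions
    return np.stack([idx_a[keep], nn_ab[keep]], axis=1)

def refine(feat_a, feat_b, matches, radius=1):
    """Move matches to a 2x finer level and re-localise each one in a local window of feat_b.

    Assumption: each level has exactly twice the resolution of the previous one.
    """
    h, w, _ = feat_b.shape
    out = []
    for ya, xa, yb, xb in matches:
        ya, xa, yb, xb = 2 * ya, 2 * xa, 2 * yb, 2 * xb
        query = feat_a[ya, xa]
        # Crop a local window centred on the upsampled match in image B.
        y0, y1 = max(yb - radius, 0), min(yb + radius + 1, h)
        x0, x1 = max(xb - radius, 0), min(xb + radius + 1, w)
        score = feat_b[y0:y1, x0:x1] @ query
        dy, dx = np.unravel_index(score.argmax(), score.shape)
        out.append((ya, xa, y0 + dy, x0 + dx))
    return np.array(out, dtype=int).reshape(-1, 4)

def triple_search(pyr_a, pyr_b):
    """Coarse matching followed by two local refinements (middle, then fine)."""
    coarse_a, mid_a, fine_a = pyr_a
    coarse_b, mid_b, fine_b = pyr_b
    channels = coarse_a.shape[-1]
    pairs = mutual_nn(coarse_a.reshape(-1, channels), coarse_b.reshape(-1, channels))
    # Convert flat indices back to (row, col) coordinates on the coarse grids.
    ya, xa = np.divmod(pairs[:, 0], coarse_a.shape[1])
    yb, xb = np.divmod(pairs[:, 1], coarse_b.shape[1])
    matches = np.stack([ya, xa, yb, xb], axis=1)
    matches = refine(mid_a, mid_b, matches)
    return refine(fine_a, fine_b, matches)

# Toy usage: random unit-norm "feature maps" stand in for backbone outputs.
rng = np.random.default_rng(0)

def random_map(size, channels=16):
    f = rng.standard_normal((size, size, channels))
    return f / np.linalg.norm(f, axis=-1, keepdims=True)

pyramid = [random_map(s) for s in (4, 8, 16)]
matches = triple_search(pyramid, pyramid)   # rows are (ya, xa, yb, xb) on the fine grid
```

Matching a pyramid against itself is only a sanity check: every coarse cell should be matched to itself and stay there through both refinements. The sketch also shows where the cost saving would come from in this simplified setting: the exhaustive search runs only on the smallest map, while the two larger maps are searched in small windows around existing matches.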
Availability of data and materials
Research data are not shared.
Funding
This work was supported by the Key Area Research and Development Program of Guangdong Province (Grant No. 2020B0909020001) and the National Natural Science Foundation of China (Grant No. 61573113).
Author information
Contributions
SF wrote the main manuscript text. HW and HQ revised the language of the manuscript. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethics approval
This declaration is not applicable.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Feng, S., Qian, H. & Wang, H. A deep feature matching pipeline with triple search strategy. J Supercomput 79, 20878–20898 (2023). https://doi.org/10.1007/s11227-023-05418-6
DOI: https://doi.org/10.1007/s11227-023-05418-6