Diff-Reg: Diffusion Model in Doubly Stochastic Matrix Space for Registration Problem

Qianliang Wu (ORCID: orcid.org/0000-0001-6592-021X), Haobo Jiang, Lei Luo, Jun Li, Yaqing Ding (ORCID: orcid.org/0000-0002-7448-6686), Jin Xie & Jian Yang (ORCID: orcid.org/0000-0003-4800-832X)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15123)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Establishing reliable correspondences is essential for 3D and 2D-3D registration tasks. Existing methods commonly leverage geometric or semantic point features to generate potential correspondences. However, these features can be unreliable under large deformation, scale inconsistency, and ambiguous matches (e.g., symmetric structures). Additionally, many previous methods rely on single-pass prediction and may get stuck in local minima in complex scenarios. To mitigate these challenges, we introduce a diffusion matching model for robust correspondence construction. Our approach treats correspondence estimation as a denoising diffusion process within the doubly stochastic matrix space, gradually denoising (refining) a doubly stochastic matching matrix toward the ground-truth one for high-quality correspondence estimation. It involves a forward diffusion process that gradually introduces Gaussian noise into the ground-truth matching matrix and a reverse denoising process that iteratively refines the noisy matrix. In particular, we deploy a lightweight denoising strategy during the inference phase: once point/image features are extracted, they are kept fixed and reused for multiple-pass denoising predictions in the reverse sampling process. Evaluation of our method on both 3D and 2D-3D registration tasks confirms its effectiveness. The code is available at https://github.com/wuqianliang/Diff-Reg.
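As a rough illustration of the pipeline summarised above, the following NumPy sketch adds Gaussian noise to a ground-truth matching matrix (forward process) and runs a multi-pass reverse loop that reuses a fixed feature-similarity matrix. It is not the authors' implementation (see the repository above for that). It assumes a linear DDPM-style noise schedule, Sinkhorn normalisation to map scores back onto (approximately) doubly stochastic matrices, a simple "predict the clean matrix, then re-noise to the previous step" sampler, and a hypothetical `toy_denoiser` standing in for the learned network. All function names are illustrative.

```python
import numpy as np


def sinkhorn(log_scores, n_iters=50):
    """Map an N x N score matrix to an (approximately) doubly stochastic
    matrix by alternating row/column normalisation in log space."""
    log_p = np.array(log_scores, dtype=float)
    for _ in range(n_iters):
        log_p = log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True)  # rows -> 1
        log_p = log_p - np.logaddexp.reduce(log_p, axis=0, keepdims=True)  # cols -> 1
    return np.exp(log_p)


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Assumed linear DDPM-style schedule; returns cumulative alpha_bar_t."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, alpha_bar, rng):
    """Forward process: add Gaussian noise to a matching matrix at step t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps


def toy_denoiser(x_t, feature_sim):
    """HYPOTHETICAL stand-in for the learned denoiser: fuse the noisy matrix
    with the fixed feature similarity, then project back with Sinkhorn."""
    return sinkhorn(feature_sim + x_t)


def reverse_sample(feature_sim, alpha_bar, rng):
    """Multi-pass inference: feature_sim is computed once and reused at every
    step; only the matching matrix is refined."""
    n = feature_sim.shape[0]
    x_t = rng.standard_normal((n, n))  # start from pure noise
    for t in range(len(alpha_bar) - 1, -1, -1):
        x0_hat = toy_denoiser(x_t, feature_sim)  # predict the clean matrix
        if t > 0:
            x_t = q_sample(x0_hat, t - 1, alpha_bar, rng)  # re-noise to step t-1
    return x0_hat


# Toy demo: target features are a permutation of unit-norm source features.
rng = np.random.default_rng(0)
n, dim, tau = 5, 128, 0.1
feat_src = rng.standard_normal((n, dim))
feat_src /= np.linalg.norm(feat_src, axis=1, keepdims=True)
perm = rng.permutation(n)  # ground truth: source i matches target perm[i]
feat_tgt = np.empty_like(feat_src)
feat_tgt[perm] = feat_src
feature_sim = feat_src @ feat_tgt.T / tau  # extracted once, kept fixed

alpha_bar = make_schedule()
gt = np.eye(n)[perm]  # ground-truth permutation matrix, gt[i, perm[i]] = 1
x_noisy = q_sample(gt, len(alpha_bar) - 1, alpha_bar, rng)  # heavily noised GT
match = reverse_sample(feature_sim, alpha_bar, rng)
pred = match.argmax(axis=1)  # hard correspondences from the refined matrix
```

Because `feature_sim` is computed once outside the loop, each extra denoising pass here only costs one call to the toy denoiser, which mirrors the abstract's point that features are extracted once and reused across reverse sampling steps. In a real system the denoiser would be a trained network rather than a fixed Sinkhorn projection.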

Acknowledgments

This work was partially supported by the National Science Fund of China (Grant Nos. 62361166670, 62276144, 62072242, 62276135) and the Czech Science Foundation (GACR) JUNIOR STAR Grant No. 22-23183M.

Author information

Authors and Affiliations

PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, and Jiangsu Key Lab of Image and Video Understanding for Social Security, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
Qianliang Wu, Lei Luo, Jun Li & Jian Yang
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Jin Xie
School of Intelligence Science and Technology, Nanjing University, Suzhou, China
Jin Xie
Visual Recognition Group, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic
Yaqing Ding
National University of Singapore, Singapore, Singapore
Haobo Jiang

Corresponding authors

Correspondence to Jin Xie or Jian Yang.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Germany
Stefan Roth
Princeton University, Princeton, NJ, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

Electronic supplementary material

Supplementary material 1 (PDF, 9150 KB)

About this paper

Cite this paper

Wu, Q. et al. (2025). Diff-Reg: Diffusion Model in Doubly Stochastic Matrix Space for Registration Problem. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15123. Springer, Cham. https://doi.org/10.1007/978-3-031-73650-6_10

DOI: https://doi.org/10.1007/978-3-031-73650-6_10
Published: 21 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73649-0
Online ISBN: 978-3-031-73650-6
eBook Packages: Computer Science, Computer Science (R0)