Abstract
Establishing reliable correspondences is essential for 3D and 2D-3D registration tasks. Existing methods commonly leverage geometric or semantic point features to generate candidate correspondences. However, these features can become unreliable under large deformation, scale inconsistency, and ambiguous matching (e.g., symmetry). Additionally, many previous methods rely on single-pass prediction and may therefore get stuck in local minima in complex scenarios. To mitigate these challenges, we introduce a diffusion matching model for robust correspondence construction. Our approach treats correspondence estimation as a denoising diffusion process within the doubly stochastic matrix space, which gradually denoises (refines) a doubly stochastic matching matrix toward the ground-truth one for high-quality correspondence estimation. It involves a forward diffusion process that gradually introduces Gaussian noise into the ground-truth matching matrix and a reverse denoising process that iteratively refines the noisy one. In particular, we deploy a lightweight denoising strategy during the inference phase: once point/image features are extracted and fixed, we reuse them to conduct multiple-pass denoising predictions in the reverse sampling process. Evaluation of our method on both 3D and 2D-3D registration tasks confirms its effectiveness. The code is available at https://github.com/wuqianliang/Diff-Reg.
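The forward step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `sinkhorn` and `forward_noise`, the `alpha_bar` noise-schedule value, the `temperature` parameter, and the choice to exponentiate the noisy matrix and re-project it with Sinkhorn normalization are all assumptions made for illustration. The sketch adds Gaussian noise to a ground-truth matching matrix and maps the result back to an (approximately) doubly stochastic matrix.

```python
import math
import random


def sinkhorn(scores, iters=50):
    """Alternately normalize rows and columns of a positive square matrix."""
    n = len(scores)
    m = [row[:] for row in scores]
    for _ in range(iters):
        # Row normalization: every row sums to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Column normalization: every column sums to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def forward_noise(m0, alpha_bar, rng, temperature=1.0):
    """Add Gaussian noise to a matching matrix, then project it back.

    Assumption for this sketch: the noisy matrix is exponentiated (to make
    it positive) and Sinkhorn-normalized so that it is again approximately
    doubly stochastic. alpha_bar in [0, 1] plays the role of a cumulative
    noise-schedule coefficient (1.0 means no noise).
    """
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    noisy = [[signal * v + noise * rng.gauss(0.0, 1.0) for v in row]
             for row in m0]
    positive = [[math.exp(v / temperature) for v in row] for row in noisy]
    return sinkhorn(positive)


# Usage: a 3x3 ground-truth matching (identity permutation) at two noise levels.
rng = random.Random(0)
ground_truth = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
for alpha_bar in (1.0, 0.5):
    noisy_matching = forward_noise(ground_truth, alpha_bar, rng)
    print(alpha_bar, [[round(v, 3) for v in row] for row in noisy_matching])
```

A reverse process would start from such a noisy matrix and repeatedly apply a learned denoiser; that network is outside the scope of this sketch.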
Acknowledgments
This work was partially supported by the National Science Fund of China (Grant Nos. 62361166670, 62276144, 62072242, 62276135) and the Czech Science Foundation (GACR) JUNIOR STAR Grant No. 22-23183M.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, Q. et al. (2025). Diff-Reg: Diffusion Model in Doubly Stochastic Matrix Space for Registration Problem. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15123. Springer, Cham. https://doi.org/10.1007/978-3-031-73650-6_10
DOI: https://doi.org/10.1007/978-3-031-73650-6_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73649-0
Online ISBN: 978-3-031-73650-6
eBook Packages: Computer Science, Computer Science (R0)