Diff-Reg: Diffusion Model in Doubly Stochastic Matrix Space for Registration Problem

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Establishing reliable correspondences is essential for 3D and 2D-3D registration tasks. Existing methods commonly leverage geometric or semantic point features to generate potential correspondences. However, these features can be unreliable under large deformation, scale inconsistency, and ambiguous matching (e.g., symmetry). Additionally, many previous methods, which rely on single-pass prediction, may struggle with local minima in complex scenarios. To mitigate these challenges, we introduce a diffusion matching model for robust correspondence construction. Our approach treats correspondence estimation as a denoising diffusion process within the doubly stochastic matrix space, which gradually denoises (refines) a doubly stochastic matching matrix toward the ground-truth one for high-quality correspondence estimation. It involves a forward diffusion process that gradually introduces Gaussian noise into the ground-truth matching matrix and a reverse denoising process that iteratively refines the noisy one. In particular, we deploy a lightweight denoising strategy during the inference phase. Specifically, once point/image features are extracted and fixed, we utilize them to conduct multiple-pass denoising predictions in the reverse sampling process. Evaluation of our method on both 3D and 2D-3D registration tasks confirms its effectiveness. The code is available at https://github.com/wuqianliang/Diff-Reg.
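
To make the forward and reverse processes described in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a square matching matrix, a linear noise schedule, a Sinkhorn normalization as the projection back to (approximately) doubly stochastic matrices, and a simplified reverse loop that re-noises the predicted clean matrix at each step. All names (`sinkhorn`, `make_alpha_bar`, `forward_noise`, `reverse_sample`, `denoise_fn`, `tau`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log-sum-exp along one axis (keeps dimensions)."""
    m = a.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))


def sinkhorn(scores, n_iters=200):
    """Alternately normalize rows and columns in log space.

    Assumption: a square score matrix, so the result approaches a
    doubly stochastic matrix (all rows and columns sum to 1).
    """
    log_p = np.asarray(scores, dtype=np.float64)
    for _ in range(n_iters):
        log_p = log_p - logsumexp(log_p, axis=1)  # rows sum to 1
        log_p = log_p - logsumexp(log_p, axis=0)  # columns sum to 1
    return np.exp(log_p)


def make_alpha_bar(num_steps=10, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal level for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def forward_noise(x0, t, alpha_bar, rng, tau=1.0):
    """Add Gaussian noise to the matching matrix x0 at step t, then
    project the noisy matrix back with Sinkhorn (assumed projection)."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return sinkhorn(x_t / tau)


def reverse_sample(x_T, feats, denoise_fn, alpha_bar, rng):
    """Simplified reverse loop: the features `feats` are computed once
    and reused, and only the (hypothetical) `denoise_fn` runs per step."""
    x_t = x_T
    for t in range(len(alpha_bar) - 1, -1, -1):
        x0_hat = denoise_fn(x_t, feats, t)  # predicted clean matching matrix
        # Re-noise the prediction to the next lower noise level, or stop at t = 0.
        x_t = forward_noise(x0_hat, t - 1, alpha_bar, rng) if t > 0 else x0_hat
    return x_t


# Usage: noise a ground-truth permutation (identity) matching matrix.
rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x_noisy = forward_noise(np.eye(4), t=2, alpha_bar=alpha_bar, rng=rng)
```

In this sketch the expensive feature extraction would happen once outside `reverse_sample`, which mirrors the lightweight inference strategy the abstract describes; a trained network would take the place of `denoise_fn`.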

Acknowledgments

This work was partially supported by the National Science Fund of China (Grant Nos. 62361166670, 62276144, 62072242, 62276135) and the Czech Science Foundation (GACR) JUNIOR STAR Grant No. 22-23183M.

Author information

Corresponding authors

Correspondence to Jin Xie or Jian Yang.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 9150 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wu, Q. et al. (2025). Diff-Reg: Diffusion Model in Doubly Stochastic Matrix Space for Registration Problem. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15123. Springer, Cham. https://doi.org/10.1007/978-3-031-73650-6_10

  • DOI: https://doi.org/10.1007/978-3-031-73650-6_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73649-0

  • Online ISBN: 978-3-031-73650-6

  • eBook Packages: Computer Science, Computer Science (R0)
