DOI: 10.1145/3664647.3681428
research-article
Open access

Crossmodal Few-shot 3D Point Cloud Semantic Segmentation via View Synthesis

Published: 28 October 2024

Abstract

Cross-modal 2D-3D point cloud semantic segmentation with few-shot learning offers a practical way to transfer mature 2D domain knowledge to a 3D segmentation model, which reduces the reliance on laborious 3D annotation and improves generalization to new categories. However, previous methods bridge the gap between 2D images and 3D point clouds with single-view point cloud generation algorithms, which leave the geometry of an object or scene incomplete because of occlusions. To address this issue, we propose a novel view-synthesis cross-modal few-shot point cloud semantic segmentation network. It introduces color and depth inpainting to generate multi-view images and masks, which compensate for the depth information missing from the generated point clouds. Additionally, we propose a Co-embedding Network to bridge the domain gap between the features of synthesized data and original, collected 3D data, and we employ a weighted prototype network to balance the impact of the multi-view images and enhance segmentation performance. Extensive experiments on two benchmarks show that our method outperforms existing cross-modal few-shot 3D segmentation methods.
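
The abstract does not say how the weighted prototype network computes its weights, so the following is only a minimal sketch of the general idea of balancing multi-view contributions. It assumes one feature set and one foreground mask per synthesized view, plus a hypothetical per-view score that is turned into weights with a softmax. The function names (`masked_average`, `weighted_prototype`, `segment`) and the softmax weighting are illustrative assumptions, not the authors' implementation.

```python
import math


def masked_average(features, mask):
    """Mean feature vector over the points whose mask value is 1."""
    selected = [f for f, m in zip(features, mask) if m]
    if not selected:
        return [0.0] * len(features[0])
    return [sum(col) / len(selected) for col in zip(*selected)]


def weighted_prototype(view_features, view_masks, view_scores):
    """Fuse one prototype per view using softmax weights over view_scores.

    view_scores is a hypothetical per-view quality score (an assumption).
    """
    protos = [masked_average(f, m) for f, m in zip(view_features, view_masks)]
    top = max(view_scores)
    exps = [math.exp(s - top) for s in view_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dims = len(protos[0])
    fused = [sum(w * p[d] for w, p in zip(weights, protos)) for d in range(dims)]
    return fused, weights


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def segment(query_features, prototypes):
    """Assign each query point the index of its most cosine-similar prototype."""
    return [
        max(range(len(prototypes)), key=lambda k: cosine(q, prototypes[k]))
        for q in query_features
    ]
```

In this sketch, a view with a higher score pulls the fused class prototype toward its own masked average, and equal scores reduce to a plain mean over views.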

    Information & Contributors

    Information

    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
    ISBN: 9798400706868
    DOI: 10.1145/3664647
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. 3d point cloud semantic segmentation
    2. cross-modal
    3. few-shot learning

    Qualifiers

    • Research-article

    Conference

    MM '24
    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne VIC, Australia

    Acceptance Rates

    MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
