Abstract
Existing fully supervised 3D point cloud segmentation methods rely heavily on carefully annotated point-wise labels. In this work, we study weakly supervised 3D instance segmentation using bounding-box supervision. Bounding boxes are much easier to annotate than dense point-wise labels, and they have demonstrated high potential for instance-level segmentation compared with other types of weak annotation. However, existing bounding-box supervised techniques have struggled to keep pace with the development of fully supervised methods. To tackle this issue, we propose a simple yet effective approach that directly reuses the network architectures of fully supervised methods in such weak-supervision scenarios. We find that accurate instance labels for each point can be generated from the given bounding boxes by leveraging 3D geometric priors. This process is efficient and requires no additional training or fine-tuning. The generated point-wise labels can be fed to any advanced fully supervised model, without redesigning networks specifically for bounding-box supervision. In this fashion, our approach achieves performance on par with fully supervised methods in terms of AP, AP50 and AP25. Remarkably, it outperforms the state-of-the-art bounding-box supervised method by 21%. Compared with existing methods, our method is extremely simple and involves only two small heuristics in the data preprocessing step. In addition, experiments show that our method is robust to noisy bounding boxes.
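The idea of turning box annotations into point-wise instance labels as a preprocessing step can be illustrated with a minimal sketch. The abstract does not spell out the two heuristics, so the rule used here for points that fall inside several boxes (assign them to the smallest enclosing box) is only an assumed stand-in, not the authors' procedure. The names `boxes_to_point_labels`, `contains`, `box_volume` and `UNLABELED` are hypothetical, and boxes are assumed to be axis-aligned.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (min corner, max corner) of an axis-aligned box

UNLABELED = -1  # hypothetical marker for points that lie outside every box


def box_volume(box: Box) -> float:
    """Volume of an axis-aligned box."""
    (x0, y0, z0), (x1, y1, z1) = box
    return (x1 - x0) * (y1 - y0) * (z1 - z0)


def contains(box: Box, p: Point) -> bool:
    """True if point p lies inside the box (boundaries included)."""
    lo, hi = box
    return all(lo[i] <= p[i] <= hi[i] for i in range(3))


def boxes_to_point_labels(points: Sequence[Point], boxes: Sequence[Box]) -> List[int]:
    """Assign each point the index of a box that contains it.

    Points inside exactly one box take that box's index. For points inside
    several boxes, the smallest enclosing box wins; this tie-break is an
    assumption made for illustration only.
    """
    labels = []
    for p in points:
        hits = [i for i, b in enumerate(boxes) if contains(b, p)]
        if not hits:
            labels.append(UNLABELED)
        else:
            labels.append(min(hits, key=lambda i: box_volume(boxes[i])))
    return labels


# Toy scene: a flat "table" box and a small "cup" box that overlaps it.
boxes = [
    ((0.0, 0.0, 0.0), (4.0, 4.0, 1.0)),  # instance 0, volume 16
    ((1.0, 1.0, 0.5), (2.0, 2.0, 1.5)),  # instance 1, volume 1
]
points = [
    (3.0, 3.0, 0.5),  # inside the table only
    (1.5, 1.5, 0.8),  # inside both boxes
    (1.5, 1.5, 1.2),  # inside the cup only
    (9.0, 9.0, 9.0),  # outside every box
]
labels = boxes_to_point_labels(points, boxes)
print(labels)  # [0, 1, 1, -1]
```

Labels produced this way need no training and could be passed to a fully supervised instance segmentation model in place of manual point-wise annotations, which is the workflow the abstract describes.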
Acknowledgements
This research is funded in part by ARC-Discovery grant (DP220200800 to XY) and ARC-DECRA grant (DE230100477 to XY). We thank all anonymous reviewers and ACs for their constructive suggestions.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yu, Q., Du, H., Yu, X. (2024). A New Perspective of Weakly Supervised 3D Instance Segmentation via Bounding Boxes. In: Liu, T., Webb, G., Yue, L., Wang, D. (eds) AI 2023: Advances in Artificial Intelligence. AI 2023. Lecture Notes in Computer Science, vol 14471. Springer, Singapore. https://doi.org/10.1007/978-981-99-8388-9_9
DOI: https://doi.org/10.1007/978-981-99-8388-9_9
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8387-2
Online ISBN: 978-981-99-8388-9
eBook Packages: Computer Science, Computer Science (R0)