Abstract
The state-of-the-art group-free network (GFNet) has achieved superior performance in 3D object detection for indoor scenes. However, we find that there is still room for improvement in three aspects. First, the seed point features extracted by the multi-layer perceptron (MLP) in the backbone (PointNet++) ignore the differing importance of the features at each level. Second, the single-scale transformer module that GFNet uses in place of hand-crafted grouping via Hough voting cannot adequately model the relationship between points and objects. Finally, GFNet predicts detection results directly from its decoders, disregarding the different contributions of the decoder at each stage. In this paper, we propose the group-free enhancement network (GFENet) to tackle these issues. Specifically, our network consists mainly of three enhancement modules: the weighted MLP (WMLP) module, the hierarchical-aware module, and the stage-aware module. The WMLP module adaptively combines features of different levels in the backbone before max-pooling, yielding more informative features. The hierarchical-aware module models points and objects hierarchically to mitigate the negative impact of insufficiently modeling their relationship. The stage-aware module adaptively aggregates multi-stage predictions for better detection performance. Extensive experiments on the ScanNet V2 and SUN RGB-D datasets demonstrate the effectiveness and advantages of our method over existing 3D object detection methods.
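Both the WMLP module and the stage-aware module are described here as adaptive weighting: WMLP weights the features of several MLP levels before max-pooling, and the stage-aware module weights the predictions of several decoder stages. The following NumPy snippet is a minimal sketch of that idea, not the authors' implementation. It assumes one learnable scalar logit per level (or stage), normalized with a softmax, and assumes the per-level features have already been projected to a common channel width C. The function names, tensor shapes, and weighting scheme are all hypothetical.

```python
from typing import Sequence

import numpy as np


def softmax(x: np.ndarray, axis: int = 0) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def weighted_mlp_pool(level_feats: Sequence[np.ndarray],
                      level_logits: np.ndarray) -> np.ndarray:
    """Hypothetical WMLP-style fusion (assumed design, for illustration only).

    level_feats: L arrays of shape (G, K, C), the outputs of successive MLP
        layers for G local groups of K neighbors each, assumed to share width C.
    level_logits: shape (L,), one learnable scalar per level (assumption).
    Returns (G, C): softmax-weighted sum over levels, then max-pooling over
    the K neighbors of each group.
    """
    w = softmax(level_logits)                    # (L,), weights sum to 1
    stacked = np.stack(level_feats, axis=0)      # (L, G, K, C)
    fused = np.tensordot(w, stacked, axes=1)     # (G, K, C)
    return fused.max(axis=1)                     # max-pool over neighbors


def stage_aware_aggregate(stage_preds: np.ndarray,
                          stage_logits: np.ndarray) -> np.ndarray:
    """Hypothetical stage-aware fusion of decoder outputs.

    stage_preds: (S, N, D) predictions for N proposals from S decoder stages.
    stage_logits: (S,) learnable scalars, softmax-normalized across stages.
    Returns (N, D): the weighted combination of all stages.
    """
    w = softmax(stage_logits)                    # (S,)
    return np.tensordot(w, stage_preds, axes=1)  # (N, D)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = [rng.standard_normal((4, 16, 32)) for _ in range(3)]
    pooled = weighted_mlp_pool(feats, np.zeros(3))
    print(pooled.shape)  # (4, 32)
    preds = rng.standard_normal((6, 10, 7))
    fused = stage_aware_aggregate(preds, np.zeros(6))
    print(fused.shape)  # (10, 7)
```

With all logits at zero, both functions reduce to a plain average over levels or stages. Learning the logits lets the model shift weight toward whichever levels or stages prove more useful, which is the behavior the abstract attributes to these modules. In a trainable model the logits would be framework parameters (for example, in PyTorch) rather than fixed NumPy arrays.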
Acknowledgements
This work was supported by the Beijing Natural Science Foundation (4232023), the R&D Program of the Beijing Municipal Education Commission (KM202310009002), and the National Natural Science Foundation of China (62102208). The authors also thank the editor and all reviewers for their helpful comments, which improved this paper.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, F. et al. (2024). GFENet: Group-Free Enhancement Network for Indoor Scene 3D Object Detection. In: Sheng, B., Bi, L., Kim, J., Magnenat-Thalmann, N., Thalmann, D. (eds) Advances in Computer Graphics. CGI 2023. Lecture Notes in Computer Science, vol 14497. Springer, Cham. https://doi.org/10.1007/978-3-031-50075-6_10
DOI: https://doi.org/10.1007/978-3-031-50075-6_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50074-9
Online ISBN: 978-3-031-50075-6
eBook Packages: Computer Science, Computer Science (R0)