Multiscale object detection based on channel and data enhancement at construction sites

  • Regular Paper
  • Multimedia Systems

Abstract

Object detection based on computer vision plays an important role in the safety monitoring of large-scene construction sites. However, current object detection algorithms typically perform poorly on small targets. In this study, an enhanced multiscale object detection algorithm is developed to address the poor detection performance caused by scale variation at construction sites. First, a scale-aware automatic data augmentation scheme is defined to learn a data augmentation strategy. Then, to mitigate the information loss caused by channel reduction in the feature pyramid network, we propose a method based on subpixel convolution that performs channel enhancement and upsampling, and we add a bottom-up path that enhances the entire feature hierarchy with the accurate localization signals of the lower layers. Experimental results show that the proposed algorithm achieves better accuracy on the construction site (MOCS) data set and the MS COCO data set. For example, compared with the Faster R-CNN detector with a ResNet-50 backbone, the average accuracy increased by \(8.0\%\) on the MOCS data set and by \(1.5\%\) on the MS COCO data set. In particular, the average accuracy on small targets increased by \(10.3\%\) and \(3.4\%\), respectively.
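The channel-for-resolution trade behind subpixel convolution can be illustrated with a minimal NumPy sketch. It shows only the rearrangement ("pixel shuffle") step that follows the convolution, not the authors' network: the function name `pixel_shuffle`, the upscale factor `r`, and the example tensor shape are assumptions chosen for illustration. The point is that a feature map with many channels can be upsampled by moving channel content into spatial positions, rather than discarding channels through a reducing convolution first.

```python
import numpy as np


def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C*r*r, H, W) feature map into (C, H*r, W*r).

    Hypothetical helper for illustration; output pixel (h*r + i, w*r + j)
    of channel c is taken from input channel c*r*r + i*r + j at (h, w).
    """
    channels, height, width = x.shape
    if channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    out_channels = channels // (r * r)
    # Split the channel axis into (out_channels, r, r).
    x = x.reshape(out_channels, r, r, height, width)
    # Interleave the two r-axes with the spatial axes: (C, H, r, W, r).
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(out_channels, height * r, width * r)


# Illustrative shapes only (assumed, not taken from the paper):
# a 2048-channel map on a 7x7 grid becomes a 512-channel map on a 14x14 grid.
top_level = np.random.default_rng(0).standard_normal((2048, 7, 7))
upsampled = pixel_shuffle(top_level, r=2)
print(upsampled.shape)  # (512, 14, 14)
```

Every input value appears exactly once in the output, so the rearrangement itself loses no information; in a full subpixel convolution layer, a learned convolution would precede this step to produce the channels that are then redistributed.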

Acknowledgements

This study was supported in part by the National Natural Science Foundation of China (Nos. 62072024, 41971396, and 61971290), the Research Ability Enhancement Program for Young Teachers of Beijing University of Civil Engineering and Architecture (No. X21024), the Outstanding Youth Program of Beijing University of Civil Engineering and Architecture, the BUCEA Post Graduate Innovation Project, and the R&D Program of Beijing Municipal Education Commission (Nos. KM202110016001 and KM202210016002).

Author information

Corresponding author

Correspondence to Hengyou Wang.

Additional information

Communicated by B-K Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, H., Song, Y., Huo, L. et al. Multiscale object detection based on channel and data enhancement at construction sites. Multimedia Systems 29, 49–58 (2023). https://doi.org/10.1007/s00530-022-00983-x

  • DOI: https://doi.org/10.1007/s00530-022-00983-x
