Abstract
Deep learning has achieved great success in many visual recognition tasks, including object detection. Nevertheless, existing deep networks are computationally expensive and memory intensive, hindering their deployment in resource-constrained environments such as the mobile or embedded devices widely used by city travellers. Recently, a case study using Google Street View (GSV) has shown that street imagery can be a valid means of estimating city-level travel patterns, addressing a critical challenge in transport object detection. This paper presents a compressed deep network that uses tensor decomposition to detect transport objects in GSV images in a sustainable and eco-friendly manner. In particular, a new dataset named Transport Mode Share-Tokyo (TMS-Tokyo) is created to serve the public in transport object detection. It is built by selecting and filtering 32,555 acquired images containing 50,827 visible transport objects (including cars, pedestrians, buses, trucks, motorcycles, vans, cyclists and parked bicycles) from the GSV imagery of Tokyo. A compressed convolutional neural network, termed SVDet, is then proposed for street-view object detection by applying tensor train decomposition to a given baseline detector. Experimental results on the TMS-Tokyo dataset demonstrate that SVDet achieves promising performance in comparison with conventional deep detection networks.
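The abstract states that SVDet compresses a baseline detector via tensor train (TT) decomposition. As background, the generic TT-SVD procedure can be sketched as follows; this is not the authors' SVDet implementation, and the function names, toy tensor and ranks below are purely illustrative. A reshaped weight tensor is factored into a chain of small three-way cores by sequential truncated SVDs:

```python
import numpy as np

def tt_svd(tensor, max_ranks):
    """TT-SVD: factor a d-way tensor into d three-way TT cores."""
    shape = tensor.shape
    cores, C, r_prev = [], tensor, 1
    for k in range(len(shape) - 1):
        C = C.reshape(r_prev * shape[k], -1)          # unfold along mode k
        U, S, Vt = np.linalg.svd(C, full_matrices=False)
        r = min(max_ranks[k], S.size)                 # truncate to the target TT-rank
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        C = S[:r, None] * Vt[:r]                      # carry the residual factor forward
        r_prev = r
    cores.append(C.reshape(r_prev, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into the full tensor."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.squeeze(axis=(0, -1))

# Demo: decompose a small 4-way tensor at full TT-ranks (exact reconstruction).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4, 4, 4))
cores = tt_svd(W, [4, 16, 4])
W_hat = tt_reconstruct(cores)
```

When the ranks are truncated below their maximal values, the cores store far fewer parameters than the dense tensor (at the cost of an approximation error), which is the source of the memory and compute savings that compressed detectors of this kind target.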
Acknowledgements
This work is supported in part by the Strategic Partner Acceleration Award (80761-AU201), funded under the Ser Cymru II programme, UK. The first author is supported by a full International PhD Scholarship awarded by Aberystwyth University.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Bai, Y., Shang, C., Li, Y., Shen, L., Zeng, X., Shen, Q. (2024). Transport Object Detection in Street View Imagery Using Decomposed Convolutional Neural Networks. In: Panoutsos, G., Mahfouf, M., Mihaylova, L.S. (eds) Advances in Computational Intelligence Systems. UKCI 2022. Advances in Intelligent Systems and Computing, vol 1454. Springer, Cham. https://doi.org/10.1007/978-3-031-55568-8_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-55567-1
Online ISBN: 978-3-031-55568-8
eBook Packages: Intelligent Technologies and Robotics