DOI: 10.1145/3543873.3587592

DSNet: Efficient Lightweight Model for Video Salient Object Detection for IoT and WoT Applications

Published: 30 April 2023

Abstract

The main obstacles to deploying deep models in IoT and embedded systems are their high computational complexity and long training and inference times. Lightweight versions of state-of-the-art models have been designed, but maintaining the performance of such models is difficult. To address these problems, an efficient, lightweight Deformable Separable Network (DSNet) is proposed for video salient object detection, aimed mainly at mobile and embedded vision applications. DSNet combines a Deformable Convolution Network (DeCNet), a Separable Convolution Network (SCNet), and a Depth-wise Attention Response Propagation (DARP) module, which together balance accuracy against latency. The model generates saliency maps by considering background and foreground simultaneously, which improves its performance in unconstrained scenarios such as partial occlusion, deformable backgrounds or objects, and illumination effects. Extensive experiments on six benchmark datasets demonstrate that the proposed model outperforms state-of-the-art approaches in computational complexity, number of parameters, and latency.
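
To give a rough sense of why separable convolutions suit lightweight models, the sketch below compares the weight count of a standard convolution with that of a generic depthwise separable convolution. It is not DSNet's actual architecture: the abstract gives no layer sizes, so the channel counts and kernel size here are hypothetical, and the function names are illustrative only.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in, c_out, k):
    """Weights in a depthwise separable convolution (bias omitted):
    one k x k filter per input channel, then a 1 x 1 pointwise mix."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 64, 128, 3

standard = standard_conv_params(c_in, c_out, k)
separable = separable_conv_params(c_in, c_out, k)
ratio = separable / standard  # equals 1/c_out + 1/k**2

print(standard, separable, round(ratio, 3))  # 73728 8768 0.119
```

For these assumed sizes the separable layer needs about 12% of the weights of the standard one. How much of that saving carries over to measured latency depends on the hardware and on the rest of the network.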

Published In

WWW '23 Companion: Companion Proceedings of the ACM Web Conference 2023
April 2023
1567 pages
ISBN: 9781450394192
DOI: 10.1145/3543873

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Deformable convolution network
  2. depth-wise quantization network
  3. object detection
  4. separable convolution networks

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '23: The ACM Web Conference 2023
April 30 - May 4, 2023
Austin, TX, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
