DOI: 10.1145/3343031.3351051

Aberrance-aware Gradient-sensitive Attentions for Scene Recognition with RGB-D Videos

Published: 15 October 2019

Abstract

With the development of deep learning, previous approaches have achieved success in scene recognition using massive RGB data collected in ideal environments. However, scene recognition in the real world may face various aberrant conditions caused by unavoidable factors, such as lighting variance in the environment and the limitations of cameras, which can degrade the performance of previous models. Beyond ideal conditions, our motivation is to investigate robust scene recognition models for unconstrained environments. In this paper, we propose an aberrance-aware framework for RGB-D scene recognition, in which several types of attention (temporal, spatial, and modal) are integrated into spatio-temporal RGB-D CNN models to suppress the interference of RGB frame blurring, depth missing, and lighting variance. All attentions are obtained homogeneously by projecting gradient-sensitive maps of the visual data into the corresponding spaces. In particular, the gradient maps are computed by convolutional operations with specially designed kernels, which can be seamlessly integrated into end-to-end CNN training. Experiments under different challenging conditions demonstrate the effectiveness of the proposed method.
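The core mechanism described above, computing gradient maps with fixed convolution kernels and projecting their magnitude into an attention map, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the choice of Sobel-like kernels, the "valid" convolution, and the max-normalization are assumptions for illustration, since the paper's exact kernels are not reproduced here.

```python
import numpy as np

# Assumed edge-detecting kernels (Sobel-like); the paper's "typically
# designed kernels" are not specified in the abstract.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def conv2d_valid(img, kernel):
    """Plain 'valid' 2-D cross-correlation of a single-channel frame."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def gradient_attention(frame):
    """Project a frame's gradient magnitude into a [0, 1] attention map."""
    gx = conv2d_valid(frame, SOBEL_X)
    gy = conv2d_valid(frame, SOBEL_Y)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    # Normalize; the epsilon guards flat (zero-gradient) frames.
    return mag / (mag.max() + 1e-8)
```

Under this sketch, a blurred or flat frame yields uniformly low gradient magnitudes, so any attention weight derived from the map (e.g., the map's mean used as a temporal weight) is suppressed, which matches the abstract's stated goal of down-weighting aberrant frames. In a real CNN, the same kernels would be fixed weights of a convolution layer, letting the operation participate in end-to-end training.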


Cited By

  • (2022) Response Generation by Jointly Modeling Personalized Linguistic Styles and Emotions. ACM Transactions on Multimedia Computing, Communications, and Applications 18(2), 1--20. https://doi.org/10.1145/3475872
  • (2021) Depth Privileged Scene Recognition via Dual Attention Hallucination. IEEE Transactions on Image Processing 30, 9164--9178. https://doi.org/10.1109/TIP.2021.3122955
  • (2021) RGB-D scene analysis in the NICU. Computers in Biology and Medicine 138, 104873. https://doi.org/10.1016/j.compbiomed.2021.104873
  • (2021) RGB-D Co-attention Network for Semantic Segmentation. Computer Vision -- ACCV 2020, 519--536. https://doi.org/10.1007/978-3-030-69525-5_31
  • (2020) A part-based spatial and temporal aggregation method for dynamic scene recognition. Neural Computing and Applications. https://doi.org/10.1007/s00521-020-05415-3

      Published In

      MM '19: Proceedings of the 27th ACM International Conference on Multimedia
      October 2019
      2794 pages
ISBN: 9781450368896
DOI: 10.1145/3343031

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. attention
      2. challenging conditions
      3. gradient-sensitive
      4. rgb-d video
      5. scene recognition

      Qualifiers

      • Research-article

      Acceptance Rates

      MM '19 Paper Acceptance Rate 252 of 936 submissions, 27%;
      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

