
Recognizing multi-view objects with occlusions using a deep architecture

Published: 01 November 2015

Abstract

Image-based object recognition is widely employed in computer vision applications such as image semantic annotation and object localization. However, traditional recognition algorithms based on the 2D features of RGB data struggle when objects overlap and image occlusion occurs. RGB-D cameras are now in widespread use, and their depth data can provide auxiliary information to address these challenges. In this study, we propose a deep learning approach for the efficient recognition of occluded 3D objects. First, the approach constructs a multi-view shape model of 3D objects and represents their features with an encoder-decoder deep learning network. Next, 3D objects in indoor scenes are recognized using random forests. Applying deep learning to RGB-D data helps recover information lost to image occlusion. Our experimental results demonstrate that this approach significantly improves the efficiency of feature representation and the performance of object recognition under occlusion.
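The pipeline the abstract describes (learn a compact representation of occluded multi-view descriptors with an encoder-decoder network, then classify with random forests) can be sketched as follows. This is a minimal illustration on synthetic data using scikit-learn stand-ins: an MLP autoencoder replaces the paper's actual deep architecture, and the descriptors are random vectors, not real RGB-D features.

```python
# Sketch of the abstract's pipeline on synthetic data: a denoising-style
# autoencoder learns features from occluded descriptors, and a random
# forest performs recognition on the encoded features. The MLP stand-in
# and all data here are assumptions, not the paper's actual network.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic "multi-view" descriptors: 3 object classes, 60-dim features.
n_per_class, dim = 100, 60
X, y = [], []
for cls in range(3):
    center = rng.normal(scale=3.0, size=dim)
    for _ in range(n_per_class):
        X.append(center + rng.normal(scale=1.0, size=dim))
        y.append(cls)
X, y = np.array(X), np.array(y)

# Simulate occlusion by zeroing a random block of each descriptor.
X_occ = X.copy()
for row in X_occ:
    start = rng.integers(0, dim - 15)
    row[start:start + 15] = 0.0

# "Encoder-decoder" stand-in: an autoencoder trained to reconstruct the
# clean descriptor from its occluded version.
ae = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
ae.fit(X_occ, X)

def encode(ae, Xin):
    # Hidden-layer activations serve as the learned representation
    # (ReLU is MLPRegressor's default hidden activation).
    h = Xin @ ae.coefs_[0] + ae.intercepts_[0]
    return np.maximum(h, 0.0)

Z = encode(ae, X_occ)

# Random-forest recognition on the encoded features.
Z_tr, Z_te, y_tr, y_te = train_test_split(Z, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(Z_tr, y_tr)
acc = rf.score(Z_te, y_te)
print(f"occluded-view accuracy: {acc:.2f}")
```

The key design point mirrored here is that classification operates on the learned low-dimensional encoding rather than the raw (partially zeroed) descriptors, so the reconstruction objective compensates for the occluded entries.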


Cited By

  • (2019) 3D object recognition and classification: a systematic literature review. Pattern Analysis & Applications 22(4), 1243-1292. doi:10.1007/s10044-019-00804-4. Online publication date: 1-Nov-2019.
  • (2016) Towards robust subspace recovery via sparsity-constrained latent low-rank representation. Journal of Visual Communication and Image Representation 37(C), 46-52. doi:10.1016/j.jvcir.2015.06.012. Online publication date: 1-May-2016.
  • (2016) Local receptive field constrained deep networks. Information Sciences: an International Journal 349(C), 229-247. doi:10.1016/j.ins.2016.02.034. Online publication date: 1-Jul-2016.


      Published In

      Information Sciences: an International Journal, Volume 320, Issue C
      November 2015
      443 pages

      Publisher

      Elsevier Science Inc., United States


      Author Tags

      1. Deep learning
      2. Depth data
      3. Multi-view
      4. Object recognition
      5. RGB-D

      Qualifiers

      • Research-article

