
Recognizing multi-view objects with occlusions using a deep architecture

Published: 01 November 2015

Abstract

Image-based object recognition is widely employed in computer vision applications such as image semantic annotation and object localization. However, traditional recognition algorithms based on the 2D features of RGB data struggle when objects overlap and image occlusion occurs. RGB-D cameras are now in widespread use, and their depth data can provide auxiliary information to address these challenges. In this study, we propose a deep learning approach for the efficient recognition of occluded 3D objects. First, the approach constructs a multi-view shape model of 3D objects and represents their features with an encoder-decoder deep learning network. Next, 3D objects in indoor scenes are recognized using random forests. Applying deep learning to RGB-D data helps recover information lost to image occlusion. Our experimental results demonstrate that this approach significantly improves the efficiency of feature representation and the performance of object recognition under occlusion.
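The pipeline the abstract describes (learn a compact representation of occluded multi-view descriptors with an encoder-decoder network, then classify with random forests) can be sketched as follows. This is a minimal illustration on synthetic data using scikit-learn stand-ins: an MLP autoencoder replaces the paper's actual deep architecture, and the descriptors are random vectors, not real RGB-D features.

```python
# Sketch of the abstract's pipeline on synthetic data: a denoising-style
# autoencoder learns features from occluded descriptors, and a random
# forest performs recognition on the encoded features. The MLP stand-in
# and all data here are assumptions, not the paper's actual network.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic "multi-view" descriptors: 3 object classes, 60-dim features.
n_per_class, dim = 100, 60
X, y = [], []
for cls in range(3):
    center = rng.normal(scale=3.0, size=dim)
    for _ in range(n_per_class):
        X.append(center + rng.normal(scale=1.0, size=dim))
        y.append(cls)
X, y = np.array(X), np.array(y)

# Simulate occlusion by zeroing a random block of each descriptor.
X_occ = X.copy()
for row in X_occ:
    start = rng.integers(0, dim - 15)
    row[start:start + 15] = 0.0

# "Encoder-decoder" stand-in: an autoencoder trained to reconstruct the
# clean descriptor from its occluded version.
ae = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
ae.fit(X_occ, X)

def encode(ae, Xin):
    # Hidden-layer activations serve as the learned representation
    # (ReLU is MLPRegressor's default hidden activation).
    h = Xin @ ae.coefs_[0] + ae.intercepts_[0]
    return np.maximum(h, 0.0)

Z = encode(ae, X_occ)

# Random-forest recognition on the encoded features.
Z_tr, Z_te, y_tr, y_te = train_test_split(Z, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(Z_tr, y_tr)
acc = rf.score(Z_te, y_te)
print(f"occluded-view accuracy: {acc:.2f}")
```

The key design point mirrored here is that classification operates on the learned low-dimensional encoding rather than the raw (partially zeroed) descriptors, so the reconstruction objective compensates for the occluded entries.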


Cited By

  • (2019) 3D object recognition and classification: a systematic literature review. Pattern Analysis & Applications 22(4), 1243-1292. doi:10.1007/s10044-019-00804-4. Online publication date: 1-Nov-2019.
  • (2016) Towards robust subspace recovery via sparsity-constrained latent low-rank representation. Journal of Visual Communication and Image Representation 37(C), 46-52. doi:10.1016/j.jvcir.2015.06.012. Online publication date: 1-May-2016.
  • (2016) Local receptive field constrained deep networks. Information Sciences: an International Journal 349(C), 229-247. doi:10.1016/j.ins.2016.02.034. Online publication date: 1-Jul-2016.


      Published In

      Information Sciences: an International Journal, Volume 320, Issue C
      November 2015
      443 pages

      Publisher

      Elsevier Science Inc., United States


      Author Tags

      1. Deep learning
      2. Depth data
      3. Multi-view
      4. Object recognition
      5. RGB-D

      Qualifiers

      • Research-article

