Article

A Deep Learning Approach for Real-Time 3D Human Action Recognition from Skeletal Data

Authors:

Houssam Salmane,

Louahdi Khoudour,

Sergio A. Velastin

Image Analysis and Recognition: 16th International Conference, ICIAR 2019, Waterloo, ON, Canada, August 27–29, 2019, Proceedings, Part I

Pages 18–32

https://doi.org/10.1007/978-3-030-27202-9_2

Published: 27 August 2019

Abstract

We present a new deep learning approach for real-time 3D human action recognition from skeletal data and apply it to develop a vision-based intelligent surveillance system. Given a skeleton sequence, we propose to encode skeleton poses and their motions into a single RGB image. An Adaptive Histogram Equalization (AHE) algorithm is then applied to the color images to enhance their local patterns and generate more discriminative features. For learning and classification, we design deep neural networks based on the Densely Connected Convolutional Architecture (DenseNet) to extract features from the enhanced color images and classify them into action classes. Experimental results on two challenging datasets show that the proposed method reaches state-of-the-art accuracy while requiring low computational time for training and inference. This paper also introduces CEMEST, a new RGB-D dataset depicting passenger behaviors in public transport. It consists of 203 untrimmed real-world surveillance videos of realistic normal and anomalous events. With the support of data augmentation and transfer learning, we achieve promising results under the real-world conditions of this dataset. This enables the construction of real-world deep learning applications for enhancing monitoring and security in public transport.
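
The first two stages described in the abstract (skeleton sequence to RGB image, then local contrast enhancement) can be illustrated with a minimal NumPy sketch. The abstract does not specify the exact encoding, so everything here is an assumption made for illustration: the function names are hypothetical, the (x, y, z) coordinates are min-max normalized over the whole sequence and written to the R, G, B channels, and the enhancement step is a simplified tile-wise equalization without the interpolation between tiles that a full AHE implementation would use. It should not be read as the authors' method.

```python
import numpy as np


def encode_skeleton_sequence(seq):
    """Map a skeleton sequence of shape (frames, joints, 3) to a uint8 RGB image.

    Assumption: x, y, z are min-max normalized over the whole sequence and
    stored in the R, G, B channels; rows index joints, columns index frames.
    """
    seq = np.asarray(seq, dtype=np.float64)
    lo, hi = seq.min(), seq.max()
    scale = hi - lo if hi > lo else 1.0  # avoid division by zero on constant input
    pixels = np.round(255.0 * (seq - lo) / scale).astype(np.uint8)
    return pixels.transpose(1, 0, 2)  # (joints, frames, 3)


def equalize_tile(tile):
    """Histogram-equalize one single-channel uint8 tile."""
    hist = np.bincount(tile.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    denom = cdf[-1] - cdf_min
    if denom == 0:  # flat tile: nothing to equalize
        return tile.copy()
    lut = np.round(255.0 * (cdf - cdf_min) / denom).clip(0, 255).astype(np.uint8)
    return lut[tile]


def tiled_equalization(image, tiles=(2, 2)):
    """Simplified AHE: equalize each tile of each channel independently.

    Unlike full AHE, neighbouring tiles are not interpolated.
    """
    out = np.empty_like(image)
    row_blocks = np.array_split(np.arange(image.shape[0]), tiles[0])
    col_blocks = np.array_split(np.arange(image.shape[1]), tiles[1])
    for rows in row_blocks:
        for cols in col_blocks:
            if rows.size == 0 or cols.size == 0:
                continue
            r = slice(rows[0], rows[-1] + 1)
            c = slice(cols[0], cols[-1] + 1)
            for ch in range(image.shape[2]):
                out[r, c, ch] = equalize_tile(image[r, c, ch])
    return out


# Usage with synthetic data: 40 frames, 25 joints (both sizes are hypothetical).
rng = np.random.default_rng(0)
sequence = rng.normal(size=(40, 25, 3))
image = encode_skeleton_sequence(sequence)
enhanced = tiled_equalization(image)
print(image.shape, enhanced.dtype)
```

The resulting image could then be passed to any image classifier; the abstract names a DenseNet-based network for that stage, which this sketch does not cover.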

Published In

Image Analysis and Recognition: 16th International Conference, ICIAR 2019, Waterloo, ON, Canada, August 27–29, 2019, Proceedings, Part I

Aug 2019

491 pages

ISBN: 978-3-030-27201-2

DOI: 10.1007/978-3-030-27202-9

Editors:

Fakhri Karray, University of Waterloo, Waterloo, ON, Canada

Aurélio Campilho, University of Porto, Porto, Portugal

Alfred Yu, University of Waterloo, Waterloo, ON, Canada

© Springer Nature Switzerland AG 2019.

Publisher

Springer-Verlag

Berlin, Heidelberg
