DOI: 10.1007/978-3-030-27202-9_2
Article

A Deep Learning Approach for Real-Time 3D Human Action Recognition from Skeletal Data

Published: 27 August 2019

Abstract

We present a new deep learning approach for real-time 3D human action recognition from skeletal data and apply it to develop a vision-based intelligent surveillance system. Given a skeleton sequence, we encode the skeleton poses and their motions into a single RGB image. An Adaptive Histogram Equalization (AHE) algorithm is then applied to the color images to enhance their local patterns and generate more discriminative features. For learning and classification, we design deep neural networks based on the Densely Connected Convolutional Network architecture (DenseNet) that extract features from the enhanced color images and classify them into action classes. Experimental results on two challenging datasets show that the proposed method reaches state-of-the-art accuracy while requiring low computational time for training and inference. This paper also introduces CEMEST, a new RGB-D dataset depicting passenger behaviors in public transport. It consists of 203 untrimmed real-world surveillance videos of realistic normal and anomalous events. With the support of data augmentation and transfer learning, we achieve promising results on this dataset under real-world conditions. This enables the construction of real-world deep learning applications for enhancing monitoring and security in public transport.
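
The abstract does not spell out the encoding, so the following is only a rough sketch of the general idea: poses and frame-to-frame motions are min-max scaled into a single RGB image, which is then enhanced tile by tile. The function names, the image layout (rows as joints, columns as time, with x, y, z mapped to R, G, B) and the 2x2 tiling are all assumptions made for illustration. The enhancement step is a simplified, non-interpolated variant of adaptive histogram equalization and is not necessarily the AHE configuration used in the paper.

```python
import numpy as np


def normalize_to_uint8(values: np.ndarray) -> np.ndarray:
    """Min-max scale an array to integers in [0, 255]."""
    lo, hi = values.min(), values.max()
    if hi == lo:  # constant input: avoid division by zero
        return np.zeros(values.shape, dtype=np.uint8)
    scaled = (values - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)


def encode_skeleton_sequence(seq: np.ndarray) -> np.ndarray:
    """Encode a (frames, joints, 3) skeleton sequence as one RGB image.

    Assumed layout (hypothetical, not taken from the paper): rows index
    joints, columns index time, poses fill the left part of the image and
    frame-to-frame motions fill the right part.
    """
    motions = np.diff(seq, axis=0)                               # (T-1, J, 3)
    pose_img = normalize_to_uint8(seq).transpose(1, 0, 2)        # (J, T, 3)
    motion_img = normalize_to_uint8(motions).transpose(1, 0, 2)  # (J, T-1, 3)
    return np.concatenate([pose_img, motion_img], axis=1)        # (J, 2T-1, 3)


def equalize_tiles(img: np.ndarray, tiles=(2, 2)) -> np.ndarray:
    """Equalize the histogram of each tile and channel independently.

    Simplified stand-in for AHE: no interpolation between neighbouring
    tiles and no contrast limiting.
    """
    out = img.copy()
    height, width = img.shape[:2]
    row_edges = np.linspace(0, height, tiles[0] + 1, dtype=int)
    col_edges = np.linspace(0, width, tiles[1] + 1, dtype=int)
    for ch in range(img.shape[2]):
        for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
            for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
                tile = img[r0:r1, c0:c1, ch]
                if tile.size == 0:  # image smaller than the tile grid
                    continue
                hist = np.bincount(tile.ravel(), minlength=256)
                cdf = hist.cumsum()
                # Lookup table mapping each intensity to its scaled CDF value.
                lut = np.round(cdf / cdf[-1] * 255).astype(np.uint8)
                out[r0:r1, c0:c1, ch] = lut[tile]
    return out
```

For example, a synthetic sequence of 30 frames with 25 joints, `seq = np.random.default_rng(0).normal(size=(30, 25, 3))`, can be passed through `equalize_tiles(encode_skeleton_sequence(seq))` to obtain an enhanced image that could then be resized and fed to an image classifier such as a DenseNet.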

Published In

Image Analysis and Recognition: 16th International Conference, ICIAR 2019, Waterloo, ON, Canada, August 27–29, 2019, Proceedings, Part I
Aug 2019
491 pages
ISBN: 978-3-030-27201-2
DOI: 10.1007/978-3-030-27202-9

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

1. Action recognition
2. Skeletal data
3. Enhanced-SPMF
4. DenseNet
