Abstract
3D sensors such as standoff Light Detection and Ranging (LIDAR) generate partial 3D point clouds that resemble patches of irregularly shaped, coarse groups of points. 3D modeling of this type of data for human action recognition has rarely been studied, and although 2D depth-image analysis is an option, its effectiveness on such low-resolution data has not been well established. This paper investigates a new multi-scale 3D shape descriptor, based on the discrete orthogonal Tchebichef moments, for characterizing 3D action pose shapes composed of low-resolution point cloud patches. The descriptor consists of low-order 3D Tchebichef moments computed over a new point cloud voxelization scheme that normalizes translation, scale, and resolution. Action recognition is then performed with a Naïve Bayes classifier over the temporal statistics of a 'bag of pose shapes'. For performance evaluation, a synthetic LIDAR pose shape baseline was developed with 62 human subjects performing three actions: digging, jogging, and throwing. Our action classification experiments demonstrate that the 3D Tchebichef moment representation of point clouds achieves excellent action and viewing-direction prediction, with consistent accuracy across a wide range of scale and viewing-angle variations.
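As a concrete illustration of the descriptor pipeline the abstract outlines, the following is a minimal NumPy sketch of the two shape-description steps: voxelizing a point cloud patch with translation and scale normalization, and projecting the resulting occupancy grid onto low-order discrete Tchebichef polynomials. The grid size, the order cutoff, and the function names are illustrative assumptions, not the authors' implementation; the three-term recurrence is the standard one for discrete Tchebichef polynomials, normalized numerically to unit norm.

```python
import numpy as np

def voxelize(points, grid=32):
    """Rasterize an (M, 3) point cloud patch into a grid**3 binary occupancy
    volume. Illustrative scheme: centroid removal normalizes translation,
    division by the max extent normalizes scale, and binary occupancy makes
    the volume largely insensitive to point density (resolution)."""
    p = points - points.mean(axis=0)            # translation normalization
    p = p / (np.abs(p).max() + 1e-12)           # isotropic scale normalization to [-1, 1]
    idx = np.clip(np.round((p + 1.0) * 0.5 * (grid - 1)).astype(int), 0, grid - 1)
    vol = np.zeros((grid, grid, grid))
    vol[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0  # binary occupancy
    return vol

def tchebichef_basis(N, max_order):
    """Discrete Tchebichef polynomials of order 0..max_order sampled on
    {0, ..., N-1}; each row is numerically scaled to unit L2 norm so the
    rows form an orthonormal basis on the grid."""
    x = np.arange(N, dtype=np.float64)
    T = np.zeros((max_order + 1, N))
    T[0] = 1.0
    if max_order >= 1:
        T[1] = 2.0 * x + 1.0 - N
    for n in range(2, max_order + 1):
        # Standard three-term recurrence for discrete Tchebichef polynomials.
        T[n] = ((2 * n - 1) * (2 * x - N + 1) * T[n - 1]
                - (n - 1) * (N * N - (n - 1) ** 2) * T[n - 2]) / n
    return T / np.linalg.norm(T, axis=1, keepdims=True)

def tchebichef_moments_3d(vol, max_order=4):
    """Low-order 3D Tchebichef moments of a cubic N x N x N volume.
    The 3D basis is separable, so the projection is three 1D contractions."""
    T = tchebichef_basis(vol.shape[0], max_order)
    return np.einsum('px,qy,rz,xyz->pqr', T, T, T, vol)
```

Flattening the resulting (max_order + 1)^3 moment tensor, e.g. `tchebichef_moments_3d(voxelize(patch)).ravel()`, yields a fixed-length pose-shape feature that does not depend on the patch's position, extent, or point count, which is the property the abstract's normalization scheme targets.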
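The recognition stage can be sketched in the same spirit: per-frame descriptors are quantized against a pose-shape codebook (which could be built, for example, by k-means over training descriptors), each sequence becomes a codeword-count histogram (the 'bag of pose shapes'), and a multinomial Naïve Bayes model scores the candidate actions. The function names, the nearest-neighbor quantization, and the Laplace smoothing below are assumptions for illustration rather than the paper's exact formulation.

```python
import numpy as np

def bag_of_pose_shapes(seq_descriptors, codebook):
    """Quantize (F, D) per-frame descriptors to their nearest of K codewords
    and return the (K,) codeword-count histogram for the sequence."""
    d = np.linalg.norm(seq_descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = d.argmin(axis=1)
    return np.bincount(words, minlength=len(codebook))

def train_naive_bayes(histograms, labels, n_classes, alpha=1.0):
    """Multinomial Naive Bayes: per-class log codeword probabilities with
    Laplace smoothing (alpha), plus log class priors."""
    log_prior = np.zeros(n_classes)
    log_like = np.zeros((n_classes, histograms.shape[1]))
    for c in range(n_classes):
        H = histograms[labels == c]
        counts = H.sum(axis=0) + alpha          # smoothed codeword counts
        log_like[c] = np.log(counts / counts.sum())
        log_prior[c] = np.log(len(H) / len(histograms))
    return log_prior, log_like

def predict(hist, log_prior, log_like):
    """Most probable action for one sequence histogram (log-posterior argmax)."""
    return np.argmax(log_prior + hist @ log_like.T)
```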
Acknowledgements
The authors would like to thank Isiah Davenport, Max Grattan, and Jeanne Smith for their indispensable help in the creation of the biofidelic pose shape baseline.
Cite this article
Cheng, H., Chung, S.M. Action recognition from point cloud patches using discrete orthogonal moments. Multimed Tools Appl 77, 8213–8236 (2018). https://doi.org/10.1007/s11042-017-4711-0