Augmentation of Human Action Datasets with Suboptimal Warping and Representative Data Samples
Figure 1. Class boundaries in the 2D MDS embeddings of DTW dissimilarities for exemplary time series from the MSRA I dataset, generated by SPAWNER and ARSPAWNER. Boundaries of neighboring classes are highlighted.
Figure 2. Calculation of the Bone Pair Descriptor.
Figure 3. The three skeletons available in the datasets: (left) MSRA, UTD-MHAD, UTK, and SYSU; (middle) FLORENCE; (right) KARD.
Figure 4. Time series generated by ARSPAWNER (blue curve) based on two exemplary time series (red and green curves). The left plot represents the “draw circle” action and the right plot the “high arm wave” action from the MSRA II dataset.
Figure 5. The 2D MDS embeddings of DTW dissimilarities between training and augmented sequences from the MSRA I dataset for the compared augmentation methods. Colors differentiate the classes; filled triangles denote input examples.
Figure 6. The 2D MDS embeddings of DTW dissimilarities between testing and training, or testing and augmented, sequences from the MSRA I dataset. Colors differentiate the classes; filled triangles denote testing examples.
Figure 7. The 2D MDS embeddings of DTW dissimilarities between sequences of reduced dimensionality from the MSRA I dataset for CGAN and ARSPAWNER. Colors differentiate the classes; filled triangles denote input examples (a,c), while filled circles denote augmented samples (b,d).
Figure 8. Three-dimensional surface plots presenting the impact of the ARSPAWNER parameters $r_1$ and $r_2$ on classification accuracy for the MSRA II dataset. The upper, middle, and lower plots show the results of DTW, LDMLT, and TCK, respectively.
Figure 9. Average accuracy of the nearest-neighbor classifier with the DTW distance for a small number of augmented training examples per class.
Abstract
1. Introduction
- A novel method for AR time series augmentation with a small amount of data;
- A novel and efficient method for determining constraints on generated data samples, based on statistics of a class and of its representatives, together with the incorporation of these constraints into the data augmentation approach to address AR-related characteristics;
- A comparative evaluation of the method against related approaches on eight AR datasets using popular classifiers.
2. Related Work
3. Proposed Method
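As a rough orientation, the following is a minimal, simplified sketch of the suboptimal-warping idea behind SPAWNER [9], on which ARSPAWNER builds. Unlike SPAWNER, which forces the warping path through a random point to make it suboptimal, this sketch averages two same-class sequences along the optimal DTW path and jitters the result; ARSPAWNER’s class-statistics constraints are omitted, and all names and the noise level `sigma` are illustrative.

```python
import numpy as np

def dtw_path(x, y):
    """DTW between two 1D sequences; returns the optimal warping path."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:  # backtrack from (n, m) to (1, 1)
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        i, j = (i - 1, j - 1) if step == 0 else (i - 1, j) if step == 1 else (i, j - 1)
    return path[::-1]

def augment_pair(x1, x2, sigma=0.05, rng=None):
    """Average two same-class sequences along their warping path and jitter."""
    if rng is None:
        rng = np.random.default_rng()
    avg = np.array([(x1[i] + x2[j]) / 2.0 for i, j in dtw_path(x1, x2)])
    return avg + rng.normal(0.0, sigma, size=avg.shape)
```

The generated sequence stays within the neighborhood of its two parents (compare the blue curve against the red and green parents in Figure 4), which is what makes the approach attractive when only a few examples per class are available.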
4. Action Recognition Descriptors and Features
4.1. Distance Descriptor
1. For each joint $J_i$, $i = 1, \ldots, n$, do:
   - (a) Calculate the distances between $J_i$ and each of the other joints $J_k$, $k \neq i$;
   - (b) Sort the joints by the calculated distances in ascending order;
   - (c) Assign consecutive integers $a_{i,k}$ to the ordered joints, starting from 1.
2. Create a feature vector consisting of the integer values assigned to the joints in step 1(c), in the following order: $a_{1,2}, a_{1,3}, \ldots, a_{1,n}, a_{2,1}, a_{2,3}, \ldots, a_{n,n-1}$;
3. Reduce the feature vector by adding together the integers $a$ that correspond to the same pair of indices $i$, $j$: $a'_{i,j} = a_{i,j} + a_{j,i}$ (a code sketch of these steps follows the list).
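A minimal sketch of these steps, assuming Euclidean distances between 3D joint positions (function and variable names are illustrative):

```python
import numpy as np

def distance_descriptor(joints):
    """Distance Descriptor for a single frame.

    `joints` is an (n, 3) array of 3D joint positions. For each joint J_i,
    the other joints are ranked by their distance from J_i (rank 1 = closest,
    steps 1(a)-(c)); the reduced vector contains a'_{i,j} = a_{i,j} + a_{j,i}
    for all pairs i < j (steps 2-3)."""
    n = len(joints)
    dist = np.linalg.norm(joints[:, None, :] - joints[None, :, :], axis=-1)
    a = np.zeros((n, n), dtype=int)
    for i in range(n):
        order = sorted((k for k in range(n) if k != i), key=lambda k: dist[i, k])
        for rank, k in enumerate(order, start=1):
            a[i, k] = rank
    return np.array([a[i, j] + a[j, i] for i in range(n) for j in range(i + 1, n)])

desc = distance_descriptor(np.random.rand(20, 3))  # 20 joints -> 190 values
```

Because the descriptor encodes distance ranks rather than raw distances, it is invariant to the scale of the skeleton.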
4.2. Bone Pair Descriptor
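The Bone Pair Descriptor is defined in [31] and illustrated in Figure 2. Given the citations to point-feature histograms [21,33], one plausible, but hypothetical, reading (not the authors’ exact formulation) is a PFH-style angular triple computed between two bones, with the bone direction vectors standing in for the surface normals of [33]:

```python
import numpy as np

def bone_pair_angles(p1, p2, q1, q2):
    """Hypothetical PFH-style angles (alpha, phi, theta) between two bones
    given by their joint endpoints (p1, p2) and (q1, q2). Assumes the bones
    are not collinear with the segment connecting them."""
    n1 = (p2 - p1) / np.linalg.norm(p2 - p1)   # direction of bone 1
    n2 = (q2 - q1) / np.linalg.norm(q2 - q1)   # direction of bone 2
    d = (q1 - p1) / np.linalg.norm(q1 - p1)    # normalized connecting segment
    u = n1                                     # Darboux frame anchored at bone 1
    v = np.cross(d, u)
    v /= np.linalg.norm(v)
    w = np.cross(u, v)
    return np.dot(v, n2), np.dot(u, d), np.arctan2(np.dot(w, n2), np.dot(u, n2))
```

In this reading, the descriptor would be assembled from such angle triples computed between the central bone (Spine–Head) and the other bones listed in the table of Section 5.1.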
5. Experiments and Discussion
5.1. Datasets
5.2. Visual Examples of Augmented Time Series
5.3. Classifiers
5.4. Results
5.5. Visual Comparison
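Figures 1 and 5–7 visualize 2D MDS embeddings of DTW dissimilarities. A minimal sketch of producing such a plot, assuming a precomputed symmetric matrix of pairwise DTW distances (a synthetic placeholder is used here) and scikit-learn’s metric MDS as a stand-in for Kruskal’s formulation [30]:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import MDS

# Placeholder dissimilarities: any symmetric matrix of pairwise DTW
# distances between sequences can be substituted for `dtw_dist`.
rng = np.random.default_rng(0)
pts = rng.normal(size=(40, 5))
dtw_dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
labels = rng.integers(0, 4, size=40)

# Embed the precomputed dissimilarities in 2D and plot, colored by class.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
emb = mds.fit_transform(dtw_dist)
plt.scatter(emb[:, 0], emb[:, 1], c=labels)
plt.title("2D MDS embedding of DTW dissimilarities")
plt.show()
```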
5.6. Comparison with CGAN
5.7. Impact of Parameters
5.8. Performance with Small Number of Training Examples
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zhang, H.B.; Zhang, Y.X.; Zhong, B.; Lei, Q.; Yang, L.; Du, J.X.; Chen, D.S. A comprehensive survey of vision-based human action recognition methods. Sensors 2019, 19, 1005.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- Handhika, T.; Murni; Lestari, D.P.; Sari, I. Multivariate time series classification analysis: State-of-the-art and future challenges. IOP Conf. Ser. Mater. Sci. Eng. 2019, 536, 012003.
- Le Guennec, A.; Malinowski, S.; Tavenard, R. Data Augmentation for Time Series Classification using Convolutional Neural Networks. In Proceedings of the AALTD 2016: Second ECML/PKDD International Workshop on Advanced Analytics and Learning on Temporal Data, Riva del Garda, Italy, 19–23 September 2016; p. 11.
- Um, T.T.; Pfister, F.M.J.; Pichler, D.; Endo, S.; Lang, M.; Hirche, S.; Fietzek, U.; Kulić, D. Data Augmentation of Wearable Sensor Data for Parkinson’s Disease Monitoring using Convolutional Neural Networks. In Proceedings of the ACM International Conference on Multimodal Interaction, Glasgow, UK, 13–17 November 2017.
- Haradal, S.; Hayashi, H.; Uchida, S. Biosignal Data Augmentation Based on Generative Adversarial Networks. In Proceedings of the 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Honolulu, HI, USA, 17–21 July 2018; pp. 368–371.
- Fawaz, H.I.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.A. Data Augmentation Using Synthetic Data for Time Series Classification with Deep Residual Networks. arXiv 2018, arXiv:1808.02455.
- Forestier, G.; Petitjean, F.; Dau, H.A.; Webb, G.I.; Keogh, E. Generating Synthetic Time Series to Augment Sparse Datasets. In Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), New Orleans, LA, USA, 18–21 November 2017; pp. 865–870.
- Kamycki, K.; Kapuściński, T.; Oszust, M. Data Augmentation with Suboptimal Warping for Time-Series Classification. Sensors 2020, 20, 98.
- Douzas, G.; Bacao, F. Geometric SMOTE a geometrically enhanced drop-in replacement for SMOTE. Inf. Sci. 2019, 501, 118–135.
- Sakoe, H.; Chiba, S. Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech 1978, 26, 43–49.
- Ramponi, G.; Protopapas, P.; Brambilla, M.; Janssen, R. T-CGAN: Conditional Generative Adversarial Network for Data Augmentation in Noisy Time Series with Irregular Sampling. arXiv 2018, arXiv:1811.08295.
- Cao, P.; Li, X.; Mao, K.; Lu, F.; Ning, G.; Fang, L.; Pan, Q. A novel data augmentation method to enhance deep neural networks for detection of atrial fibrillation. Biomed. Signal Process. Control 2020, 56, 101675.
- Delaney, A.M.; Brophy, E.; Ward, T.E. Synthesis of Realistic ECG using Generative Adversarial Networks. arXiv 2019, arXiv:1909.09150.
- Krell, M.M.; Seeland, A.; Kim, S.K. Data Augmentation for Brain-Computer Interfaces: Analysis on Event-Related Potentials Data. arXiv 2018, arXiv:1801.02730.
- Shen, J.; Dudley, J.J.; Kristensson, P.O. The Imaginative Generative Adversarial Network: Automatic Data Augmentation for Dynamic Skeleton-Based Hand Gesture and Human Action Recognition. In Proceedings of the 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), Jodhpur, India, 15–18 December 2021.
- Ramachandra, S.; Hölzemann, A.; Laerhoven, K.V. Transformer Networks for Data Augmentation of Human Physical Activity Recognition. arXiv 2021, arXiv:2109.01081.
- Song, Z.; Yuan, Z.; Zhang, C.; Chi, W.; Ling, Y.; Zhang, S. Learning End-to-End Action Interaction by Paired-Embedding Data Augmentation. In Computer Vision–ACCV 2020; Ishikawa, H., Liu, C.L., Pajdla, T., Shi, J., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 191–206.
- Hoelzemann, A.; Sorathiya, N.; Van Laerhoven, K. Data Augmentation Strategies for Human Activity Data Using Generative Adversarial Neural Networks. In Proceedings of the 2021 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), Kassel, Germany, 22–26 March 2021; pp. 8–13.
- Sidor, K.; Wysocki, M. Recognition of Human Activities Using Depth Maps and the Viewpoint Feature Histogram Descriptor. Sensors 2020, 20, 2940.
- Rusu, R.B.; Bradski, G.; Thibaux, R.; Hsu, J. Fast 3D recognition and pose using the Viewpoint Feature Histogram. In Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Taiwan, 18–22 October 2010; pp. 2155–2162.
- Pazhoumand-Dar, H.; Lam, C.P.; Masek, M. Joint movement similarities for robust 3D action recognition using skeletal data. J. Vis. Commun. Image Represent. 2015, 30, 10–21.
- Lillo, I.; Niebles, J.C.; Soto, A. Sparse composition of body poses and atomic actions for human activity recognition in RGB-D videos. Image Vis. Comput. 2017, 59, 63–75.
- Shahroudy, A.; Ng, T.T.; Yang, Q.; Wang, G. Multimodal Multipart Learning for Action Recognition in Depth Videos. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 2123–2129.
- Raman, N.; Maybank, S. Activity Recognition using a supervised non-parametric Hierarchical HMM. Neurocomputing 2016, 199, 163–177.
- Farnoosh, A.; Wang, Z.; Zhu, S.; Ostadabbas, S. A Bayesian Dynamical Approach for Human Action Recognition. Sensors 2021, 21, 5613.
- Wang, H.; Yu, B.; Xia, K.; Li, J.; Zuo, X. Skeleton edge motion networks for human action recognition. Neurocomputing 2021, 423, 1–12.
- Plizzari, C.; Cannici, M.; Matteucci, M. Skeleton-based action recognition via spatial and temporal transformer networks. Comput. Vis. Image Underst. 2021, 208–209, 103219.
- Donahue, J.; Hendricks, L.A.; Rohrbach, M.; Venugopalan, S.; Guadarrama, S.; Saenko, K.; Darrell, T. Long-Term Recurrent Convolutional Networks for Visual Recognition and Description. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 677–691.
- Kruskal, J.B. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 1964, 29, 1–27.
- Warchoł, D.; Kapuściński, T. Human Action Recognition Using Bone Pair Descriptor and Distance Descriptor. Symmetry 2020, 12, 1580.
- Kapuściński, T.; Warchoł, D. Hand Posture Recognition Using Skeletal Data and Distance Descriptor. Appl. Sci. 2020, 10, 2132.
- Rusu, R.B.; Marton, Z.C.; Blodow, N.; Beetz, M. Learning informative point classes for the acquisition of object model maps. In Proceedings of the 2008 10th International Conference on Control, Automation, Robotics and Vision, Hanoi, Vietnam, 17–20 December 2008; pp. 643–650.
- Spivak, M. A Comprehensive Introduction to Differential Geometry, 3rd ed.; Publish or Perish: Houston, TX, USA, 1999; Volume 3.
- Li, W.; Zhang, Z.; Liu, Z. Action recognition based on a bag of 3D points. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, San Francisco, CA, USA, 13–18 June 2010; pp. 9–14.
- Chen, C.; Jafari, R.; Kehtarnavaz, N. UTD-MHAD: A Multimodal Dataset for Human Action Recognition Utilizing a Depth Camera and a Wearable Inertial Sensor. In Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, 27–30 September 2015; pp. 168–172.
- Xia, L.; Chen, C.C.; Aggarwal, J.K. View invariant human action recognition using histograms of 3D joints. In Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 16–21 June 2012; pp. 20–27.
- Seidenari, L.; Varano, V.; Berretti, S.; Del Bimbo, A.; Pala, P. Recognizing Actions from Depth Cameras as Weakly Aligned Multi-part Bag-of-Poses. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Portland, OR, USA, 23–28 June 2013; pp. 479–485.
- Hu, J.F.; Zheng, W.S.; Lai, J.; Zhang, J. Jointly Learning Heterogeneous Features for RGB-D Activity Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015.
- Gaglio, S.; Re, G.L.; Morana, M. Human Activity Recognition Process Using 3-D Posture Data. IEEE Trans. Hum.-Mach. Syst. 2015, 45, 586–597.
- MSRA Dataset. Available online: https://sites.google.com/view/wanqingli/data-sets/msr-action3d (accessed on 11 April 2022).
- Mei, J.; Liu, M.; Wang, Y.F.; Gao, H. Learning a Mahalanobis Distance-Based Dynamic Time Warping Measure for Multivariate Time Series Classification. IEEE Trans. Cybern. 2016, 46, 1363–1374.
- Mikalsen, K.Ø.; Bianchi, F.M.; Soguero-Ruiz, C.; Jenssen, R. Time series cluster kernel for learning similarities between multivariate time series with missing data. Pattern Recognit. 2018, 76, 569–581.
- Radford, A.; Metz, L.; Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv 2016, arXiv:1511.06434.
- Matlab Scripts for Distance Descriptor and Bone Pair Descriptor. Available online: http://vision.kia.prz.edu.pl (accessed on 1 January 2022).
Name | Classes | Subjects | Sequences (Actions) | Time Series Length | Input Examples | Augmented Examples | Validation Protocol |
---|---|---|---|---|---|---|---|
MSRA I | 8 | 10 | 224 | 13–76 | 118 | 611 | 50-50 validation |
MSRA II | 8 | 10 | 207 | 15–100 | 118 | 573 | 50-50 validation |
MSRA III | 8 | 10 | 225 | 13–71 | 113 | 438 | 50-50 validation |
UTD-MHAD | 27 | 8 | 861 | 41–125 | 431 | 1163 | 50-50 validation |
UTK | 10 | 10 | 199 | 5–110 | 179 | 744 | 10-fold cross-validation |
FLORENCE | 9 | 10 | 215 | 8–35 | 194 | 1109 | 10-fold cross-validation |
SYSU | 12 | 40 | 480 | 58–638 | 240 | 1087 | 50-50 validation |
KARD | 18 | 10 | 540 | 42–310 | 270 | 685 | 50-50 validation |
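The two validation protocols in the table can be sketched as follows; this is illustrative only, assuming stratified splits over sequences (the paper’s exact assignment of subjects and sequences may differ):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

rng = np.random.default_rng(0)
labels = rng.integers(0, 8, size=224)  # stand-in class labels, e.g., MSRA I
X = np.arange(len(labels))             # stand-in sequence indices

# 50-50 validation: half the sequences for training, half for testing.
train_idx, test_idx = train_test_split(
    X, test_size=0.5, stratify=labels, random_state=0)

# 10-fold cross-validation, as used for UTK and FLORENCE.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, labels):
    pass  # train on train_idx, evaluate on test_idx
```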
MSRA, UTD-MHAD, UTK, SYSU | FLORENCE | KARD |
---|---|---|
Hand L. | Wrist L. | Hand L. |
Hand R. | Wrist R. | Hand R. |
Shoulder L. | Shoulder L. | Shoulder L. |
Shoulder R. | Shoulder R. | Shoulder R. |
Head | Head | Head |
Spine | Spine | Spine |
Hip L. | Hip L. | Hip L. |
Hip R. | Hip R. | Hip R. |
Ankle L. | Ankle L. | Foot L. |
Ankle R. | Ankle R. | Foot R. |
MSRA, UTD-MHAD, UTK, SYSU | FLORENCE | KARD |
---|---|---|
Spine–Head (central) | Spine–Head (central) | Spine–Head (central) |
Elbow R.–Wrist R. | Elbow R.–Wrist R. | Elbow R.–Wrist R. |
Wrist R.–Hand R. | Shoulder R.–Elbow R. | Shoulder R.–Elbow R. |
Shoulder R.–Elbow R. | Elbow L.–Wrist L. | Elbow L.–Wrist L. |
Elbow L.–Wrist L. | Shoulder L.–Elbow L. | Shoulder L.–Elbow L. |
Wrist L.–Hand L. | Hip R.–Knee R. | Hip R.–Knee R. |
Shoulder L.–Elbow L. | Knee R.–Ankle R. | Knee R.–Foot R. |
Hip R.–Knee R. | Hip L.–Knee L. | Hip L.–Knee L. |
Knee R.–Ankle R. | Knee L.–Ankle L. | Knee L.–Foot L. |
Ankle R.–Foot R. | ||
Hip L.–Knee L. | ||
Knee L.–Ankle L. | ||
Ankle L.–Foot L. |
Classifier | Parameter Name | Parameter Value
---|---|---
DTW | Window size | 5
DTW | Metric | Euclidean
LDMLT | Triplets factor | 20
LDMLT | Alpha factor | 5
LDMLT | Number of iterations | 15
TCK | Maximum number of mixture components | 5
TCK | Number of randomizations | 50
TCK | Number of iterations | 20
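A minimal sketch of the DTW-based nearest-neighbor classifier with the settings listed above (a Sakoe–Chiba window of size 5 [11] and a Euclidean local metric); the LDMLT [42] and TCK [43] settings refer to their respective reference implementations and are not sketched here:

```python
import numpy as np

def dtw_distance(x, y, window=5):
    """DTW with a Sakoe-Chiba band; x and y are (T, d) feature sequences
    compared with the Euclidean metric, matching the table settings."""
    n, m = len(x), len(y)
    w = max(window, abs(n - m))          # the band must cover the length gap
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return cost[n, m]

def nn_classify(query, train_seqs, train_labels, window=5):
    """1-NN classification under the DTW distance."""
    dists = [dtw_distance(query, s, window) for s in train_seqs]
    return train_labels[int(np.argmin(dists))]
```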
Dataset/Aug. Method | None | WW | WS | SPAWNER | ARSPAWNER |
---|---|---|---|---|---|
DTW | |||||
MSRA I | 71.7 | 70.6 | 74.3 | 74.4 | 76.1 |
MSRA II | 69.0 | 69.7 | 73.1 | 69.3 | 71.7 |
MSRA III | 83.9 | 84.2 | 84.0 | 86.5 | 86.5 |
UTD-MHAD | 86.3 | 86.3 | 83.9 | 86.5 | 86.7 |
UTK | 81.9 | 80.7 | 86.4 | 85.4 | 86.4 |
FLORENCE | 78.6 | 78.4 | 81.7 | 81.5 | 81.8 |
SYSU | 69.2 | 67.2 | 70.8 | 71.2 | 72.5 |
KARD | 89.6 | 90.9 | 91.6 | 88.0 | 89.7 |
LDMLT | |||||
MSRA I | 75.5 | 80.6 | 82.6 | 86.2 | 86.5 |
MSRA II | 78.8 | 77.3 | 73.2 | 80.6 | 83.2 |
MSRA III | 90.2 | 88.8 | 88.6 | 89.4 | 89.6 |
UTD-MHAD | 92.1 | 90.4 | 84.4 | 92.4 | 89.2 |
UTK | 91.5 | 92.0 | 91.9 | 95.4 | 95.7
FLORENCE | 86.0 | 84.7 | 84.7 | 88.5 | 87.4 |
SYSU | 68.8 | 61.4 | 64.4 | 70.9 | 70.5 |
KARD | 95.9 | 96.4 | 94.0 | 97.0 | 97.6 |
TCK | |||||
MSRA I | 55.8 | 62.8 | 62.1 | 65.7 | 66.5 |
MSRA II | 54.9 | 58.0 | 58.5 | 54.9 | 58.1 |
MSRA III | 75.7 | 79.3 | 77.1 | 81.7 | 81.4 |
UTD-MHAD | 62.0 | 56.6 | 57.7 | 61.5 | 60.3 |
UTK | 92.6 | 93.3 | 93.7 | 93.2 | 93.3 |
FLORENCE | 78.0 | 79.7 | 79.4 | 81.6 | 80.4 |
SYSU | 62.7 | 62.8 | 62.3 | 66.5 | 66.2 |
KARD | 85.5 | 88.0 | 88.3 | 88.9 | 85.2 |
Overall results | |||||
Average rank | 3.88 | 3.65 | 3.38 | 2.21 | 1.90 |
Geometric average rank | 3.60 | 3.53 | 2.95 | 1.93 | 1.68
Count best | 2 | 0 | 5 | 8 | 11 |
Average accuracy | 78.2 | 78.3 | 78.7 | 80.7 | 80.9 |
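A sketch of how the summary rows can be computed from the accuracy rows above (only two rows are shown for illustration; the full ranking runs over all 24 dataset/classifier rows, and averaged ranks for ties are an assumption here):

```python
import numpy as np
from scipy.stats import rankdata

# Each row holds the five methods' accuracies on one dataset/classifier pair.
acc = np.array([
    [71.7, 70.6, 74.3, 74.4, 76.1],   # e.g., the DTW / MSRA I row
    [69.0, 69.7, 73.1, 69.3, 71.7],   # DTW / MSRA II
])

# Rank 1 = highest accuracy on that row; ties receive averaged ranks.
ranks = np.vstack([rankdata(-row) for row in acc])
avg_rank = ranks.mean(axis=0)                     # "Average rank" row
geo_avg_rank = np.exp(np.log(ranks).mean(axis=0)) # "Geometric average rank" row
```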
Dataset | None | CGAN | ARSPAWNER |
---|---|---|---|
MSRA I | 0.7075 | 0.7453 | 0.8118 |
MSRA II | 0.6283 | 0.5487 | 0.6994 |
MSRA III | 0.8125 | 0.6964 | 0.8393 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Warchoł, D.; Oszust, M. Augmentation of Human Action Datasets with Suboptimal Warping and Representative Data Samples. Sensors 2022, 22, 2947. https://doi.org/10.3390/s22082947