Abstract
Temporal prediction of human pose sequences is vital for robot applications such as human-robot interaction and autonomous robot control. Recent methods predict future skeletons from a 3D human skeleton sequence. Even if the starting motions of two human skeleton sequences are very similar, their future motions may differ because of the objects surrounding the person, so it is difficult to predict future skeleton sequences from a given human skeleton sequence alone. The presence of surrounding objects, however, is an important clue for the prediction. This paper proposes a method of predicting future skeleton sequences by incorporating surrounding information into the skeleton sequence. We assume that the surroundings of a target person do not change significantly within a few seconds and use an image feature of the area around the target person as the surrounding information. Evaluations on a public dataset confirm an improvement in prediction performance.
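The abstract does not say how the surrounding information is combined with the skeleton sequence. As a minimal sketch, and purely as an assumption rather than the authors' method, the idea that the surroundings stay roughly constant over a few seconds can be pictured as attaching one fixed feature vector to every observed frame before the sequence is passed to a predictor. The function name `fuse_with_surroundings`, the toy skeleton, and the 4-dimensional feature are all hypothetical.

```python
from typing import List

# One flattened skeleton frame: (number of joints) * 3 coordinates.
Frame = List[float]


def fuse_with_surroundings(skeleton_seq: List[Frame],
                           surrounding_feature: List[float]) -> List[Frame]:
    """Append the same surrounding feature to every observed frame.

    Assumption for illustration: the surroundings do not change over the
    observed window, so a single feature vector is reused for all frames.
    """
    return [frame + surrounding_feature for frame in skeleton_seq]


# Hypothetical toy input: 3 observed frames of a 2-joint skeleton (x, y, z per joint).
observed = [
    [0.0, 1.0, 0.0, 0.1, 1.5, 0.0],
    [0.0, 1.0, 0.1, 0.1, 1.5, 0.1],
    [0.0, 1.0, 0.2, 0.1, 1.5, 0.2],
]
# Hypothetical 4-dimensional image feature of the area around the person.
context = [0.3, 0.0, 0.7, 0.2]

fused = fuse_with_surroundings(observed, context)
print(len(fused), len(fused[0]))  # 3 frames, each 6 + 4 = 10 values
```

Reusing one feature vector for all frames mirrors the stated assumption about slowly changing surroundings; a real system could equally fuse the feature at a later stage of the network.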
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Fujita, T., Kawanishi, Y. (2023). Toward Surroundings-Aware Temporal Prediction of 3D Human Skeleton Sequence. In: Rousseau, JJ., Kapralos, B. (eds) Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges. ICPR 2022. Lecture Notes in Computer Science, vol 13643. Springer, Cham. https://doi.org/10.1007/978-3-031-37660-3_10
DOI: https://doi.org/10.1007/978-3-031-37660-3_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37659-7
Online ISBN: 978-3-031-37660-3
eBook Packages: Computer Science, Computer Science (R0)