Dynamic Pose Estimation Using Multiple RGB-D Cameras
Figure 1. Overview of the proposed system: human pose estimation using multiple red, green, blue, and depth (RGB-D) cameras.
Figure 2. Accumulative geodesic distances between the center of the body and key body parts such as the head, hands, and feet. Poses are colored white where the distances are close to the center of the body and red where they are close to the key body parts.
Figure 3. Results of each step in the single-camera process: (a) a filtered human object (foreground); (b) a quadtree-based decomposition of color and depth images; (c) accumulative geodesic end points (candidate extreme points); and (d) selected extreme points on the head (H, yellow), right hand (RH, white), left hand (LH, green), right foot (RF, red), and left foot (LF, blue) in the searched regions.
Figure 4. Data unification for the multi-camera process: (a) a calibration pose for the coordinate unification and (b) two depth images in the same coordinate system (oriented toward a viewer).
Figure 5. Comparison of foot trajectories: initially tracked (green) and Kalman-filtered (white). Tracking noise causes sudden positional changes (white circle), while tracking failure skips foot positions (red circle); both are corrected in the filtered trajectory.
Figure 6. A set of extreme points tracked on the body parts and the body orientation represented by a normal vector (cyan).
Figure 7. System setup for action recognition and motion synthesis: two RGB-D cameras with displays, a smart sandbag to detect user kicks and punches, and the Xsens system (a wearable suit) used for accuracy comparison.
Figure 8. Recognition of various Taekwondo actions: input RGB (left) and output depth data with tracked extreme points (right).
Figure 9. Key poses used for motion synthesis: front kick (left), round kick (middle), and front punch (right).
Figure 10. Motion synthesis from a set of key poses and input parameters: front kick (top), round kick (middle), and front punch (bottom).
Abstract
1. Introduction
2. Related Work
3. Single Camera Process
3.1. Background Subtraction
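The body of this section is not reproduced on this page, but Figure 3a shows its output: a filtered human object (foreground). Below is a minimal sketch of depth-based foreground extraction against a pre-captured background frame; the difference threshold and the use of a single static background frame are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def extract_foreground(depth, background, diff_threshold=50.0):
    """Segment a human object from a depth frame.

    depth, background: HxW arrays in millimeters; pixels with value 0
    are treated as invalid (no depth reading).
    Returns a boolean foreground mask.
    """
    valid = (depth > 0) & (background > 0)
    # A pixel is foreground if it is sufficiently closer to the camera
    # than the pre-captured empty-scene background.
    return valid & ((background - depth) > diff_threshold)

# Usage with synthetic data: a "person" 400 mm in front of a flat wall.
background = np.full((480, 640), 3000.0)
depth = background.copy()
depth[100:400, 250:400] = 2600.0
mask = extract_foreground(depth, background)
print(mask.sum(), "foreground pixels")
```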
3.2. Graph Construction
Algorithm 1 Graph construction. Here, two threshold values, one on color variation and one on depth variation (see the quadtree decomposition in Figure 3b), decide whether a node is split into four children.
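As a rough illustration of the quadtree splitting and graph construction, here is a sketch in Python. The split threshold `tau_d`, the edge threshold `tau_edge`, the minimum node size, and the brute-force adjacency test are illustrative assumptions; the paper splits on both color and depth, while this sketch uses depth only, and in practice the graph would be restricted to the foreground mask.

```python
import numpy as np

def quadtree_leaves(depth, x, y, w, h, tau_d=30.0, min_size=4):
    """Recursively split a depth-image region into quadtree leaves.

    A node is split into four children while the depth variation
    inside it exceeds tau_d (a stand-in for the paper's thresholds).
    Returns a list of (x, y, w, h, mean_depth) leaves.
    """
    region = depth[y:y+h, x:x+w]
    if w <= min_size or h <= min_size or region.max() - region.min() <= tau_d:
        return [(x, y, w, h, float(region.mean()))]
    hw, hh = w // 2, h // 2
    leaves = []
    for (cx, cy, cw, ch) in [(x, y, hw, hh), (x + hw, y, w - hw, hh),
                             (x, y + hh, hw, h - hh), (x + hw, y + hh, w - hw, h - hh)]:
        leaves += quadtree_leaves(depth, cx, cy, cw, ch, tau_d, min_size)
    return leaves

def build_graph(leaves, tau_edge=50.0):
    """Connect touching leaves whose mean depths are close, weighting
    edges by pixel offset plus depth difference (brute force, O(n^2))."""
    edges = {i: [] for i in range(len(leaves))}
    for i, (xi, yi, wi, hi, di) in enumerate(leaves):
        for j, (xj, yj, wj, hj, dj) in enumerate(leaves):
            if j <= i:
                continue
            touch_x = xi <= xj + wj and xj <= xi + wi
            touch_y = yi <= yj + hj and yj <= yi + hi
            if touch_x and touch_y and abs(di - dj) < tau_edge:
                wgt = np.hypot((xi + wi / 2) - (xj + wj / 2),
                               (yi + hi / 2) - (yj + hj / 2)) + abs(di - dj)
                edges[i].append((j, wgt))
                edges[j].append((i, wgt))
    return edges

# Usage: decompose a toy depth image and connect its leaves.
depth = np.zeros((64, 64))
depth[16:48, 16:48] = 100.0
leaves = quadtree_leaves(depth, 0, 0, 64, 64)
graph = build_graph(leaves)
print(len(leaves), "leaves")
```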
3.3. Body Parts Detection
For the first frame, each extreme point e_i, i ∈ {H, RH, LH, RF, LF}, is detected from the center of the body v_c by searching the body graph G:

1. Set v_c as a start vertex and search G.
2. Save the accumulative geodesic distances of (1) to the distance map A.
3. Set e_i to the longest accumulative geodesic end point of A.
4. Update v_c to e_i.

For each subsequent frame, an extreme point is updated with a partial search around its previous position:

1. Set e_i as a start vertex and partially search G such that a visited vertex is nearer to e_i than to v_c.
2. Update A using the result of (1).
3. Set e_i to the longest accumulative geodesic end point of A.
4. Update the previous e_i to this end point.
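A sketch of both searches, assuming Dijkstra's algorithm over the quadtree graph (the dictionary-of-neighbors format from the earlier sketch). The restart-from-the-last-extremum loop follows the steps above as reconstructed here and may differ from the paper's exact bookkeeping; the `max_dist` bound stands in for the partial search used in subsequent frames.

```python
import heapq

def accumulative_geodesic_distances(edges, start, max_dist=float("inf")):
    """Dijkstra search over the body graph G from a start vertex.

    edges: {vertex: [(neighbor, weight), ...]}.
    Returns {vertex: accumulated geodesic distance}; vertices farther
    than max_dist stay unvisited, which bounds the partial search.
    """
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in edges[u]:
            nd = d + w
            if nd < max_dist and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def detect_extreme_points(edges, v_center, k=5):
    """First-frame detection: repeatedly take the vertex with the
    longest accumulated geodesic distance as the next extreme point,
    then restart the search from it, as in steps (1)-(4) above."""
    extremes = []
    start = v_center
    for _ in range(k):
        dist = accumulative_geodesic_distances(edges, start)
        e = max(dist, key=dist.get)   # longest geodesic end point
        extremes.append(e)
        start = e                     # update the start vertex to e
    return extremes

# Usage: a toy chain graph 0-1-2-3; the farthest vertex from 0 is 3.
edges = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 1.0)],
         2: [(1, 1.0), (3, 1.0)], 3: [(2, 1.0)]}
print(detect_extreme_points(edges, v_center=0, k=2))
```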
4. Multi-Camera Process
4.1. Data Unification
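Figure 4 shows two depth images unified into one coordinate system via a calibration pose. A standard way to recover the rigid transform between corresponding 3D point sets is the SVD-based (Kabsch) method sketched below; this is the generic technique, not necessarily the paper's exact calibration procedure.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares rotation R and translation t with R @ src + t ≈ dst.

    src, dst: Nx3 arrays of corresponding 3D points, e.g., extreme
    points of the calibration pose seen by two cameras.
    """
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # avoid a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Usage: align camera-2 points into camera-1 coordinates.
rng = np.random.default_rng(0)
pts_cam1 = rng.normal(size=(5, 3))
angle = np.pi / 6
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
pts_cam2 = pts_cam1 @ R_true.T + np.array([0.1, 0.0, 2.0])
R, t = rigid_transform(pts_cam2, pts_cam1)
print(np.allclose(pts_cam2 @ R.T + t, pts_cam1, atol=1e-8))
```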
4.2. Body Parts Tracking
4.3. Noise Removal and Failure Recovery
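Figure 5 shows tracked trajectories smoothed by Kalman filtering. Below is a minimal constant-velocity Kalman filter for one 3D extreme point; the frame rate and noise covariances (`dt`, `q`, `r`) are illustrative assumptions, not the paper's values. Skipping the update step on a tracking failure lets the filter coast through the gap, which is how skipped foot positions like those circled in red in Figure 5 can be recovered.

```python
import numpy as np

class ConstantVelocityKF:
    """Kalman filter with state [x, y, z, vx, vy, vz]."""

    def __init__(self, dt=1 / 30, q=1e-2, r=1e-1):
        self.F = np.eye(6)
        self.F[:3, 3:] = dt * np.eye(3)          # position += velocity * dt
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])
        self.Q = q * np.eye(6)                   # process noise (assumed)
        self.R = r * np.eye(3)                   # measurement noise (assumed)
        self.x = np.zeros(6)
        self.P = np.eye(6)

    def predict(self):
        """Advance the state one frame; call alone on tracking failure."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]

    def update(self, z):
        """Fuse a measured 3D position into the state."""
        y = np.asarray(z) - self.H @ self.x       # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:3]

# Usage: filter a noisy foot trajectory with one dropped frame (None).
kf = ConstantVelocityKF()
for frame, meas in enumerate([[0, 0, 1], [0.1, 0, 1], None, [0.3, 0, 1]]):
    pos = kf.predict()
    if meas is not None:
        pos = kf.update(meas)
    print(frame, np.round(pos, 3))
```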
4.4. Body Orientation Estimation
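Figure 6 depicts the body orientation as a normal vector. One common estimator, sketched here under the assumption that PCA (listed in the abbreviations) is applied to torso points, takes the least-variance principal axis of the roughly planar torso as its normal; the paper's exact estimator may differ.

```python
import numpy as np

def body_orientation(torso_points, toward_camera=np.array([0.0, 0.0, -1.0])):
    """Estimate a body-facing normal from 3D torso points via PCA.

    The torso is roughly planar, so the principal axis with the
    smallest variance approximates its normal. The sign is chosen so
    the normal points toward the camera (assumed to be along -Z here).
    """
    pts = np.asarray(torso_points, dtype=float)
    centered = pts - pts.mean(axis=0)
    cov = centered.T @ centered / len(pts)
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    normal = eigvecs[:, 0]                       # least-variance axis
    if normal @ toward_camera < 0:
        normal = -normal
    return normal / np.linalg.norm(normal)

# Usage: a synthetic torso patch in the XY plane, normal along -Z.
rng = np.random.default_rng(1)
torso = rng.normal(scale=[0.2, 0.3, 0.01], size=(200, 3))
print(np.round(body_orientation(torso), 2))     # ~[0, 0, -1]
```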
5. Experimental Results
5.1. Tracking Accuracy
5.2. Action Recognition
5.3. Motion Synthesis
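Figures 9 and 10 show motions synthesized from a small set of key poses and input parameters. The sketch below blends consecutive key poses with an ease-in/ease-out timing curve; linearly blending joint positions is a simplification of example-based pose blending, chosen here for brevity, and the pose data are hypothetical.

```python
import numpy as np

def ease_in_out(t):
    """Smoothstep timing curve so the motion accelerates then decelerates."""
    return 3 * t**2 - 2 * t**3

def synthesize(key_poses, frames_per_segment=15):
    """Blend consecutive key poses into a motion clip.

    key_poses: list of (J, 3) joint-position arrays, e.g., the chamber,
    extension, and recovery poses of a front kick. Joint angles would
    normally be interpolated instead of raw positions.
    """
    motion = []
    for a, b in zip(key_poses[:-1], key_poses[1:]):
        for f in range(frames_per_segment):
            t = ease_in_out(f / (frames_per_segment - 1))
            motion.append((1 - t) * a + t * b)
    return np.stack(motion)

# Usage: three toy "poses" for a two-joint figure.
p0 = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
p1 = np.array([[0.0, 0.0, 0.0], [0.8, 0.8, 0.0]])
p2 = np.array([[0.0, 0.0, 0.0], [1.0, 0.2, 0.0]])
clip = synthesize([p0, p1, p2])
print(clip.shape)   # (30, 2, 3)
```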
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
| --- | --- |
| RGB-D | Red, green, blue, and depth |
| HCI | Human-computer interaction |
| VR | Virtual reality |
| AR | Augmented reality |
| AdaBoost | Adaptive boosting |
| SVM | Support vector machine |
| GMM | Gaussian mixture model |
| ICP | Iterative closest point |
| CNN | Convolutional neural network |
| H | Head |
| RH | Right hand |
| LH | Left hand |
| RF | Right foot |
| LF | Left foot |
| SVD | Singular value decomposition |
| PCA | Principal component analysis |
| FK | Front kick |
| SK | Side kick |
| RK | Round kick |
| FP | Front punch |
| WP | Forward punch |
| FPOS | Front punch opposite side |
Tracking accuracy (%) of the head (H), right hand (RH), left hand (LH), right foot (RF), and left foot (LF) for each action type.

| Action Type | Side | H | RH | LH | RF | LF |
| --- | --- | --- | --- | --- | --- | --- |
| Front Kick (FK) | Left | 92.1 | 88.2 | 87.8 | 86.4 | 87.8 |
| | Right | 91.8 | 87.8 | 88.1 | 87.1 | 86.9 |
| Side Kick (SK) | Left | 90.2 | 87.9 | 88.0 | 87.8 | 88.3 |
| | Right | 90.8 | 87.7 | 87.9 | 89.1 | 87.1 |
| Round Kick (RK) | Left | 91.8 | 88.1 | 88.2 | 87.2 | 89.2 |
| | Right | 91.5 | 88.3 | 88.1 | 88.9 | 87.9 |
| Front Punch (FP) | Left | 94.2 | 81.8 | 84.1 | 85.6 | 85.8 |
| | Right | 94.1 | 83.8 | 82.7 | 86.1 | 86.2 |
| Forward Punch (WP) | Left | 93.8 | 80.9 | 82.8 | 86.2 | 86.5 |
| | Right | 94.0 | 83.1 | 80.5 | 86.1 | 86.4 |
| Front Punch Opposite Side (FPOS) | Left | 93.8 | 80.2 | 82.7 | 85.8 | 86.0 |
| | Right | 93.1 | 83.8 | 80.9 | 85.4 | 85.9 |
| Average | | 92.6 | 85.1 | 85.2 | 86.8 | 87.0 |
Average tracking accuracy (%) of our system compared with multi-Kinect setups.

| System Type | H | RH | LH | RF | LF |
| --- | --- | --- | --- | --- | --- |
| Ours (Two Kinects) | 92.6 | 85.1 | 85.2 | 86.8 | 87.0 |
| Multi-Kinects (Two Kinects) | 91.2 | 78.9 | 79.2 | 76.5 | 77.1 |
| Multi-Kinects (Four Kinects) | 91.3 | 80.3 | 80.8 | 79.8 | 80.0 |
Numbers of training and test samples, total frames, and average frames per sample (±standard deviation) for each action type.

| Action Type | Side | Training Samples | Test Samples | Total Frames | Average Frames (±SD) |
| --- | --- | --- | --- | --- | --- |
| FK | Left | 123 | 34 | 4160 | 26.50 (±3.37) |
| | Right | 120 | 34 | 4488 | 29.14 (±6.84) |
| SK | Left | 123 | 30 | 4886 | 31.93 (±5.48) |
| | Right | 123 | 30 | 5305 | 34.67 (±4.41) |
| RK | Left | 124 | 37 | 3470 | 21.55 (±3.68) |
| | Right | 125 | 35 | 4403 | 27.52 (±5.82) |
| FP | Left | 120 | 33 | 4758 | 31.10 (±6.22) |
| | Right | 120 | 32 | 7617 | 32.16 (±4.45) |
| WP | Left | 124 | 30 | 6243 | 40.54 (±4.33) |
| | Right | 121 | 32 | 6380 | 41.01 (±4.14) |
| FPOS | Left | 121 | 31 | 7617 | 50.11 (±7.20) |
| | Right | 121 | 34 | 8598 | 55.47 (±5.79) |
| Total | | 1465 | 392 | 67,925 | |
Confusion matrix of the action recognition results (rows: performed action; columns: recognized action).

| Actual \ Recognized | FK (L) | FK (R) | SK (L) | SK (R) | RK (L) | RK (R) | FP (L) | FP (R) | WP (L) | WP (R) | FPOS (L) | FPOS (R) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FK (L) | 0.97 | 0 | 0.03 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| FK (R) | 0 | 0.94 | 0 | 0.06 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| SK (L) | 0.03 | 0.03 | 0.94 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| SK (R) | 0 | 0.03 | 0 | 0.97 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| RK (L) | 0 | 0 | 0 | 0 | 1.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| RK (R) | 0.03 | 0 | 0 | 0 | 0 | 0.97 | 0 | 0 | 0 | 0 | 0 | 0 |
| FP (L) | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 | 0 | 0 | 0 | 0 | 0 |
| FP (R) | 0 | 0 | 0 | 0 | 0 | 0 | 0.09 | 0.91 | 0 | 0 | 0 | 0 |
| WP (L) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.97 | 0 | 0 | 0.03 |
| WP (R) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.97 | 0.03 | 0 |
| FPOS (L) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 | 0 |
| FPOS (R) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).