Object Pose Estimation Using Edge Images Synthesized from Shape Information
Figure 1. Overview of our method. First, an edge image is randomly created using the 3D model. The CNN model is trained with the generated dataset. At the inference stage, an RGB-based edge image is generated from an RGB image. Finally, the RGB-based edge image is input to the pre-trained CNN and the pose is estimated.
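As a rough illustration of the training stage described above, the sketch below regresses a 6-DoF pose (translation plus unit quaternion) from a single-channel edge image. This is an assumption-laden sketch, not the authors' network: the ResNet-style backbone, input size, head design, and losses are placeholders.

```python
import tensorflow as tf

# Minimal sketch of the training stage: a ResNet-style backbone on
# single-channel edge images with two regression heads. Input size,
# head design, and loss choices are assumptions, not the paper's setup.
inputs = tf.keras.Input(shape=(224, 224, 1))
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights=None, input_tensor=inputs, pooling="avg")
translation = tf.keras.layers.Dense(3, name="translation")(backbone.output)
rotation = tf.keras.layers.Dense(4)(backbone.output)
# Normalize so the rotation head always outputs a unit quaternion.
rotation = tf.keras.layers.Lambda(
    lambda q: tf.math.l2_normalize(q, axis=-1), name="quaternion")(rotation)

model = tf.keras.Model(inputs, [translation, rotation])
model.compile(optimizer="adam",
              loss={"translation": "mse", "quaternion": "mse"})
# model.fit(edge_images, {"translation": t_gt, "quaternion": q_gt}, epochs=...)
```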
Figure 2. Process flow of edge image generation. Upper: CAD-based edge image. First, all ridgelines are projected. Then, hidden lines are removed. Finally, smoothing is applied to each edge image. Middle: CG-based edge image. The 3D model is rendered under certain environmental conditions. Then, line segments are detected and smoothing is applied to each edge image. Lower: RGB-based edge image. Line segments are detected from RGB images. If a mask image is used, the line segments are drawn onto the mask image.
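The RGB-based branch of this flow (lower row) can be sketched with off-the-shelf tools. Below is a minimal, hedged example assuming OpenCV's LSD implementation (available in recent OpenCV builds); the parameter names `line_width` and `filter_size` are illustrative stand-ins for the width and filter conditions in Figure 6, not names from the paper.

```python
import cv2
import numpy as np

def rgb_based_edge_image(rgb, mask=None, line_width=2, filter_size=3):
    """Sketch of the RGB-based edge image flow: detect line segments,
    draw them as an edge image, optionally onto a mask image, then smooth.
    Parameter names are illustrative, not from the paper."""
    gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)
    lsd = cv2.createLineSegmentDetector(cv2.LSD_REFINE_STD)
    lines = lsd.detect(gray)[0]  # N x 1 x 4 array of (x1, y1, x2, y2)

    # Draw segments onto the mask image if one is given ("edge and mask"),
    # otherwise onto a black canvas ("only edge").
    canvas = mask.copy() if mask is not None else np.zeros_like(gray)
    if lines is not None:
        for x1, y1, x2, y2 in lines.reshape(-1, 4):
            cv2.line(canvas, (int(x1), int(y1)), (int(x2), int(y2)),
                     color=255, thickness=line_width)

    # Smoothing filter (e.g., 3x3 or 5x5 box blur) as in the filter conditions.
    if filter_size:
        canvas = cv2.blur(canvas, (filter_size, filter_size))
    return canvas
```

The CAD-based branch differs only in where the segments come from: projected ridgelines with hidden-line removal instead of detected segments.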
Figure 3. Examples of edge images created in the simulation environment.
Figure 4. Physical dataset. (a) An example image: the 3D-printed object is placed on the AR marker. (b) The distribution of camera positions; the object is placed at the origin of the coordinate system.
Figure 5. Examples of edge images created in the physical environment. These images use line width 2.
Figure 6. CAD-based edge images used in E1 and E2. Width means the width of the edge line (1 or 2) and filter means the size of the smoothing filter. Each condition has an "and mask" option, as shown in Figure 2. The previous method uses a silhouette image (S0); S1 to S12 denote the indices of the conditions we propose.
Figure 7. Examples of pose estimation results in the physical environment. The S0 row shows the result of the baseline method. Bounding boxes (BBs) surrounding the object are computed from the pose information. The green BB is the ground-truth pose and the blue BB is the estimated pose. Note that only the "edge and mask" results are shown because training did not converge under several "only edge" conditions.
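For reference, bounding boxes like the ones in this figure can be obtained by projecting the eight corners of the model's 3D bounding box with each pose. The sketch below assumes a calibrated camera matrix `K` and a pose given as an OpenCV rotation vector `rvec` plus translation `tvec`; the helper name is hypothetical.

```python
import cv2
import numpy as np

def pose_to_bounding_box(model_points, rvec, tvec, K, dist=None):
    """Project the corners of the model's 3D bounding box with a pose
    (rvec, tvec) and return the enclosing 2D box. A sketch, not the
    paper's exact code."""
    mins, maxs = model_points.min(axis=0), model_points.max(axis=0)
    corners = np.array([[x, y, z] for x in (mins[0], maxs[0])
                                  for y in (mins[1], maxs[1])
                                  for z in (mins[2], maxs[2])], dtype=np.float64)
    pts2d, _ = cv2.projectPoints(corners, rvec, tvec, K, dist)
    pts2d = pts2d.reshape(-1, 2)
    x0, y0 = pts2d.min(axis=0)
    x1, y1 = pts2d.max(axis=0)
    return (int(x0), int(y0)), (int(x1), int(y1))
```

Drawing the result twice, e.g., `cv2.rectangle(img, p0, p1, (0, 255, 0), 2)` for the ground truth and `(255, 0, 0)` for the estimate, reproduces the green/blue comparison.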
Figure 8. An example of the difference between RGB-based and CAD-based images.
Figure 9. An example of a learning curve with fine-tuning (E7, S7). The vertical axis shows the loss value and the horizontal axis shows the number of epochs.
Abstract
1. Introduction
2. Related Work
3. Methods
3.1. Outline
3.2. Edge Image Generation
3.2.1. CAD-Based Edge Image
3.2.2. CG-Based Edge Image
3.3. Training
3.4. Inference
4. Evaluation
4.1. Dataset
4.1.1. 3D Model
4.1.2. Simulation Environment
- Translation (original) avg.: [m], s.d.: ;
- Translation (re-gen.) avg.: [m], s.d.: ;
- Rotation (original) avg.: , s.d.: (the values were divided by the norm);
- Rotation (re-gen.): the same as the original.
4.1.3. Physical Environment
4.2. Methodology
4.2.1. Kinds of Experiments
4.2.2. Parameter of Each Experiment
Edge Image Generation
CNN Architecture
Evaluation Metrics
4.3. Results
4.4. Discussion
4.4.1. Simulation Environment
4.4.2. Physical Environment
4.4.3. Fine-Tuning
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| DoF | Degrees of Freedom |
| CG | Computer Graphics |
| CAD | Computer-Aided Design |
| RGB | Red, Green and Blue |
| AR | Augmented Reality |
| DL | Deep Learning |
| ToF | Time of Flight |
| CNN | Convolutional Neural Network |
| STL | Standard Triangulated Language |
| LSD | Line Segment Detector |
| BB | Bounding Box |
| Exp. | Training/Validation | Inference |
|---|---|---|
| E1 | CAD-based, Simulation | CAD-based, Simulation |
| E2 | CAD-based, Simulation | CAD-based, Physical |
| E3 | CG-based, Simulation | CG-based, Simulation |
| E4 | CG-based, Simulation | CG-based, Physical |
| E5 | CAD-based, Simulation | RGB-based, Physical |
| E6 | CG-based, Simulation | RGB-based, Physical |
| E7 | CAD-based, Simulation | RGB-based, Fine-tuning |
| E8 | CG-based, Simulation | RGB-based, Fine-tuning |
Translation error (cm); lower is better (↓).

| Exp. | S0 | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | S11 | S12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mask | — | | | | | | | w/ | w/ | w/ | w/ | w/ | w/ |
| Line width | — | 1 | 1 | 1 | 2 | 2 | 2 | 1 | 1 | 1 | 2 | 2 | 2 |
| Filter | — | no | 3 × 3 | 5 × 5 | no | 3 × 3 | 5 × 5 | no | 3 × 3 | 5 × 5 | no | 3 × 3 | 5 × 5 |
| E1 (mean) | 1.70 | 44.48 | N/A | N/A | 12.69 | N/A | 44.07 | 5.45 | 1.13 | 1.73 | 1.31 | 55.93 | 22.31 |
| E1 (s.d.) | 1.93 | 14.56 | N/A | N/A | 4.54 | N/A | 12.91 | 41.00 | 1.07 | 6.50 | 1.27 | 474.49 | 215.28 |
| E2 (mean) | 1.56 | 48.88 | N/A | N/A | 13.14 | N/A | 52.40 | 2.06 | 2.39 | 7.25 | 1.42 | 4.20 | 4.09 |
| E2 (s.d.) | 1.52 | 15.42 | N/A | N/A | 4.36 | N/A | 13.07 | 1.43 | 1.91 | 4.66 | 0.96 | 23.85 | 2.11 |
| E3 (mean) | 1.70 | 12.71 | N/A | N/A | 8.98 | N/A | N/A | 4.37 | 1.15 | 14.25 | 6.64 | 10.57 | 1.35 |
| E3 (s.d.) | 1.93 | 4.55 | N/A | N/A | 4.40 | N/A | N/A | 26.33 | 1.19 | 116.09 | 64.51 | 95.98 | 2.60 |
| E4 (mean) | 1.56 | 12.71 | N/A | N/A | 10.86 | N/A | N/A | 1.00 | 1.20 | 3.31 | 1.24 | 3.00 | 6.29 |
| E4 (s.d.) | 1.52 | 4.70 | N/A | N/A | 4.23 | N/A | N/A | 1.31 | 0.90 | 3.03 | 1.03 | 2.02 | 3.20 |
| E5 (mean) | 1.56 | 49.50 | N/A | N/A | 18.11 | N/A | 53.47 | 2.55 | 2.29 | 6.91 | 3.03 | 4.26 | 4.95 |
| E5 (s.d.) | 1.52 | 15.38 | N/A | N/A | 5.83 | N/A | 13.37 | 2.33 | 2.16 | 4.60 | 2.52 | 15.21 | 2.84 |
| E6 (mean) | 1.56 | 14.95 | N/A | N/A | 11.85 | N/A | N/A | 6.70 | 7.69 | 8.07 | 4.21 | 6.40 | 9.34 |
| E6 (s.d.) | 1.52 | 5.28 | N/A | N/A | 4.25 | N/A | N/A | 4.09 | 4.19 | 3.96 | 3.70 | 4.66 | 4.52 |
| E7 (mean) | 2.56 | 57.80 | N/A | N/A | 26.57 | N/A | 32.95 | 1.64 | 1.47 | 2.66 | 2.03 | 8.15 | 6.37 |
| E7 (s.d.) | 2.35 | 17.37 | N/A | N/A | 7.45 | N/A | 8.59 | 1.96 | 0.97 | 1.82 | 1.44 | 38.60 | 30.82 |
| E8 (mean) | 2.56 | 55.90 | N/A | N/A | 44.01 | N/A | N/A | 4.39 | 1.93 | 5.95 | 2.94 | 4.68 | 2.05 |
| E8 (s.d.) | 2.35 | 16.35 | N/A | N/A | 11.75 | N/A | N/A | 16.75 | 1.29 | 24.04 | 10.62 | 19.97 | 1.38 |
Rotation error (deg); lower is better (↓).

| Exp. | S0 | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | S11 | S12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mask | — | | | | | | | w/ | w/ | w/ | w/ | w/ | w/ |
| Line width | — | 1 | 1 | 1 | 2 | 2 | 2 | 1 | 1 | 1 | 2 | 2 | 2 |
| Filter | — | no | 3 × 3 | 5 × 5 | no | 3 × 3 | 5 × 5 | no | 3 × 3 | 5 × 5 | no | 3 × 3 | 5 × 5 |
| E1 (mean) | 13.83 | 121.95 | N/A | N/A | 67.71 | N/A | 118.48 | 8.76 | 7.18 | 8.30 | 7.83 | 10.75 | 9.20 |
| E1 (s.d.) | 21.12 | 36.51 | N/A | N/A | 39.25 | N/A | 39.57 | 20.33 | 9.37 | 14.01 | 10.16 | 23.43 | 18.65 |
| E2 (mean) | 11.22 | 124.12 | N/A | N/A | 61.60 | N/A | 85.07 | 9.93 | 14.33 | 89.79 | 7.38 | 8.14 | 9.82 |
| E2 (s.d.) | 19.14 | 36.49 | N/A | N/A | 34.98 | N/A | 43.94 | 11.91 | 19.20 | 54.32 | 5.61 | 8.07 | 7.84 |
| E3 (mean) | 13.83 | 65.21 | N/A | N/A | 85.73 | N/A | N/A | 9.64 | 7.44 | 10.24 | 9.78 | 8.85 | 7.76 |
| E3 (s.d.) | 21.12 | 40.40 | N/A | N/A | 43.17 | N/A | N/A | 21.63 | 9.89 | 23.89 | 18.81 | 17.31 | 13.42 |
| E4 (mean) | 11.22 | 59.28 | N/A | N/A | 88.78 | N/A | N/A | 5.44 | 6.88 | 27.99 | 6.33 | 17.20 | 53.48 |
| E4 (s.d.) | 19.14 | 42.25 | N/A | N/A | 39.55 | N/A | N/A | 6.78 | 7.32 | 35.78 | 6.74 | 17.84 | 46.56 |
| E5 (mean) | 11.22 | 135.71 | N/A | N/A | 61.89 | N/A | 85.78 | 16.90 | 17.95 | 84.18 | 16.51 | 15.18 | 20.50 |
| E5 (s.d.) | 19.14 | 32.25 | N/A | N/A | 37.42 | N/A | 44.01 | 25.57 | 25.13 | 56.21 | 24.03 | 17.42 | 27.90 |
| E6 (mean) | 11.22 | 89.90 | N/A | N/A | 93.54 | N/A | N/A | 61.57 | 71.43 | 77.36 | 27.30 | 48.64 | 83.38 |
| E6 (s.d.) | 19.14 | 49.79 | N/A | N/A | 38.95 | N/A | N/A | 54.99 | 44.87 | 44.83 | 37.08 | 48.80 | 54.50 |
| E7 (mean) | 21.82 | 110.76 | N/A | N/A | 45.47 | N/A | 82.24 | 10.80 | 10.56 | 16.81 | 9.22 | 9.46 | 11.32 |
| E7 (s.d.) | 31.78 | 33.54 | N/A | N/A | 29.98 | N/A | 31.89 | 16.98 | 10.78 | 20.15 | 6.36 | 9.28 | 17.55 |
| E8 (mean) | 21.82 | 104.83 | N/A | N/A | 119.93 | N/A | N/A | 16.75 | 12.61 | 17.35 | 13.72 | 13.42 | 14.40 |
| E8 (s.d.) | 31.78 | 43.51 | N/A | N/A | 30.34 | N/A | N/A | 21.96 | 16.98 | 27.05 | 14.95 | 15.96 | 18.92 |
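For reading the two tables above, a common way to compute such metrics is sketched below, assuming poses are expressed as a translation vector (in meters) plus a unit quaternion; the paper's exact definitions are given in Section 4.2.2 and may differ.

```python
import numpy as np

def translation_error_cm(t_gt, t_est):
    # Euclidean distance between ground-truth and estimated translation,
    # converted from meters to centimeters (unit assumed).
    return 100.0 * np.linalg.norm(np.asarray(t_gt) - np.asarray(t_est))

def rotation_error_deg(q_gt, q_est):
    # Geodesic angle between two unit quaternions, in degrees. The absolute
    # value handles the q / -q double cover of rotations.
    dot = abs(np.dot(q_gt / np.linalg.norm(q_gt), q_est / np.linalg.norm(q_est)))
    return np.degrees(2.0 * np.arccos(np.clip(dot, -1.0, 1.0)))
```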
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Moteki, A.; Saito, H. Object Pose Estimation Using Edge Images Synthesized from Shape Information. Sensors 2022, 22, 9610. https://doi.org/10.3390/s22249610