The Real-Time Optimal Attitude Control of Tunnel Boring Machine Based on Reinforcement Learning
Figure 1. TBM tunneling cycle.
Figure 2. Ratios of each GEC.
Figure 3. Statistical distributions of the excavation parameters: (a) TDA, (b) TFA, (c) HDTH, (d) VDTH, (e) DDSB, (f) DLTC.
Figure 4. OACP modeling framework.
Figure 5. Change curve of GEC predictive model accuracy.
Figure 6. Predictive performance of the TBMAP predictive model of GEC 2: (a) TDA, (b) TFA, (c) HDTH, (d) VDTH.
Figure 7. Predictive performance of the TBMAP predictive model of GEC 3: (a) TDA, (b) TFA, (c) HDTH, (d) VDTH.
Figure 8. Predictive performance of the TBMAP predictive model of GEC 4: (a) TDA, (b) TFA, (c) HDTH, (d) VDTH.
Figure 9. Predictive performance of the TBMAP predictive model of GEC 5: (a) TDA, (b) TFA, (c) HDTH, (d) VDTH.
Figure 10. Changes in episode rewards with epochs: (a) GEC 2, (b) GEC 3, (c) GEC 4, (d) GEC 5.
Figure 11. Episode rewards comparison of OACP and manual control: (a) GEC 2, (b) GEC 3, (c) GEC 4, (d) GEC 5.
Abstract
1. Introduction
2. Data Review
2.1. The Origin Data
2.2. Training Dataset Construction
- (1) Data Preprocessing
- (2) Data Matching and Combination
2.3. Data Analysis
3. Methodology
3.1. Objective and General Idea
3.2. PPO Algorithm
- (1) Policy-Based Framework
- (2) TRPO Method
- (3) PPO Method
3.3. Base Model
- (1) GEC Predictive Model
- (2) TBMAP Predictive Model
3.4. OACP Model
4. Case Study
4.1. GEC Predictive Model
4.2. TBMAP Predictive Model
4.3. OACP Model
5. Conclusions
- (1) The paper proposes a GEC predictive model that supplies the real-time GEC to the attitude control policy during tunneling. The model, built as a DNN, was trained on paired excavation-parameter and GEC data from an actual construction project. The trained model reached an accuracy of 94%. Because it takes only excavation parameters as input, it can recognize the GEC in real time and pass it to the attitude control model.
- (2) The paper established TBMAP predictive models for four GECs to serve as the interactive environment for training the attitude control policies. Each model, a DNN, was trained on the TBMAP parameters and attitude control parameters recorded under the corresponding GEC in the actual project. After training, the R² values for every TBMAP representation parameter in every GEC model exceeded 0.85. The TBMAP predictive models therefore have sufficient predictive accuracy and computational efficiency to serve as the interactive environment for training the attitude control policy under the reinforcement learning framework.
- (3) To address the hysteresis of the TBM position response and to optimize the tunneling axis as a whole, the paper proposes a reinforcement-learning framework for optimizing the attitude control policy. The policy for each GEC was represented by a DNN and improved step by step by alternating between two phases: interaction of the policy with the TBMAP predictive environment, and policy training with the PPO algorithm, which optimizes the policy based on the episode deviation. To verify its effectiveness, the resulting OACP was compared with manual control on practical engineering data. The results showed that OACP reduces the accumulated deviation of the tunneling axis from the design tunnel axis by over 80% compared with manual control. Combined with the GEC predictive model, OACP can provide real-time decision support for attitude control in actual engineering.
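Conclusion (3) relies on PPO for the policy update. As a rough illustration only, the clipped surrogate objective from the PPO literature can be sketched in plain Python as below. The function name `ppo_clip_objective` and the toy inputs are assumptions made for this sketch and are not taken from the paper's implementation; the clip coefficient of 0.2 matches the hyperparameter table of this article.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    updated and the data-collecting policy (hypothetical toy inputs here).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio r_t
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Toy check: a ratio of 1.5 with a positive advantage is capped at 1 + 0.2.
print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))
```

The clipping removes the incentive to move the new policy far from the one that collected the data, which is one plausible reason PPO suits an environment that is itself a learned TBMAP model.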
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Group | Name | Abbreviation | Minimum Value | Maximum Value | Average Value | Unit |
---|---|---|---|---|---|---|
Excavation parameters | Advancing speed | AS | 0.031 | 234.245 | 36.577 | mm/min |
Excavation parameters | Penetration | PE | 0.003 | 39.022 | 6.575 | mm/rot |
Excavation parameters | Total thrust force | TF | 1063.310 | 17,741.300 | 12,474.052 | kN |
Excavation parameters | Cutterhead rotation speed | CRS | 0.285 | 8.788 | 5.888 | r/min |
Excavation parameters | Cutterhead torque | CT | 150.676 | 3827.090 | 1501.380 | kN·m |
Excavation parameters | Cutterhead average current | CAC | 48.901 | 390.743 | 188.012 | A |
Excavation parameters | Pressure of chamber with rod of roof support | PWRS | −1.000 | 227.419 | 71.209 | bar |
Excavation parameters | Pressure of chamber without rod of roof support | PORS | 22.031 | 263.534 | 115.070 | bar |
Excavation parameters | Pressure of chamber with rod of left support | PWLS | −69.000 | 195.439 | 44.932 | bar |
Excavation parameters | Pressure of chamber without rod of left support | POLS | −1.000 | 236.899 | 96.525 | bar |
Excavation parameters | Pressure of chamber with rod of right support | PWRS | −50.000 | 192.189 | 65.765 | bar |
Excavation parameters | Pressure of chamber without rod of right support | PORS | −18.000 | 291.743 | 130.415 | bar |
Excavation parameters | Pressure of chamber with rod of propulsion cylinder | PWPC | −0.270 | 4.067 | 1.032 | bar |
Excavation parameters | Pressure of chamber without rod of propulsion cylinder | PWPC | 30.104 | 229.142 | 159.032 | bar |
Excavation parameters | Pressure of chamber with rod of left support boots | PWLS | 0.000 | 107.162 | 37.583 | bar |
Excavation parameters | Pressure of chamber with rod of right support boots | PWRS | −1.000 | 96.993 | 41.707 | bar |
Excavation parameters | Pressure of chamber without rod of left torque cylinders | POLT | 45.391 | 202.264 | 123.194 | bar |
Excavation parameters | Pressure of chamber with rod of left torque cylinders | PWLT | −1.000 | 172.912 | 64.262 | bar |
Excavation parameters | Pressure of chamber without rod of right torque cylinders | PORT | 44.074 | 147.439 | 91.565 | bar |
Excavation parameters | Pressure of chamber with rod of right torque cylinders | PWRT | 22.114 | 166.529 | 100.531 | bar |
Attitude control parameters | Displacement deviation of two support boots | DDSB | −175.000 | 204.000 | 23.464 | mm |
Attitude control parameters | Displacement of left torque cylinders | DLTC | 55.743 | 155.000 | 109.593 | mm |
TBMAP representation parameters | TBM dip angle | TDA | −3.279 | 12.405 | 4.520 | mm |
TBMAP representation parameters | TBM flip angle | TFA | −11.202 | 7.442 | −1.549 | mm |
TBMAP representation parameters | Horizontal deviation of TBM head | HDTH | −241.000 | 307.351 | 13.818 | mm |
TBMAP representation parameters | Vertical deviation of TBM head | VDTH | −77.684 | 160.234 | 32.658 | mm |
GEC | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
2 | 0.9571 | 0.9691 | 0.9604 | 13,234 |
3 | 0.9294 | 0.9254 | 0.9274 | 10,125 |
4 | 0.8828 | 0.8366 | 0.8591 | 1585 |
5 | 0.9769 | 0.9562 | 0.9664 | 5346 |
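The classification metrics above can be cross-checked with a few lines of Python. The sketch below recomputes F1 as the harmonic mean of precision and recall, and derives an overall accuracy as the support-weighted recall, under the assumption that the four listed GECs are the only classes and each sample carries exactly one label. The names `REPORT`, `f1_score`, and `overall_accuracy` are illustrative choices, not identifiers from the paper.

```python
# Per-class metrics copied from the table above:
# GEC -> (precision, recall, tabulated F1, support)
REPORT = {
    2: (0.9571, 0.9691, 0.9604, 13234),
    3: (0.9294, 0.9254, 0.9274, 10125),
    4: (0.8828, 0.8366, 0.8591, 1585),
    5: (0.9769, 0.9562, 0.9664, 5346),
}


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


def overall_accuracy(report):
    """Support-weighted recall, which equals accuracy for single-label data."""
    correct = sum(row[1] * row[3] for row in report.values())
    total = sum(row[3] for row in report.values())
    return correct / total


for gec, (p, r, f1_table, n) in REPORT.items():
    print(f"GEC {gec}: recomputed F1 = {f1_score(p, r):.4f}, tabulated = {f1_table:.4f}")
print(f"overall accuracy (support-weighted recall) = {overall_accuracy(REPORT):.4f}")
```

Under these assumptions the weighted recall comes out between 0.94 and 0.95, in line with the reported accuracy of 94%. The recomputed F1 agrees with the table for GEC 3, 4, and 5; for GEC 2 it is slightly higher than the tabulated value, which may reflect rounding or a transcription difference in the source.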
Predicted Parameter | Epochs | Learning Rate | Batch Size | Verification Set Proportion | Optimization Algorithm |
---|---|---|---|---|---|
TDA | 140 | 0.004 | 1000 | 0.1 | SGD |
TFA | 140 | 0.004 | 2000 | 0.1 | SGD |
HDTH | 200 | 0.004 | 2000 | 0.1 | SGD |
VDTH | 200 | 0.004 | 1000 | 0.1 | SGD |
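To make the training settings above concrete, the following sketch stores them in a small configuration object and derives how many SGD updates they imply. The dataset size of 20,000 samples is a made-up figure for illustration only, and `TrainConfig`, `CONFIGS`, `updates_per_epoch`, and `total_updates` are hypothetical names rather than anything from the paper's code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    epochs: int
    learning_rate: float
    batch_size: int
    val_fraction: float  # share of samples held out for verification
    optimizer: str = "SGD"


# Values copied from the table above, one entry per predicted parameter.
CONFIGS = {
    "TDA": TrainConfig(140, 0.004, 1000, 0.1),
    "TFA": TrainConfig(140, 0.004, 2000, 0.1),
    "HDTH": TrainConfig(200, 0.004, 2000, 0.1),
    "VDTH": TrainConfig(200, 0.004, 1000, 0.1),
}


def updates_per_epoch(n_samples, cfg):
    """Mini-batches per epoch after removing the held-out share."""
    n_train = n_samples - round(n_samples * cfg.val_fraction)
    return math.ceil(n_train / cfg.batch_size)


def total_updates(n_samples, cfg):
    """Gradient updates over the whole training run."""
    return cfg.epochs * updates_per_epoch(n_samples, cfg)


N_SAMPLES = 20_000  # hypothetical dataset size, not reported in this section
for name, cfg in CONFIGS.items():
    print(name, updates_per_epoch(N_SAMPLES, cfg), total_updates(N_SAMPLES, cfg))
```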
GEC | TDA | TFA | HDTH | VDTH |
---|---|---|---|---|
GEC 2 | 0.913 | 0.932 | 0.928 | 0.943 |
GEC 3 | 0.962 | 0.941 | 0.873 | 0.933 |
GEC 4 | 0.921 | 0.865 | 0.928 | 0.923 |
GEC 5 | 0.959 | 0.928 | 0.889 | 0.958 |
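The values in this table are coefficients of determination (R²). A minimal sketch of how such a score is computed, and of the check that every entry exceeds the 0.85 threshold quoted in the conclusions, might look as follows. The toy measurement and prediction lists are invented for illustration, and `r2_score` and `R2_TABLE` are names chosen for this sketch.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# R² values copied from the table above: GEC -> {parameter: R²}
R2_TABLE = {
    2: {"TDA": 0.913, "TFA": 0.932, "HDTH": 0.928, "VDTH": 0.943},
    3: {"TDA": 0.962, "TFA": 0.941, "HDTH": 0.873, "VDTH": 0.933},
    4: {"TDA": 0.921, "TFA": 0.865, "HDTH": 0.928, "VDTH": 0.923},
    5: {"TDA": 0.959, "TFA": 0.928, "HDTH": 0.889, "VDTH": 0.958},
}

lowest = min(value for row in R2_TABLE.values() for value in row.values())
print(f"lowest tabulated R² = {lowest}")  # the weakest model still clears 0.85

# Toy example with invented numbers, only to show the formula in action.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(f"toy R² = {r2_score(measured, predicted):.2f}")
```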
Hyperparameter Name | Hyperparameter Value |
---|---|
Total timesteps | 1,000,000 |
Learning rate | 3 × |
Parallel environment number | 8 |
Policy updates frequency | 2000 |
Lambda for GAE (λ) | 0.95 |
Discount factor (γ) | 0.95 |
Policy updates epochs | 8 |
PPO clip coefficient | 0.2 |
Coefficient of the value function loss | 0.5 |
The maximum norm for the gradient clipping | 0.5 |
Policy updates batch-size | 1600 |
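Two of the hyperparameters above, the discount factor and the GAE lambda (both 0.95), enter the advantage estimate used by PPO. The sketch below shows the standard generalized advantage estimation recursion for a single rollout segment. It assumes the segment contains no episode termination (no done flags), uses invented reward and value numbers, and the name `compute_gae` is hypothetical.

```python
def compute_gae(rewards, values, last_value, gamma=0.95, lam=0.95):
    """Generalized advantage estimation over one uninterrupted segment.

    rewards[t] and values[t] refer to step t; last_value is the critic's
    estimate for the state reached after the final step (bootstrap value).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    # Value targets for the critic are advantage plus baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


# Toy rollout with invented numbers.
adv, ret = compute_gae(rewards=[1.0, 1.0], values=[0.0, 0.0], last_value=0.0)
print(adv, ret)
```

With both factors at 0.95, each step back in time shrinks the contribution of a later TD error by 0.9025, so the estimate weighs deviations over a horizon of a few dozen steps rather than a single step.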
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jia, G.; Huo, J.; Yang, B.; Wu, Z. The Real-Time Optimal Attitude Control of Tunnel Boring Machine Based on Reinforcement Learning. Appl. Sci. 2023, 13, 10026. https://doi.org/10.3390/app131810026