Utilization of a Lightweight 3D U-Net Model for Reducing Execution Time of Numerical Weather Prediction Models
Figure 1. The overall execution process of GloSea6.
Figure 2. Operational structure of "um-atmos.exe".
Figure 3. Correlation heatmap of variables used in BiCGStab.
Figure 4. Resolution size of the 3D grid data. (a) matches the latitude and longitude grid size of the Low GloSea6 UM model, and (b) is adjusted to be a multiple of 2 to facilitate the upsampling process in the U-Net architecture.
Figure 5. U-Net architecture [30].
Figure 6. Half-UNet architecture [35].
Figure 7. CBAM-based Half-UNet (CH-UNet) architecture.
Figure 8. Overall structure of CBAM and sub-attention modules [36].
Figure 9. Hybrid-DL NWP model structure integrating CH-UNet into the UM model of Low GloSea6.
Figure 10. Comparison of "um-atmos.exe" execution time for each timestep.
Figure 11. Comparison of RMSE for each deep network model's prediction results during Low GloSea6 execution, by timestep.
Abstract
1. Introduction
- We analyzed the overhead and execution time-based hotspots of the Low GloSea6 weather model.
- We examined the structure of the identified hotspots and estimated the parameters they use in order to collect the optimal data for deep-network training.
- This study demonstrates the feasibility of applying deep networks by selectively replacing part of the existing numerical computation solution of Low GloSea6 with a deep network, without increasing the challenges of physical interpretation.
- We further optimized execution time by making simple modifications to the lightweight 3D convolution-based architecture when integrating it with the NWP model, making it even more lightweight. We also improved prediction accuracy by adding an attention block to the deep network, and we demonstrate and validate the resulting accuracy improvements.
- Through the proposed method, we successfully integrated a deep network model into the numerical computation solution of Low GloSea6, allowing it to be invoked during execution within the Fortran 90 environment, and demonstrated its potential for seamless operation.
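For context on the numerical solution being selectively replaced, the following is a minimal NumPy sketch of the unpreconditioned BiCGStab iteration (Section 2.1.3; van der Vorst [22]). The small diagonally dominant matrix below is an illustrative stand-in, not the actual Helmholtz operator solved inside Low GloSea6.

```python
import numpy as np

def bicgstab(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Unpreconditioned BiCGStab for A x = b (van der Vorst, 1992)."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    r = b - A @ x
    r_hat = r.copy()               # shadow residual, fixed throughout
    rho_old = alpha = omega = 1.0
    v = np.zeros(n)
    p = np.zeros(n)
    for _ in range(max_iter):
        rho = r_hat @ r
        beta = (rho / rho_old) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = A @ p
        alpha = rho / (r_hat @ v)
        s = r - alpha * v          # intermediate residual
        t = A @ s
        omega = (t @ s) / (t @ t)  # stabilizing step length
        x = x + alpha * p + omega * s
        r = s - omega * t
        rho_old = rho
        if np.linalg.norm(r) < tol:
            break
    return x

# small nonsymmetric stand-in system for demonstration
A = np.array([[4.0, 1.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x = bicgstab(A, b)
```

Because each iteration costs two matrix-vector products, replacing repeated iterations with a single forward pass of a lightweight network is where the execution-time savings come from.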
2. Materials and Methods
2.1. Low-Resolution GloSea6
2.1.1. Profiling Low GloSea6 for Target Selection in Deep Learning Applications
2.1.2. Hotspot Identified in Low GloSea6: 3D Successive Over-Relaxation
2.1.3. Biconjugate Gradient Stabilized Method
2.2. Data
2.3. Deep Learning Approach Methodology
2.3.1. Lightweight 3D U-Net Architecture
2.3.2. Deep Learning Utilization Method for NWP Models
3. Experiments
3.1. Deep Learning Model Experiments
3.1.1. Experimental Environments
3.1.2. Evaluation Metrics
3.1.3. Experimental Results
3.2. Experiments on the Utilization of Deep Learning in NWP Models
3.2.1. Experimental Environments
3.2.2. Experimental Results
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Schultz, M.G.; Betancourt, C.; Gong, B.; Kleinert, F.; Langguth, M.; Leufen, L.H.; Mozaffari, A.; Stadtler, S. Can deep learning beat numerical weather prediction? Philos. Trans. R. Soc. A 2021, 379, 20200097.
- Kwok, P.H.; Qi, Q. A Variational U-Net for Weather Forecasting. arXiv 2021, arXiv:2111.03476.
- Chen, L.; Du, F.; Hu, Y.; Wang, Z.; Wang, F. SwinRDM: Integrate SwinRNN with Diffusion Model towards High-Resolution and High-Quality Weather Forecasting. Proc. AAAI Conf. Artif. Intell. 2023, 37, 322–330.
- Frnda, J.; Durica, M.; Rozhon, J.; Vojtekova, M.; Nedoma, J.; Martinek, R. ECMWF short-term prediction accuracy improvement by deep learning. Sci. Rep. 2022, 12, 7898.
- Cho, D.; Yoo, C.; Son, B.; Im, J.; Yoon, D.; Cha, D. A novel ensemble learning for post-processing of NWP Model’s next-day maximum air temperature forecast in summer using deep learning and statistical approaches. Weather Clim. Extrem. 2022, 35, 100410.
- Yao, Y.; Zhong, X.; Zheng, Y.; Wang, Z. A Physics-Incorporated Deep Learning Framework for Parameterization of Atmospheric Radiative Transfer. J. Adv. Model. Earth Syst. 2023, 15, e2022MS003445.
- Mu, B.; Chen, L.; Yuan, S.; Qin, B. A radiative transfer deep learning model coupled into WRF with a generic fortran torch adaptor. Front. Earth Sci. 2023, 11, 1149566.
- Zhong, X.; Ma, Z.; Yao, Y.; Xu, L.; Wu, Y.; Wang, Z. WRF–ML v1.0: A bridge between WRF v4.3 and machine learning parameterizations and its application to atmospheric radiative transfer. Geosci. Model Dev. 2023, 16, 199–209.
- Chen, G.; Wang, W.-C.; Yang, S.; Wang, Y.; Zhang, F.; Wu, K. A neural network-based scale-adaptive cloud-fraction scheme for GCMs. J. Adv. Model. Earth Syst. 2023, 15, e2022MS003415.
- Zhong, X.; Yu, X.; Li, H. Machine learning parameterization of the multi-scale Kain–Fritsch (MSKF) convection scheme and stable simulation coupled in the Weather Research and Forecasting (WRF) model using WRF–ML v1.0. Geosci. Model Dev. 2024, 17, 3667–3685.
- Mu, B.; Zhao, Z.-J.; Yuan, S.-J.; Qin, B.; Dai, G.-K.; Zhou, G.-B. Developing intelligent Earth System Models: An AI framework for replacing sub-modules based on incremental learning and its application. Atmos. Res. 2024, 302, 107306.
- Wang, X.; Han, Y.; Xue, W.; Yang, G.; Zhang, G.J. Stable climate simulations using a realistic general circulation model with neural network parameterizations for atmospheric moist physics and radiation processes. Geosci. Model Dev. 2022, 15, 3923–3940.
- Choi, S.; Jung, E.S. Optimizing Numerical Weather Prediction Model Performance Using Machine Learning Techniques. IEEE Access 2023, 11, 86038–86055.
- Walters, D.; Baran, A.J.; Boutle, I.; Brooks, M.; Earnshaw, P.; Edwards, J.; Furtado, K.; Hill, P.; Lock, A.; Manners, J.; et al. The Met Office Unified Model Global Atmosphere 7.0/7.1 and JULES Global Land 7.0 configurations. Geosci. Model Dev. 2019, 12, 1909–1963.
- Intel VTune Profiler. Available online: https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html (accessed on 24 June 2024).
- ROSE. Available online: https://metomi.github.io/rose/2019.01.8/html/tutorial/rose/index.html (accessed on 8 January 2019).
- CYLC Introduction. Available online: https://cylc.github.io/cylc-doc/latest/html/tutorial/introduction.html (accessed on 25 June 2024).
- Jinja Introduction. Available online: https://jinja.palletsprojects.com/en/3.0.x/intro/ (accessed on 9 November 2021).
- Tee, G.J. Eigenvectors of the Successive Over-Relaxation Process, and its Combination with Chebyshev Semi-Iteration. Comput. J. 1963, 6, 250–263.
- Mittal, S. A study of successive over-relaxation method parallelisation over modern HPC languages. Int. J. High Perform. Comput. Netw. 2014, 7, 292–298.
- Allahviranloo, T. Successive over relaxation iterative method for fuzzy system of linear equations. Appl. Math. Comput. 2005, 162, 189–196.
- van der Vorst, H.A. Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems. SIAM J. Sci. Stat. Comput. 1992, 13, 631–644.
- Wang, M.; Sheu, T. An element-by-element BICGSTAB iterative method for three-dimensional steady Navier–Stokes equations. J. Comput. Appl. Math. 1997, 79, 147–165.
- Long, C.; Liu, S.; Sun, R.; Lu, J. Impact of structural characteristics on thermal conductivity of foam structures revealed with machine learning. Comput. Mater. Sci. 2024, 237, 112898.
- Havdiak, M.; Aliaga, J.I.; Iakymchuk, R. Robustness and Accuracy in Pipelined Bi-Conjugate Gradient Stabilized Method: A Comparative Study. arXiv 2024, arXiv:2404.13216.
- Joly, P.; Meurant, G. Complex conjugate gradient methods. Numer. Algorithms 1993, 4, 379–406.
- Wang, H.; Liu, F.; Xia, L.; Crozier, S. An efficient impedance method for induced field evaluation based on a stabilized Bi-conjugate gradient algorithm. Phys. Med. Biol. 2008, 53, 6363.
- Brownlee, J. How to choose a feature selection method for machine learning. Mach. Learn. Mastery 2019, 10, 1–7.
- Khairoutdinov, M.F.; Blossey, P.N.; Bretherton, C.S. Global system for atmospheric modeling: Model description and preliminary results. J. Adv. Model. Earth Syst. 2022, 14, e2021MS002968.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597.
- Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. arXiv 2014, arXiv:1411.4038.
- Kaparakis, C.; Mehrkanoon, S. WF-UNet: Weather Fusion UNet for Precipitation Nowcasting. arXiv 2023, arXiv:2302.04102.
- Kim, T.; Kang, S.; Shin, H.; Yoon, D.; Eom, S.; Shin, K.; Yun, S.Y. Region-conditioned orthogonal 3D U-Net for weather4cast competition. arXiv 2022, arXiv:2212.02059.
- Fernandez, J.G.; Mehrkanoon, S. Broad-UNet: Multi-scale feature learning for nowcasting tasks. Neural Netw. 2021, 144, 419–427.
- Lu, H.; She, Y.; Tie, J.; Xu, S. Half-UNet: A simplified U-Net architecture for medical image segmentation. Front. Neuroinform. 2022, 16, 911679.
- Woo, S.; Park, J.; Lee, J.; Kweon, I.S. CBAM: Convolutional Block Attention Module. arXiv 2018, arXiv:1807.06521.
- FTorch Documentation. Available online: https://cambridge-iccs.github.io/FTorch/ (accessed on 23 July 2024).
| Model | Version | Grid Size | Resolution |
|---|---|---|---|
| Atmosphere | UM vn11.5 | 60 km / 0.83° × 0.25° | N216L85 |
| Land Surface | JULES vn5.6 | 60 km / 0.83° × 0.25° | N216L4 |
| Ocean | NEMO vn3.6 | 25 km / 0.25° × 0.25° | eORCA025L75 |
| Sea-Ice | CICE vn5.1.2 | 25 km / 0.25° × 0.25° | eORCA025L75 |
| Model | Component | Resolution | Grid Size | Grid Degree |
|---|---|---|---|---|
| GloSea6 | UM | N216 | 60 km | 0.83° × 0.25° |
| | NEMO | eORCA025 | 25 km | ∼0.25° |
| Low GloSea6 | UM | N96 | 170 km | 1.88° × 1.25° |
| | NEMO | eORCA1 | 100 km | ∼1.0° |
| Name | Hardware Specification |
|---|---|
| CPU | Intel® Core 10th Gen i7-10700K |
| RAM | 64 GB |
| SSD | 1 TB |
| GPU | GeForce RTX 3080 |
| Item | Value |
|---|---|
| CPU Time | 29,877.390 s |
| Effective Time | 13,390.712 s |
| Spin Time | 16,486.679 s |
| Overhead Time | 0 s |
| Instructions Retired | 206,821,904,600,000 |
| Microarchitecture Usage | 57.4% |
| Total Thread Count | 17 |
| Paused Time | 3.639 s |
| CPU Time (s) | Microarchitecture Usage (%) | Module | Function (Full) |
|---|---|---|---|
| 13,032.006 | 79.8 | libmpi.so.12.0.5 | MPIDI_CH3I_Progress |
| 3097.394 | 36.6 | libmpi.so.12.0.5 | MPID_nem_tcp_connpoll |
| 1408.185 | 34.6 | [Unknown] | [Outside any known module] |
| 1283.903 | 100.0 | libmpi.so.12.0.5 | MPIDU_Sched_are_pending |
| 1125.987 | 13.3 | um-atmos.exe | _tri_sor_mod_MOD |
| 888.238 | 48.4 | libm-2.17.so | __ieee754_pow_sse2 |
| 617.219 | 63.0 | libm-2.17.so | __ieee754_exp_avx |
| 450.796 | 41.3 | libm-2.17.so | __exp1 |
| 437.765 | 23.0 | libgfortran.so.4.0.0 | func@0x1c270 |
| 347.046 | 22.4 | um-atmos.exe | __mod_cosp_MOD_cosp_iter |
| Variable | Description |
|---|---|
| x₁ | The matrix x updated in the 1st iteration |
| b₁ | The matrix corresponding to the right-hand-side term of the linear system during the 1st iteration |
| p₁ | The direction matrix used in the conjugate gradient method during the 1st iteration |
| x₂ | The matrix x updated in the 2nd iteration |
| b₂ | The matrix corresponding to the right-hand-side term of the linear system during the 2nd iteration |
| p₂ | The direction matrix used in the conjugate gradient method during the 2nd iteration |
| x | The matrix x after convergence of the solution |
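Assuming the six iteration matrices above serve as input channels and the converged x as the prediction target — a channel layout we are inferring from the variable list, not one stated explicitly here — one training sample could be assembled as follows. Shapes are small illustrative stand-ins, not the actual Low GloSea6 grid dimensions.

```python
import numpy as np

# hypothetical (levels, lat, lon) grid, kept small for illustration
shape = (8, 16, 16)
x1, b1, p1 = (np.random.rand(*shape) for _ in range(3))  # 1st-iteration matrices
x2, b2, p2 = (np.random.rand(*shape) for _ in range(3))  # 2nd-iteration matrices
x_conv = np.random.rand(*shape)                          # converged solution (target)

# six channels in, one channel out: (C, D, H, W), the layout 3D convolutions expect
inputs = np.stack([x1, b1, p1, x2, b2, p2])
target = x_conv[np.newaxis]
```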
| Loop Name | CPU Time (s) |
|---|---|
| Outer 1, Inner 1 | 1.9284 |
| Outer 1, Inner 2 | 0.9091 |
| Outer 2, Inner 1 | 0.5974 |
| Outer 2, Inner 2 | 0.1548 |
| Name | Hardware Specification |
|---|---|
| CPU | Intel® Xeon® Gold 6246R @ 3.40 GHz |
| RAM | 256 GB |
| SSD | 1 TB |
| GPU | NVIDIA RTX A6000 48 GB × 4 |
| Upsampling | Architecture | Params | FLOPs | RMSE | MAE | SMAPE | WAPE | R² Score |
|---|---|---|---|---|---|---|---|---|
| Transposed Convolution | U-Net | 58.9k | 3.55G | 0.0027 | 0.0013 | 0.2192% | 0.2191% | 0.8100 |
| | U-Net3+ | 85.9k | 12.07G | 0.0032 | 0.0017 | 0.2921% | 0.2920% | 0.7434 |
| | Half-UNet | 13.2k | 2.98G | 0.0027 | 0.0014 | 0.2347% | 0.2347% | 0.8099 |
| | CH-UNet | 15.7k | 3.71G | 0.0024 | 0.0010 | 0.1814% | 0.1814% | 0.8576 |
| Trilinear Interpolation | U-Net | 71.1k | 4.63G | 0.0026 | 0.0013 | 0.2211% | 0.2210% | 0.8296 |
| | U-Net3+ | 85k | 10.53G | 0.0028 | 0.0016 | 0.2816% | 0.2815% | 0.7985 |
| | Half-UNet | 12k | 2.64G | 0.0030 | 0.0015 | 0.2600% | 0.2599% | 0.7661 |
| | CH-UNet | 14.5k | 3.37G | 0.0040 | 0.0021 | 0.3713% | 0.3713% | 0.5891 |
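The accuracy columns above follow standard definitions, sketched below in NumPy. These are the common textbook formulas (SMAPE in its symmetric two-sided form, reported in percent); the paper's exact implementation may differ in minor details such as epsilon handling.

```python
import numpy as np

def rmse(y, yhat):
    # root-mean-square error
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    # mean absolute error
    return float(np.mean(np.abs(y - yhat)))

def smape(y, yhat):
    # symmetric mean absolute percentage error, in percent
    return float(100 * np.mean(2 * np.abs(yhat - y) / (np.abs(y) + np.abs(yhat))))

def wape(y, yhat):
    # weighted absolute percentage error, in percent
    return float(100 * np.sum(np.abs(yhat - y)) / np.sum(np.abs(y)))

def r2_score(y, yhat):
    # coefficient of determination
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1 - ss_res / ss_tot)

y = np.array([1.0, 2.0, 3.0])
yhat = np.array([2.0, 2.0, 2.0])
```

Note that WAPE, unlike SMAPE, weights errors by the magnitude of the true values, which makes it less sensitive to near-zero targets.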
Server Node | Client Node | ||
---|---|---|---|
Name | Hardware Specification | Name | Hardware Specification |
CPU | Intel® Core 13th Gen i9-13900F | CPU | Intel® Core 7th Gen i7-7700 |
RAM | 126 GB | RAM | 8 GB |
SSD | 32 TB | SSD | 1 TB |
GPU | GeForce RTX 4070 Ti | GPU | No GPU |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Park, H.; Chung, S. Utilization of a Lightweight 3D U-Net Model for Reducing Execution Time of Numerical Weather Prediction Models. Atmosphere 2025, 16, 60. https://doi.org/10.3390/atmos16010060