Open Access Article

Machine Learning Prediction of Photovoltaic Hydrogen Production Capacity Using Long Short-Term Memory Model

Qian He ¹, Mingbin Zhao, Shujie Li ³, Xuefang Li ³,* and Zuoxun Wang ¹,*

College of Engineering, Shandong Xiehe University, Jinan 250109, China

State Key Laboratory of Fire Science, University of Science and Technology of China, Hefei 230027, China

Center for Hydrogen Energy, Shandong University, Jinan 250061, China

* Authors to whom correspondence should be addressed.

Energies 2025, 18(3), 543; https://doi.org/10.3390/en18030543

Submission received: 12 November 2024 / Revised: 10 January 2025 / Accepted: 21 January 2025 / Published: 24 January 2025

(This article belongs to the Special Issue Advances in Fuel Cells and Hydrogen Storage Technologies)

Figure 1. Structures of photovoltaic hydrogen production system. (a) Off-grid photovoltaic hydrogen production system; (b) grid-connected photovoltaic hydrogen production system.
Figure 2. LSTM network structure.
Figure 3. Distribution of ground meteorological elements in the Third Pole region.
Figure 4. Daily solar irradiance variations from 2013 to 2022.
Figure 5. Correlation analysis of ground meteorological element variables. Abbreviations: IR: Solar Irradiance (W/m²); PP: Precipitation (mm/h); P: Atmospheric Pressure (hPa); RH: Relative Humidity (kg/kg); T: Temperature (K); WS: Wind Speed (m/s).
Figure 6. Normalization of the meteorological parameters: (a) original values; (b) normalized values.
Figure 7. Loss variation curve.
Figure 8. Comparison between predicted and actual solar irradiance values: (a) original values; (b) normalized values (the dashed line represents the predictions matching the true values).
Figure 9. Prediction results of daily average photovoltaic hydrogen production capacity in summer and winter.

Abstract

The yield of photovoltaic hydrogen production systems is influenced by a number of factors, including weather conditions, the cleanliness of photovoltaic modules, and operational efficiency. Temporal variations in weather conditions have been shown to significantly impact the output of photovoltaic systems, thereby influencing hydrogen production. To address the inaccuracies in hydrogen production capacity predictions due to weather-related temporal variations in different regions, this study develops a method for predicting photovoltaic hydrogen production capacity using the long short-term memory (LSTM) neural network model. The proposed method integrates meteorological parameters, including temperature, wind speed, precipitation, and humidity into a neural network model to estimate the daily solar radiation intensity. This approach is then integrated with a photovoltaic hydrogen production prediction model to estimate the region’s hydrogen production capacity. To validate the accuracy and feasibility of this method, meteorological data from Lanzhou, China, from 2013 to 2022 were used to train the model and test its performance. The results show that the predicted hydrogen production agrees well with the actual values, with a low mean absolute percentage error (MAPE) and a high coefficient of determination (R²). The predicted hydrogen production in winter has a MAPE of 0.55% and an R² of 0.985, while the predicted hydrogen production in summer has a slightly higher MAPE of 0.61% and a lower R² of 0.968, due to higher irradiance levels and weather fluctuations. The present model captures long-term dependencies in the time series data, significantly improving prediction accuracy compared to conventional methods. This approach offers a cost-effective and practical solution for predicting photovoltaic hydrogen production, demonstrating significant potential for the optimization of the operation of photovoltaic hydrogen production systems in diverse environments.

Keywords: photovoltaic hydrogen production; capacity prediction; LSTM network model; neural network

1. Introduction

According to the 2023 national power industry statistical data [1], the total installed capacity of photovoltaic (PV) power generation in China has reached 610 million kilowatts, signifying a substantial advancement in the nation’s solar energy sector. This development is accompanied by continuous advancements in photovoltaic technology, a rapid decline in costs, and an increasing diversification of development models. Various “PV+” development models are constantly emerging, and the goal of achieving parity and diversification in the photovoltaic industry is within reach [2]. However, as the industry progresses, challenges are becoming increasingly apparent. The inherent variability and cyclical patterns of photovoltaic power generation are leading to increased complexity in power grid management. This, in turn, results in the stability of photovoltaic power systems being less robust compared to conventional systems. The issue of poor stability in photovoltaic power systems is a significant constraint on the continuous large-scale development of photovoltaic power generation and is the primary problem that must be addressed at present. The key to solving this problem lies in the development of efficient and flexible energy storage methods. Hydrogen, which is an effective energy carrier, possesses significant benefits that can tackle the challenge. Integrating hydrogen generation and storage within the processes of renewable energy sources, like solar photovoltaics, can efficiently mitigate the instability inherent in photovoltaic power systems. Moreover, hydrogen is an environmentally benign fuel, and its storage facilitates broad applications across many sectors, including transportation, decentralized power generation for heating, chemical industries, and metallurgy. 
The vigorous development of the hydrogen energy industry, along with the utilization of hydrogen energy as a long-term energy storage medium for the new power system, can alleviate the spatiotemporal imbalance of electricity production and use, thereby enhancing the flexibility of the power grid. The establishment of a green hydrogen production system that is independent of the constraints imposed by the power grid can facilitate the development and construction process for new energy sources. This system can directly convert wind and solar resources into hydrogen energy for subsequent storage and utilization, which is of significant strategic and practical importance [3].

In contemporary energy models, photovoltaic power generation has emerged as a pioneering approach for hydrogen production, characterized by its innovative utilization of photovoltaic systems to directly generate hydrogen via electrolysis. This model stands in contrast to conventional power plants, wherein the steps of inversion and voltage elevation are crucial components of the electrolysis process. Photovoltaic panels offer a high degree of flexibility in configuration, with the ability to be connected in series or parallel to meet the voltage and current demands of hydrogen production facilities. This flexibility contributes to the optimization of system performance. The technology for generating hydrogen through water electrolysis is currently reaching a state of maturity. The apparatus is straightforward, the operations and management are comparatively easy, and the hydrogen yielded is characterized by high purity, making it well suited for various applications. Photovoltaic hydrogen production systems can be categorized into off-grid PV hydrogen production systems and grid-connected PV hydrogen production systems based on the interaction between the hydrogen production system and the power grid [3]; the basic structures of the two systems are shown in Figure 1. The off-grid PV hydrogen production system is composed of a PV array, which consists of solar cell modules, a battery pack, a DC converter, an electrolysis device, and a hydrogen storage device. This type of system is a source-independent power generation system that does not require the participation of the main power grid. Consequently, it has superior flexibility and mobility. In contrast, the grid-connected photovoltaic hydrogen production system can be classified into two types: one featuring a common AC bus and the other a common DC bus, as shown in Figure 1b. 
For DC bus hydrogen production, while constructing a PV/storage/hydrogen DC microgrid, it is necessary to interact with the power grid through a DC/AC conversion device. The AC bus hydrogen production has been widely adopted in the northwest and other regions rich in photovoltaic resources in China. This approach utilizes the stability of the larger power grid to support the PV hydrogen production system, and it leverages low-cost off-peak power to enhance equipment utilization and optimize the project’s economic viability.

The PV hydrogen production system primarily converts light energy directly or indirectly into electrical energy through photovoltaic or photochemical effects. The subsequent production and storage of hydrogen is facilitated by the electrolysis of water. In the context of practical applications, light energy is predominantly sourced from solar energy. Under constant technical conditions, the conversion efficiency of solar cell modules is typically constant, and the current state of photovoltaic systems exhibits limited absorption efficiency for solar radiation energy; typically, it does not exceed 20%. The output of the PV hydrogen production is directly related to the quantity of solar radiation arriving at the ground, which changes with meteorological conditions [4]. To efficiently convert this electrical energy into hydrogen, various water electrolysis technologies are used [5]. Current mainstream technologies include alkaline water electrolysis (AWE), proton exchange membrane electrolysis (PEM), solid oxide electrolysis (SOEC), and anion exchange membrane electrolysis (AEM). AWE is a mature and cost-effective technology suitable for large-scale applications, though it suffers from lower efficiency (60–80%) and electrode corrosion. PEM has high efficiency and the capacity to produce hydrogen of high purity, making it well suited for small-scale systems. However, PEM depends on costly noble metal catalysts. SOEC operates at high temperatures (900–1000 °C) with high efficiency but faces challenges such as a long startup time and material degradation. AEM combines the advantages of AWE and PEM, using low-cost materials, but its performance and durability require further improvement. Each technology has distinct characteristics, and the choice depends on specific application requirements, including scale, cost, and operational conditions.

Machine learning has achieved considerable success in a variety of fields in recent years, and its advanced strong self-learning capabilities and substantial data processing abilities have played an important role in the research processes of numerous fields. Given the substantial randomness inherent in the PV hydrogen production industry, the ability to predict hydrogen production capacity over a certain period using existing data would effectively enhance energy utilization efficiency. Machine learning encompasses a variety of algorithms, such as support vector regression (SVR), clustering algorithms, artificial neural networks (ANNs), and principal component analysis (PCA), which have been widely used in predictive issues related to industrial production. Cheng et al. [6] used support vector machine (SVM) and FbProphet algorithms to predict annual hydrogen yields in regions with optimal solar radiation and available land. The regression coefficients (R²) obtained from the test sets were above 0.95, confirming the accuracy of the model. The results demonstrate the high potential and environmental benefits of producing green hydrogen through PV-powered water electrolysis in China. Ozdemir et al. [7] evaluated the electrochemical performance of proton exchange membrane (PEM) water electrolyzers using machine learning techniques. They identified SVM as the most effective method for predicting hydrogen flow rate, with a mean absolute error (MAE) of 0.0317, and current density, with an MAE of 0.0671, thus optimizing operational parameters for enhanced efficiency and durability. Haider et al. [8] used various machine learning methods, including the Prophet algorithm, stochastic gradient descent (SGD), and seasonal autoregressive integrated moving average exogenous (SARIMAX), to predict the solar hydrogen production potential in Islamabad, Pakistan. 
A comparative analysis showed that the Prophet algorithm achieved an R² score of 0.983, outperforming SGD and SARIMAX, which scored 0.969 and 0.966, respectively. This indicates that the Prophet model demonstrated superior data-fitting capabilities compared to the other methods. The prediction results demonstrate a significant daily average production and highlight the region’s potential for green energy through a photovoltaic–electrolytic (PV-E) system. Kabir et al. [9] highlighted significant challenges in scaling up green hydrogen production (GHP) technologies, particularly in yield prediction and process optimization. They applied various machine learning algorithms to predict and optimize GHP using the PEM technology. The results show that the K-nearest neighbor (KNN) model is the best-performing approach, achieving a high regression coefficient of 0.948, a low root mean squared error (RMSE) of 0.038, and a minimal MAE of 0.161. The integration of machine learning into the field of hydrogen production is a testament to the versatility and adaptability of these algorithms. However, the nonlinear transformation ability and feature representation capabilities of machine learning models are relatively weak and often fail to achieve satisfactory prediction accuracy.

Deep learning models have become the preferred method for complex modeling tasks due to their powerful nonlinear transformation capabilities. Over the past few years, the thriving advancement of deep learning has provided efficient solutions for various industrial applications. Recurrent neural networks (RNNs) have become prevalent in sequence modeling tasks due to their capability to grasp contextual information. Adeli et al. [10] used an RNN model to forecast Morocco’s prospective energy production capacity and hydrogen production potential in the next decade. Akhter et al. [11] developed and evaluated a hybrid deep learning approach (SSA-RNN-LSTM) for short-term power yield prediction in three distinct PV systems, demonstrating superior precision and robustness compared to other models over a four-year data period. Javaid et al. [12] assessed the feasibility of harnessing wind power for hydrogen generation using an LSTM model. Ruhani et al. [13] used the LSTM model to achieve optimized scheduling of hydrogen energy systems. Kazi and Eljack [14] accurately predicted the future hydrogen demand of the maritime sector using the LSTM model. These applications underscore the versatility and effectiveness of deep learning models in tackling complex industrial problems. As deep learning continues to evolve, its integration into energy prediction and optimization promises to drive significant advancements in efficiency and sustainability across various sectors.

Photovoltaic hydrogen production, which combines photovoltaic power generation with water electrolysis, has emerged as a key technology for achieving energy sustainability. However, photovoltaic power generation is highly susceptible to various meteorological and environmental factors and exhibits significant intermittency and variability. These characteristics can lead to the unstable efficiency of water electrolysis for hydrogen production, reduce the overall equipment utilization rate, and increase the unit hydrogen production cost. Therefore, a comprehensive understanding of the patterns of solar radiation reaching the ground, combined with timely and accurate forecasts, can facilitate efficient power scheduling, prevent electricity waste or shortages, and enhance the efficiency and output of hydrogen production. From a broader perspective, predicting the capacity of photovoltaic hydrogen production systems can help identify the advantages and limitations of different regions for PV hydrogen production. This, in turn, enables the prioritization of regions with greater hydrogen production potential. Furthermore, such insights can guide policymakers and industry stakeholders in optimizing resource allocation and promoting sustainable economic development in the renewable energy sector.

The objective of this study is to develop a photovoltaic hydrogen production capacity prediction model using the LSTM neural networks to address the challenges posed by the effects of the temporal variability of weather conditions on photovoltaic systems. The model aims to accurately predict daily solar radiation intensity by integrating key meteorological data, such as temperature, wind speed, precipitation, and humidity, and thereby estimate regional hydrogen production capacity. The ultimate goal is to provide an economical, efficient, and widely applicable tool for predicting photovoltaic hydrogen production capacity. This tool will not only mitigate the instability of photovoltaic power systems and optimize system operations but also promote the use of hydrogen as a sustainable energy carrier and enhance energy utilization efficiency. These objectives hold significant theoretical and practical importance for advancing renewable energy technologies and achieving green energy transition goals.

In this study, an LSTM model is used to predict photovoltaic hydrogen production, leveraging its superior ability to capture long-term dependencies in time series data, particularly for meteorological variables. Unlike traditional machine learning models, such as SVM and ANN, LSTM excels at nonlinear transformations and the feature representation of time series data, enabling more accurate predictions. The present approach incorporates solar irradiance and other meteorological parameters, including temperature, wind speed, humidity, and precipitation, to construct a multivariate prediction model. This comprehensive input framework more effectively reflects the impact of meteorological conditions on hydrogen production, significantly enhancing prediction accuracy. Furthermore, the present model is trained and tested on a decade-long dataset (2013–2022). This extended temporal coverage distinguishes this study from those relying on short-term data, as it captures a wider range of climatic variations, seasonal patterns, and long-term trends. The inclusion of such a comprehensive dataset enhances the model’s generalization capabilities, enabling it to perform robustly across diverse and unpredictable environmental conditions. These aspects collectively underscore the novelty and robustness of the present work in advancing the field of photovoltaic hydrogen production predictions.

2. Models and Methods

2.1. LSTM Model

The hydrogen production capacity at a given moment is related not only to the meteorological parameters at that moment but also to the inputs and outputs of previous moments. In contrast to traditional neural networks (NNs), which use independent parameters at each time step, recurrent neural networks (RNNs) establish connections between the past and the present within the hidden layer. This allows the network to retain information over a significant span of time. The LSTM network is a sophisticated type of RNN that addresses the “forgetting” problem in such networks, strengthening the memory of historical training data and selecting the relatively valuable information. As a special kind of recurrent neural network, the LSTM is better equipped to handle long-term dependencies than conventional neural networks. The LSTM network is distinguished from a typical neural network by a hidden layer that contains an additional “core (cell)” alongside the external self-loop module, as illustrated in Figure 2. This LSTM core encompasses multiple storage units and three distinct types of “gates”: the input gate, the forget gate, and the output gate. The primary functions of these gates are to manage historical information, gather external data, and sift through internal data. Each “gate” plays a specific role, as follows [15,16].

Forget Gate: This mechanism is responsible for determining which information should be discarded from the cell state. By applying a sigmoid activation function, it filters the input, producing a value between 0 and 1 that signifies the extent of the information to be retained within each cell state. A value of 0 means the information is totally discarded, whereas a value of 1 indicates full retention.

f_{t} = σ (W_{f} \cdot [h_{t - 1}, x_{t}] + b_{f})

(1)

Input Gate: This component determines which information should be added to the cell state. It employs a sigmoid activation function, which serves to filter the input, thereby generating a value ranging from 0 to 1. This value indicates the relative significance of each input element. Subsequently, a hyperbolic tangent (tanh) activation function is implemented to generate a new candidate vector, which proposes potential updates to the cell state.

i_{t} = σ (W_{i} \cdot [h_{t - 1}, x_{t}] + b_{i})

(2)

{\tilde{C}}_{t} = \tanh (W_{C} \cdot [h_{t - 1}, x_{t}] + b_{C})

(3)

C_{t} = f_{t} * C_{t - 1} + i_{t} * {\tilde{C}}_{t}

(4)

Output Gate: This mechanism determines which information from the cell state will be passed to the subsequent layer or serve as the final output. It employs a sigmoid activation function to filter the input, generating a value between 0 and 1 that reflects the degree to which information from each cell state is released. The updated cell state is then processed through a tanh activation function, and the outcome is multiplied by the output from the output gate to produce the final output of the LSTM unit.

o_{t} = σ (W_{o} \cdot [h_{t - 1}, x_{t}] + b_{o})

(5)

h_{t} = o_{t} * \tanh (C_{t})

(6)
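As an illustration of how Eqs. (1)–(6) combine, the following is a minimal pure-Python sketch of a single LSTM time step. It is not the implementation used in the study; the helper names (`affine`, `lstm_step`), the parameter layout, and the toy zero weights are assumptions made only for this example.

```python
import math


def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W · v + b for a weight matrix W (list of rows) and bias b."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Eqs. (1)-(6).

    params maps "f", "i", "C", "o" to (W, b) pairs (hypothetical layout);
    every W acts on the concatenation [h_{t-1}, x_t].
    """
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(*params["f"], z)]        # Eq. (1)
    i_t = [sigmoid(a) for a in affine(*params["i"], z)]        # Eq. (2)
    c_hat = [math.tanh(a) for a in affine(*params["C"], z)]    # Eq. (3)
    c_t = [f * c + i * g
           for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]     # Eq. (4)
    o_t = [sigmoid(a) for a in affine(*params["o"], z)]        # Eq. (5)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]         # Eq. (6)
    return h_t, c_t


# Toy example (assumed values): hidden size 2, input size 1, all-zero weights.
W0 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b0 = [0.0, 0.0]
params = {name: (W0, b0) for name in ("f", "i", "C", "o")}
h_t, c_t = lstm_step([0.3], [0.0, 0.0], [1.0, -1.0], params)
```

With all-zero weights every gate evaluates to 0.5 and the candidate vector to 0, so the cell state is simply halved; this makes the role of the forget gate in Eq. (4) easy to check by hand.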

2.2. Data Analysis and Preprocessing

Irradiance intensity is the factor with the most direct influence on the capacity of photovoltaic hydrogen production systems. In this predictive model, the values selected as input data are meteorological elements that have a certain correlation with irradiance intensity, such as geographical location, meteorological conditions, atmospheric quality, terrain, and season. Among these elements, the impact of ground meteorological elements is the greatest. The attenuation of solar radiation, influenced by climatic and humidity conditions, exhibits variability. Concurrently, meteorological factors can also impact the components and equipment of the photovoltaic hydrogen production system, thereby affecting its capacity. This aspect of the data collection is of paramount importance. This study selected a long-term time series of high-resolution ground meteorological element-driving datasets from the Third Pole region [17] as the data source for model training. The data covered seven meteorological elements, including daily average precipitation, 2 m temperature, 2 m specific humidity, 10 m wind speed, near-surface atmospheric pressure, downward longwave radiation, and downward shortwave radiation from 2013 to 2022. Among them, precipitation, temperature, specific humidity, and pressure were derived by integrating short-term high-resolution WRF simulations, long-sequence ERA5 data, and station observation data; shortwave radiation was obtained by integrating ERA5 reanalysis data and satellite-inverted shortwave radiation; longwave radiation was calculated using the semi-empirical CD99 model. This dataset offers greater precision than the prevailing reanalysis datasets and is extensively utilized for climate studies in the Third Pole area, as well as for land surface, hydrological, and ecological model inputs. A typical distribution of meteorological elements in August from this dataset is shown in Figure 3. 
Subsequent predictive models extract meteorological data from Lanzhou as training data to analyze the local changes in photovoltaic hydrogen production capacity.

In the current dataset, solar irradiance emerges as the paramount factor affecting photovoltaic power generation, providing the most direct assessment of a region’s hydrogen production potential. The daily average solar irradiance pattern from 2013 to 2022 is shown in Figure 4. Solar irradiance exhibits a discernible cyclical variation, with irradiance levels in winter being considerably lower than the mean levels in summer. Moreover, due to the influence of weather and precipitation, solar irradiance fluctuates considerably, which can degrade the predictive performance of subsequent model training. Random deviations from the mean make training more difficult, potentially resulting in overly complex models that are difficult to interpret, and may cause the model to be overly sensitive to specific changes in the data or to overfit. To address these challenges, a moving average method was employed to smooth the solar irradiance and reduce data volatility. In addition, the downward solar irradiance and the upward solar irradiance were summed to obtain the total downward irradiance for the day. This total downward irradiance was then used as the primary influencing factor in the subsequent calculations of the photovoltaic hydrogen production volume.
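The smoothing step can be sketched as a trailing moving average. The text does not state the window length or whether the window is trailing or centered, so the 7-day trailing window below is purely an assumption for illustration, as is the function name.

```python
def moving_average(values, window=7):
    """Trailing moving average; early entries average over the days available.

    The window length is a hypothetical choice, not a value from the study.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for k, v in enumerate(values):
        running += v
        if k >= window:
            running -= values[k - window]  # drop the value leaving the window
        smoothed.append(running / min(k + 1, window))
    return smoothed


# Illustrative numbers only (not data from the study).
demo = moving_average([1, 2, 3, 4, 5], window=3)
```

A longer window suppresses more of the weather-driven fluctuation but also flattens genuine short-term changes, so the window is a tuning choice.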

In addition to solar irradiance, wind speed can reflect the seasonal changes in the local climate. Local temperature and precipitation can illustrate the local meteorological conditions well. Humidity represents the atmospheric quality conditions of the region, and the average atmospheric pressure can serve as an indicator of geographical location. The aforementioned data are very important meteorological elements that have a certain impact on the capacity of photovoltaic hydrogen production. In previous research, these data were frequently utilized as secondary factors to assess the variations in solar irradiance. In this study, the Pearson correlation coefficient model [18] is used to correlate the aforementioned secondary variables and solar irradiance, which serves as the primary influencing factor. Variables unrelated to the current primary variable are eliminated, thereby simplifying the input of the predictive model and optimizing the model training process. The calculation method for the Pearson correlation coefficient is as follows:

ρ_{X, Y} = \frac{cov (X, Y)}{σ_{X} σ_{Y}} = \frac{E ((X - μ_{X}) (Y - μ_{Y}))}{σ_{X} σ_{Y}}

(7)

where ρ_{X, Y} ranges from −1 to 1, and ρ_{X, Y} = 0 means no correlation between the variables. The results of the correlation analysis between solar irradiance and other secondary ground meteorological element variables are shown in Figure 5. From Figure 5a, it can be seen that there is a significant positive correlation between solar irradiance and temperature, while there is a significant negative correlation with atmospheric pressure. By calculating the Pearson coefficient, the correlation coefficient between solar irradiance and atmospheric pressure is found to be −0.74. Therefore, atmospheric pressure is excluded in the subsequent dataset construction process. Ultimately, the meteorological factors of solar irradiance (W/m²), wind speed (m/s), temperature (K), humidity (kg/kg), and precipitation (mm/h) are selected to form the dataset.
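Equation (7) can be evaluated directly from two equal-length series. The sketch below is a plain-Python illustration; the function name and the sample values are assumptions and are not taken from the dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of Eq. (7) for two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and the standard deviations cancel.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0.0 or dev_y == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (dev_x * dev_y)


# Made-up illustrative values: a perfectly linear pair and its reverse.
rho_up = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
rho_down = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Applying the function to irradiance and each secondary variable in turn would yield the kind of correlation matrix summarized in Figure 5.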

Normalization is applied to the aforementioned feature values so that each set of data falls within the range of 0 to 1. This process eliminates the scale differences between features, allowing each feature to play a relatively balanced role in the model training, which enhances the training speed and robustness of the model. Subsequently, random sampling is used to partition the data into training and testing subsets, with 80% for training and 20% for testing. It is imperative that the training set does not contain any information about the test data, to prevent interference with the results. Then, 10% of the training set is extracted to form a validation set for evaluating the accuracy and generalization ability of the model during the training process and for preventing overfitting. The results of the dataset normalization and the division are shown in Figure 6.
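The normalization and partitioning described above can be sketched as follows. Min-max scaling is assumed here because the text only states that values are mapped to the range 0 to 1; the function names, the fixed random seed, and the rounding of subset sizes are likewise assumptions for illustration.

```python
import random


def min_max_normalize(column):
    """Scale one feature column to [0, 1] (assumed min-max scheme)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: map everything to 0
    return [(v - lo) / (hi - lo) for v in column]


def split_indices(n, test_frac=0.2, val_frac=0.1, seed=0):
    """Randomly split n sample indices 80/20, then take 10% of train as validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded for reproducibility (assumption)
    n_test = round(n * test_frac)
    test, rest = idx[:n_test], idx[n_test:]
    n_val = round(len(rest) * val_frac)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


scaled = min_max_normalize([10.0, 20.0, 30.0])
train, val, test = split_indices(100)
```

For 100 samples this gives 72 training, 8 validation, and 20 test indices, with no index shared between subsets.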

2.3. Prediction Model

The multivariate photovoltaic hydrogen production capacity prediction model can be divided into three modules, as follows:

(1) Data preprocessing module: Handle outliers in the obtained meteorological element parameters, apply sliding window averaging, and perform correlation analysis to remove meteorological element parameters with low correlation with solar irradiance. Then, divide the data into training and testing sets in an 8:2 ratio for subsequent model training.

(2) Solar irradiance prediction module: Build and train an LSTM neural network model, using meteorological parameters, such as wind speed, temperature, humidity, and precipitation, as inputs to predict solar irradiance. The LSTM network is particularly adept at capturing complex patterns in time series data, allowing it to implicitly account for the effects of other factors through historical patterns in solar irradiance data. The multivariate network model constructed in this study consists of four LSTM layers, each containing 128 neurons. Dropout with a rate of 20% is used to avert overfitting. The step size is 5; that is, the daily solar irradiance is predicted from the meteorological element parameters of the preceding five days. The model is optimized with the Adam optimizer at a learning rate of 0.001, the training loss function is MSELoss, and the batch size is 64. Training ends when the validation loss does not decrease for 10 consecutive steps. The model is built using PyTorch v2.0.0 (Python 3.10) and trained on an NVIDIA V100 graphics processor.
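Two of the settings above, the step size of 5 and the stopping rule with a patience of 10, can be illustrated without any deep learning library. The sketch below is not the authors' PyTorch code: the function names are hypothetical, and it assumes that each day's irradiance is paired with the feature rows of the five preceding days.

```python
def make_windows(features, target, steps=5):
    """Pair each day's target with the feature rows of the preceding `steps` days."""
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    X, y = [], []
    for t in range(steps, len(features)):
        X.append(features[t - steps:t])  # days t-steps .. t-1
        y.append(target[t])              # irradiance on day t
    return X, y


def should_stop(val_losses, patience=10):
    """True when the last `patience` validation losses never beat the earlier best."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before


# Illustrative values only: 8 days, one feature per day.
features = [[float(d)] for d in range(8)]
target = [100.0 + d for d in range(8)]
X, y = make_windows(features, target, steps=5)
```

Eight days of data yield three training samples here, because the first five days have no complete history window.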

(3) Photovoltaic hydrogen production prediction module: The solar irradiance predicted by the model, based on wind speed, temperature, humidity, and precipitation, is used as a known quantity. The hydrogen yield is calculated according to the predicted solar irradiance. Although meteorological factors such as wind speed, temperature, humidity, and precipitation can affect hydrogen production, solar irradiance remains the most direct and significant factor affecting the energy output of photovoltaic panels. This is because solar irradiance determines the amount of sunlight available for conversion into electrical energy, which subsequently drives hydrogen production through electrolysis. Additionally, solar irradiance data are generally more readily available and reliable compared to other variables. The choice to prioritize solar irradiance strikes a balance between model complexity, data availability, and the specific context of the present study. A simplified model is developed based on the efficiency loss of the photovoltaic power generation process and the electrolysis water hydrogen production process. The electricity production of photovoltaic cells and the hydrogen production of water electrolyzers are calculated separately to achieve functions such as local hydrogen production prediction, error analysis, and model performance evaluation. The specific steps of this process are as follows [17]:

(a) Calculation of electrical energy generated by solar photovoltaic cells: The prediction model used a single-crystal photovoltaic panel with a rated power of 252.83 W, an efficiency of 21%, and an area of 1.501 m². The electrical energy E_pv (kWh/m²) generated by the photovoltaic panel can be calculated using the following equation:

E_{pv} = G η_{pc} η_{pv}    (8)

where G is the solar irradiance, and η_{pc} is the power conditioning efficiency, which can reach up to 97%. However, according to the research in [17], a value of 85% for η_{pc} is more appropriate. η_{pv} represents the efficiency of the solar photovoltaic panel, with a value of 21%.
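As a small illustration of Equation (8), the sketch below compares the 85% power conditioning efficiency adopted in the text with the 97% upper bound. The function name `pv_energy` and the daily irradiation of 5 kWh/m² are hypothetical, and the sketch assumes G is expressed as daily energy per unit area so that the result is in kWh/m².

```python
def pv_energy(irradiation_kwh_m2: float, eta_pc: float = 0.85, eta_pv: float = 0.21) -> float:
    """Equation (8): E_pv = G * eta_pc * eta_pv, in kWh/m^2."""
    if irradiation_kwh_m2 < 0:
        raise ValueError("irradiation must be non-negative")
    return irradiation_kwh_m2 * eta_pc * eta_pv


daily_g = 5.0  # hypothetical daily irradiation in kWh/m^2, not a value from the study

conservative = pv_energy(daily_g)               # eta_pc = 85%, as adopted in the text
optimistic = pv_energy(daily_g, eta_pc=0.97)    # eta_pc = 97%, the stated upper bound

print(f"E_pv at 85% conditioning efficiency: {conservative:.4f} kWh/m^2")
print(f"E_pv at 97% conditioning efficiency: {optimistic:.4f} kWh/m^2")
```

Because Equation (8) is linear in η_{pc}, the choice between the two values scales the estimated electrical energy, and every downstream hydrogen estimate, by the same factor.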

(b) Calculation of hydrogen production by the electrolyzer: The current calculation used a photovoltaic power generation system in conjunction with a PEM water electrolysis device for hydrogen production as a representative example. The PEM water electrolyzer is notable for its longevity, high efficiency, and ability to work well with the intermittent power produced by renewable energy sources, creating an efficient and stable secondary system [17]. However, the present calculation framework is not limited to the PEM system and can be extended to other electrolysis technologies, including AWE and AEM. AWE typically operates at lower current densities (200–500 mA/cm²) and higher cell voltages (1.8–2.4 V) than PEM electrolysis [5], which affects the overall energy consumption and hydrogen yield. Similarly, AEM electrolysis operates at lower temperatures (50–70 °C) and uses non-noble metal catalysts, which can influence the system efficiency and cost. The efficiency of the electrolyzer (η_{ele}) and the specific energy consumption (kWh/Nm³) can be adjusted based on operational conditions and catalyst performance to adapt the calculations to various hydrogen production technologies. Typically, AWE systems have an efficiency range of 60–80%, with the energy consumption varying between 4.5 and 7.5 kWh/Nm³. For AEM electrolysis, the efficiency may be lower due to the challenges associated with membrane conductivity and catalyst activity. The present framework can be used to evaluate the performance of AWE, AEM, and other electrolysis systems by substituting the appropriate efficiency values and operational conditions. This adaptability highlights the flexibility of the present model in assessing hydrogen production across different technologies. The PEM water electrolyzer selected here can produce 1 kg of hydrogen with 53 kWh of electrical energy, corresponding to an efficiency approaching 75%. The hydrogen production amount M_H₂ (kg/m²) can be calculated using the following equation:

M_{H_2} = \frac{E_{pv} η_{ele}}{HHV_{H_2}}    (9)

where E_{pv} is the electrical energy generated by the solar photovoltaic cells, η_{ele} is the efficiency of the PEM electrolyzer, taken as 75%, and HHV_{H_2} is the higher heating value of hydrogen, which is 39.4 kWh/kg.
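Equation (9) and the stated electrolyzer figures can be cross-checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the input of 0.8925 kWh/m² is an assumed example value rather than a result from the study.

```python
HHV_H2 = 39.4  # higher heating value of hydrogen in kWh/kg, as given in the text


def hydrogen_yield(e_pv_kwh_m2: float, eta_ele: float = 0.75, hhv: float = HHV_H2) -> float:
    """Equation (9): M_H2 = E_pv * eta_ele / HHV_H2, in kg/m^2."""
    if e_pv_kwh_m2 < 0:
        raise ValueError("electrical energy must be non-negative")
    return e_pv_kwh_m2 * eta_ele / hhv


def implied_efficiency(specific_energy_kwh_per_kg: float, hhv: float = HHV_H2) -> float:
    """HHV-based efficiency implied by a specific energy consumption in kWh/kg."""
    return hhv / specific_energy_kwh_per_kg


e_pv = 0.8925  # assumed example input in kWh/m^2
mass_h2 = hydrogen_yield(e_pv)
eta_from_53 = implied_efficiency(53.0)

print(f"Hydrogen yield: {mass_h2 * 1000:.2f} g/m^2")
print(f"Efficiency implied by 53 kWh/kg: {eta_from_53:.1%}")
```

The second function shows where the "approaches 75%" statement comes from: 39.4 kWh/kg divided by 53 kWh/kg is about 74.3%. Substituting a different `eta_ele` is how the same calculation would be adapted to AWE or AEM systems.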

3. Results

The photovoltaic hydrogen production capacity in Lanzhou city was selected as the prediction target. The LSTM network model was used to predict solar irradiance, and the daily hydrogen production was obtained by combining this prediction with the photovoltaic hydrogen production prediction model. The loss of the LSTM network model on the training and testing sets during training is depicted in Figure 7. Although all the parameters are randomly initialized, the training loss decreases rapidly and reaches a stable state, which indicates that the quality of the preprocessed dataset is high. After 66 epochs, the loss no longer decreases and remains stable at a value of 3.64 × 10⁻³. The decreasing loss reflects the progression of the model from a randomly initialized state to one that has learned the salient features of the data. This trend not only validates the efficacy of the selected model but also underscores the importance of carefully configuring training strategies and hyperparameters to enhance model performance. The eventual stabilization of the loss at a low level indicates that the model has learned the operational patterns of the photovoltaic hydrogen production system and possesses robust predictive capabilities. The loss curves obtained on the training and validation sets are highly consistent, suggesting that the training quality is good and that there are no signs of underfitting or overfitting.

As shown in Figure 8, the LSTM multivariate prediction model forecasts solar irradiance over the specified timeframe with relatively high precision. The predicted values are periodic, with peaks in late summer and early autumn and minima in winter. In winter, when solar irradiance varies little, the forecasted values align closely with the observed values. In summer, however, the significant fluctuations in solar irradiance lead to an increase in the prediction error.

The present work used the following metrics to evaluate the quality of the forecast results:

(1) The root mean square error (RMSE) is the square root of the ratio between the sum of the squares of the differences between the predicted values \hat{y}_i and the actual values y_i and the number of observations N.

RMSE = \sqrt{\frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{N}}    (10)

(2) Mean absolute error (MAE) is calculated by taking the mean of the absolute differences between the forecasted and actual outcomes.

MAE = \frac{1}{N} \sum_{i=1}^{N} |\hat{y}_i - y_i|    (11)

(3) The mean absolute percentage error (MAPE) is an indicator that measures the average percentage deviation between the forecast and the actual value. It quantifies how much the forecast deviates from the actual value, with a scale ranging from 0 to infinity. An ideal forecasting model would have a MAPE of 0%, indicating no error. Conversely, a MAPE exceeding 100% suggests significant inaccuracies in the model’s predictions.

MAPE = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{\hat{y}_i - y_i}{y_i} \right|    (12)

(4) The coefficient of determination, denoted as R² or the degree of fit, measures the proportion of the variance in the actual values that is explained by the model and assesses the overall fit of the regression model. Despite the notation, R² is not the square of a quantity, so it can take negative values; its range is (−∞, 1].

R^{2} = 1 - \frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}    (13)

where \bar{y} is the mean of the actual values y_i.
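The four metrics can be sketched in plain Python as follows. This is a generic illustration of the standard definitions, not the evaluation code used in the study, and the sample values are hypothetical. R² is computed in its standard form, with squared residuals over the squared deviations of the actual values from their mean.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction (multiply by 100 for percent)."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted values, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

print(rmse(actual, predicted), mae(actual, predicted))
print(mape(actual, predicted), r2(actual, predicted))
```

Note that MAPE is returned as a fraction here, so a reported value should always state whether it is a fraction or a percentage.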

According to these evaluation metrics, the multivariate model has a prediction RMSE of 33.64, an MAE of 4.59, and a MAPE of 0.001; these values are acceptable relative to the range of solar irradiance values. The R² value is 0.99, indicating that the model explains the variability of the target variable well; that is, the model has a high degree of fit to the observed data. The difference between the model predictions and the actual observations is therefore small, and the model effectively captures the underlying patterns and trends in the data.

Using the solar irradiance predicted by the model in combination with the photovoltaic hydrogen production prediction model, it is possible to estimate changes in hydrogen production over a period of time. To compare the changes in photovoltaic hydrogen production capacity over different seasons, a two-month observation period is selected here to compare the actual and predicted hydrogen production changes during the summer (July–August) and winter (November–December) in the Lanzhou area, with error bars plotted based on the absolute error between the forecasted and actual results, as shown in Figure 9. The results indicate that daily hydrogen production in summer is significantly higher and more variable compared to that in winter. This difference is primarily due to variations in solar radiation intensity between the two seasons. Solar radiation intensity exhibits significant seasonal fluctuations throughout the year. In summer, the solar elevation angle is higher, and the duration of sunlight is longer, resulting in a substantial increase in the amount of solar radiation received per unit area. In contrast, in winter, the solar elevation angle is lower, and the daylight hours are shorter, leading to a notable reduction in the solar radiation reaching the ground. This pronounced cyclical variation directly contributes to the differences in the production capacity of photovoltaic hydrogen generation systems across seasons. Furthermore, meteorological conditions are another factor responsible for the variations. Although summer experiences more frequent rainfall, it typically occurs in the form of short-duration showers, after which the sky clears rapidly, allowing solar radiation to recover quickly. Short-duration rainfall has a minimal impact on photovoltaic power generation and may even enhance efficiency by cleaning dust off the surface of photovoltaic panels. Conversely, in winter, especially after snowfall, accumulated snow may freeze and cover the photovoltaic panels, obstructing the entry of solar radiation for a long time. If snow is not promptly cleared, the photovoltaic panels will be unable to effectively absorb solar energy, severely impairing power generation efficiency.

The observed prediction error in summer is greater than in winter. Higher R² scores and lower MAPE, RMSE, and MAE in winter indicate more accurate and stable forecasts. While the quantitative evaluation metrics for summer predictions are slightly worse than those for winter, they remain at a reliable level and provide valuable guidance for the operation of photovoltaic hydrogen production systems. The primary cause of forecast deviations in summer is the fluctuation of solar irradiance caused by short-term extreme weather events. The current prediction model uses data from the preceding 5 days as input, and the drastic fluctuations in this input data contribute to lower prediction accuracy. In contrast, in winter, the daily solar irradiance levels are relatively stable, resulting in smaller errors in data prediction.

Table 1 summarizes the key differences between the present work and the recent related studies in the field of photovoltaic hydrogen production prediction. This table provides a clearer understanding of the advancements and performance of the current method. The table highlights the methodologies, models used, datasets, and performance metrics of each study, enabling a direct comparison with the present results. The table shows that the present model achieves competitive prediction accuracy compared to other studies. The use of LSTM for time series prediction, combined with multivariable input (temperature, wind speed, humidity, and precipitation), allows the model to capture the complex dependencies in meteorological data, resulting in accurate and reliable predictions.

4. Conclusions

This study established a capacity prediction method based on the LSTM network model to predict the capacity of photovoltaic hydrogen production systems. Firstly, the LSTM network model was constructed. Secondly, a long-term time series from a high-resolution near-surface meteorological forcing dataset for the Third Pole region was selected as the data source for training the solar irradiance prediction model, with meteorological factors, such as wind speed, temperature, humidity, and precipitation, constituting the dataset for the subsequent capacity prediction model. Finally, a multivariate photovoltaic hydrogen production capacity prediction model was constructed based on the predicted solar irradiance and meteorological factors, such as wind speed, temperature, humidity, and precipitation. The results show that the predicted hydrogen production agrees well with the actual values, with a low MAPE and a high R². The predicted hydrogen production in winter has a MAPE of 0.55% and an R² of 0.985, while the predicted hydrogen production in summer has a slightly higher MAPE of 0.61% and a lower R² of 0.968, due to higher irradiance levels and weather fluctuations. These metrics underscore the high accuracy and feasibility of the LSTM-based method for predicting photovoltaic hydrogen production capacity. The model effectively captures long-term dependencies in time series data, significantly enhancing prediction accuracy compared to conventional methods.

While the current model demonstrated effective predictive capabilities, there are still notable errors in regions with significant fluctuations, such as sudden changes in solar irradiance or abrupt weather shifts. To improve the performance of the present model for fluctuating conditions, future research will focus on incorporating additional meteorological parameters and higher-resolution training data to better capture rapid weather changes. Hybrid models combining machine learning techniques (e.g., LSTM) with traditional statistical methods (e.g., ARIMA or SARIMA) or attention mechanisms will be explored to address seasonal complexities and enhance prediction robustness. Additionally, dynamic parameters will replace fixed coefficients, such as real-time data on dust accumulation, snow cover, PV panel aging, and electrolyzer efficiency. High-resolution meteorological data and advanced sensor technologies will be leveraged to more accurately simulate complex real-world interactions. The purpose of these enhancements is to significantly improve prediction accuracy and robustness, optimizing the design and operation of photovoltaic hydrogen production systems and advancing the technology towards higher efficiency and broader applicability.

Author Contributions

Conceptualization, Q.H. and X.L.; methodology, S.L., Q.H. and M.Z.; software, M.Z.; formal analysis, Q.H., S.L. and M.Z.; writing—original draft preparation, Q.H. and M.Z.; writing—review and editing, X.L. and Z.W.; supervision, Z.W. and X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Shandong Province University Youth Entrepreneurship Talent Introduction Program, namely the Ultraviolet Pulsed Bright Light Disinfection Mobile Robot Project of Shandong Xiehe University–Medical Convenient Robot Technology Innovation Team under Project 2021KJ088.

Data Availability Statement

Data are available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. The National Energy Administration Releases Statistical Data on the National Power Industry for 2023. Available online: https://www.nea.gov.cn/2024-01/26/c_1310762246.htm (accessed on 26 January 2024).
2. Ghosh, S.; Yadav, R. Future of photovoltaic technologies: A comprehensive review. Sustain. Energy Technol. Assess. 2021, 47, 101410.
3. Song, H.; Luo, S.; Huang, H.; Deng, B.; Ye, J. Solar-Driven hydrogen production: Recent advances, challenges, and future perspectives. ACS Energy Lett. 2022, 7, 1043–1065.
4. Liu, J.; Wang, J.; Tang, Y.; Jin, J.; Li, W. Solar photovoltaic–thermal hydrogen production system based on full-spectrum utilization. J. Clean. Prod. 2023, 430, 139340.
5. Mostafa, E. Hydrogen production by water electrolysis technologies: A review. Results Eng. 2023, 20, 101426.
6. Cheng, G.; Luo, E.; Zhao, Y.; Yang, Y.; Chen, B.; Cai, Y.; Wang, X.; Dong, C. Analysis and prediction of green hydrogen production potential by photovoltaic-powered water electrolysis using machine learning in China. Energy 2023, 284, 129302.
7. Ozdemir, S.N.; Pektezel, O. Performance prediction of experimental PEM electrolyzer using machine learning algorithms. Fuel 2024, 378, 132853.
8. Haider, S.A.; Sajid, M.; Iqbal, S. Forecasting hydrogen production potential in islamabad from solar energy using water electrolysis. Int. J. Hydrogen Energy 2021, 46, 1671–1681.
9. Kabir, M.M.; Roy, S.K.; Alam, F.; Nam, S.Y.; Im, K.S.; Tijing, L.; Shon, H.K. Machine learning-based prediction and optimization of green hydrogen production technologies from water industries for a circular economy. Desalination 2023, 567, 116992.
10. Adeli, K.; Nachtane, M.; Faik, A.; Rachid, A.; Tarfaoui, M.; Saifaoui, D. A deep learning-enhanced framework for sustainable hydrogen production from solar and wind energy in the Moroccan Sahara: Coastal regions focus. Energy Convers. Manag. 2024, 302, 118084.
11. Akhter, M.N.; Mekhilef, S.; Mokhlis, H.; Ali, R.; Usama, M.; Muhammad, M.A.; Khairuddin, A.S.M. A hybrid deep learning method for an hour ahead power output forecasting of three different photovoltaic systems. Appl. Energy 2022, 307, 118185.
12. Javaid, A.; Javaid, U.; Sajid, M.; Rashid, M.; Uddin, E.; Ayaz, Y.; Waqas, A. Forecasting Hydrogen Production from Wind Energy in a Suburban Environment Using Machine Learning. Energies 2022, 15, 8901.
13. Ruhani, B.; Moghaddas, S.A.; Kheradmand, A. Hydrogen production via renewable-based energy system: Thermoeconomic assessment and Long Short-Term Memory (LSTM) optimization approach. Int. J. Hydrogen Energy 2024, 52, 505–519.
14. Kazi, M.-K.; Eljack, F. Practicality of Green H2 Economy for Industry and Maritime Sector Decarbonization through Multiobjective Optimization and RNN-LSTM Model Analysis. Ind. Eng. Chem. Res. 2022, 61, 6173–6189.
15. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
16. Understanding LSTM Networks. Available online: https://colah.github.io/posts/2015-08-Understanding-LSTMs/ (accessed on 24 January 2024).
17. Yang, K.; Jiang, Y.; Tang, W.; He, J.; Shao, C.; Zhou, X.; Lu, H.; Chen, Y.; Li, X.; Shi, J. A High-Resolution Near-Surface Meteorological Forcing Dataset for the Third Pole Region (TPMFD, 1979–2022). National Tibetan Plateau/Third Pole Environment Data Center, 2023. Available online: https://cstr.cn/18406.11.Atmos.tpdc.300398 (accessed on 16 August 2024).
18. Boudries, R.; Dizene, R. Potentialities of hydrogen production in Algeria. Int. J. Hydrogen Energy 2008, 33, 4476–4487.
19. Haider, S.A.; Sajid, M.; Sajid, H.; Uddin, E.; Ayaz, Y. Deep learning and statistical methods for short- and long-term solar irradiance forecasting for Islamabad. Renew. Energy 2022, 198, 51–60.

Figure 1. Structures of photovoltaic hydrogen production system. (a) Off-grid photovoltaic hydrogen production system; (b) grid-connected photovoltaic hydrogen production system.

Figure 2. LSTM network structure.

Figure 3. Distribution of ground meteorological elements in the Third Pole region.

Figure 4. Daily solar irradiance variations from 2013 to 2022.

Figure 5. Correlation analysis of ground meteorological element variables. Abbreviations: IR—Solar Irradiance (W/m²); PP—Precipitation (mm/h); P—Atmospheric Pressure (hPa); RH—Relative Humidity (kg/kg); T—Temperature (K); WS—Wind Speed (m/s).

Figure 6. Normalization of the meteorological parameters: (a) original values; (b) normalized values.

Figure 7. Loss variation curve.

Figure 8. Comparison between predicted and actual solar irradiance values: (a) original values; (b) normalized values (the dashed line represents the predictions matching the true values).

Figure 9. Prediction results of daily average photovoltaic hydrogen production capacity in summer and winter.

Table 1. Comparative analysis of this study with related literature.

Study	Model Used	Dataset/Region	Key Metrics (R²)	Performance
This study	LSTM	Weather data from Lanzhou, China (2013–2022)	0.968 (summer); 0.985 (winter)	This work used LSTM for time series prediction of solar irradiance and hydrogen production, achieving high accuracy with multivariable input.
Haider et al. (2021) [8]	Prophet Algorithm	Weather data from Islamabad, Pakistan (13 months)	0.983	The Prophet algorithm was used for forecasting and demonstrated the best performance in terms of R² scores compared to other methods.
Haider et al. (2021) [8]	SGD	Weather data from Islamabad, Pakistan (13 months)	0.969	Performed well but slightly worse than Prophet in terms of R² and error metrics.
Haider et al. (2021) [8]	SARIMAX	Weather data from Islamabad, Pakistan (13 months)	0.966	Performed the worst among the three models, with the highest MAPE and lowest R² score.
Cheng et al. (2023) [6]	SVM	Weather data from four climate zones in China: MPZ, SMZ, TCZ, TMZ (2021–2022)	0.968 (MPZ); 0.980 (SMZ); 0.955 (TCZ); 0.960 (TMZ)	SVM performed better than Prophet, with higher R² scores and lower RMSE values across all regions.
Cheng et al. (2023) [6]	FbProphet	Weather data from four climate zones in China: MPZ, SMZ, TCZ, TMZ (2021–2022)	0.855 (MPZ); 0.795 (SMZ); 0.811 (TCZ); 0.628 (TMZ)	Prophet performed worse than SVM, with lower R² scores and higher RMSE values, especially in the TMZ region.
Haider et al. (2022) [19]	ANN	Weather data from Islamabad, Pakistan (2015–2019)	0.987 (1 h ahead); 0.944 (3 h ahead); 0.924 (6 h ahead); 0.872 (24 h ahead)	Performed best for short-term forecasts, especially for 1 h and 3 h ahead.
Haider et al. (2022) [19]	CNN	Weather data from Islamabad, Pakistan (2015–2019)	0.977 (1 h ahead); 0.938 (3 h ahead); 0.922 (6 h ahead); 0.817 (24 h ahead)	Performed well for short-term forecasts, especially 1 h and 3 h ahead.
Haider et al. (2022) [19]	LSTM	Weather data from Islamabad, Pakistan (2015–2019)	0.984 (1 h ahead); 0.913 (3 h ahead); 0.455 (6 h ahead); 0.109 (12 h ahead)	Performed best for 1 h and 3 h ahead forecasts, but the performance declined for longer horizons.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
