1. Introduction
According to the 2023 national power industry statistical data [
1], the total installed capacity of photovoltaic (PV) power generation in China has reached 610 million kilowatts, signifying a substantial advancement in the nation’s solar energy sector. This development is accompanied by continuous advancements in photovoltaic technology, a rapid decline in costs, and an increasing diversification of development models. Various “PV+” development models are constantly emerging, and the goal of achieving parity and diversification in the photovoltaic industry is within reach [
2]. However, as the industry grows, its challenges are becoming increasingly apparent. The inherent variability and cyclical patterns of photovoltaic power generation complicate power grid management, and photovoltaic power systems are consequently less stable than conventional ones. This poor stability is a significant constraint on the continued large-scale development of photovoltaic power generation and is the primary problem to be addressed at present. The key to solving it lies in efficient and flexible energy storage. Hydrogen, an effective energy carrier, is well suited to this role: integrating hydrogen generation and storage with renewable energy sources such as solar photovoltaics can efficiently mitigate the instability inherent in photovoltaic power systems. Moreover, hydrogen is an environmentally benign fuel, and its storage supports broad applications across many sectors, including transportation, decentralized power generation for heating, chemical industries, and metallurgy. Developing the hydrogen energy industry and using hydrogen as a long-term energy storage medium for the new power system can alleviate the spatiotemporal imbalance between electricity production and use, thereby enhancing the flexibility of the power grid. A green hydrogen production system that is independent of grid constraints can also facilitate the development and construction of new energy sources. Such a system converts wind and solar resources directly into hydrogen for subsequent storage and utilization, which is of significant strategic and practical importance [
3].
In contemporary energy models, photovoltaic power generation has emerged as a pioneering approach for hydrogen production, in which photovoltaic systems supply electrolysis directly. This model contrasts with conventional power plants, in which inversion and voltage step-up stages are required parts of the electrolysis process. Photovoltaic panels are highly flexible in configuration: they can be connected in series or in parallel to meet the voltage and current demands of hydrogen production facilities, which helps optimize system performance. Hydrogen generation through water electrolysis is approaching technological maturity. The apparatus is straightforward, operation and management are comparatively easy, and the hydrogen produced is of high purity, making it well suited to various applications. Based on how the hydrogen production system interacts with the power grid, photovoltaic hydrogen production systems can be categorized as off-grid or grid-connected [
3]; the basic structures of the two systems are shown in
Figure 1. The off-grid PV hydrogen production system comprises a PV array (made up of solar cell modules), a battery pack, a DC converter, an electrolysis device, and a hydrogen storage device. It is a self-contained power generation system that does not require the participation of the main power grid and therefore offers superior flexibility and mobility. In contrast, grid-connected photovoltaic hydrogen production systems fall into two types, one with a common AC bus and the other with a common DC bus, as shown in Figure 1b. In the DC bus configuration, the PV/storage/hydrogen DC microgrid must interact with the power grid through a DC/AC conversion device. The AC bus configuration has been widely adopted in the northwest and other regions of China that are rich in photovoltaic resources. It relies on the stability of the larger power grid to support the PV hydrogen production system and uses low-cost off-peak power to raise equipment utilization and improve the project’s economic viability.
A PV hydrogen production system first converts light energy into electrical energy, directly or indirectly, through photovoltaic or photochemical effects, and then uses that electricity to electrolyze water, producing hydrogen for storage. In practical applications, the light energy comes predominantly from the sun. Under fixed technical conditions, the conversion efficiency of solar cell modules is roughly constant, and current photovoltaic systems convert only a limited share of the incident solar radiation, typically no more than 20%. The output of PV hydrogen production is therefore directly related to the quantity of solar radiation arriving at the ground, which changes with meteorological conditions [
4]. To efficiently convert this electrical energy into hydrogen, various water electrolysis technologies are used [
5]. Current mainstream technologies include alkaline water electrolysis (AWE), proton exchange membrane electrolysis (PEM), solid oxide electrolysis (SOEC), and anion exchange membrane electrolysis (AEM). AWE is a mature and cost-effective technology suitable for large-scale applications, though it suffers from lower efficiency (60–80%) and electrode corrosion. PEM has high efficiency and the capacity to produce hydrogen of high purity, making it well suited for small-scale systems. However, PEM depends on costly noble metal catalysts. SOEC operates at high temperatures (900–1000 °C) with high efficiency but faces challenges such as a long startup time and material degradation. AEM combines the advantages of AWE and PEM, using low-cost materials, but its performance and durability require further improvement. Each technology has distinct characteristics, and the choice depends on specific application requirements, including scale, cost, and operational conditions.
Machine learning has achieved considerable success in a variety of fields in recent years, and its strong self-learning and data processing capabilities have played an important role in research across numerous disciplines. Given the substantial randomness inherent in PV hydrogen production, predicting hydrogen production capacity over a given period from existing data would effectively enhance energy utilization efficiency. Machine learning encompasses a variety of algorithms, such as support vector regression (SVR), clustering algorithms, artificial neural networks (ANNs), and principal component analysis (PCA), which have been widely used for prediction problems in industrial production. Cheng et al. [
6] used support vector machine (SVM) and FbProphet algorithms to predict annual hydrogen yields in regions with optimal solar radiation and available land. The coefficients of determination (R²) obtained on the test sets were above 0.95, confirming the accuracy of the model. The results demonstrate the high potential and environmental benefits of producing green hydrogen through PV-powered water electrolysis in China. Ozdemir et al. [
7] evaluated the electrochemical performance of proton exchange membrane (PEM) water electrolyzers using machine learning techniques. They identified SVM as the most effective method for predicting hydrogen flow rate, with a mean absolute error (MAE) of 0.0317, and current density, with an MAE of 0.0671, thus optimizing operational parameters for enhanced efficiency and durability. Haider et al. [
8] used various machine learning methods, including the Prophet algorithm, stochastic gradient descent (SGD), and seasonal autoregressive integrated moving average exogenous (SARIMAX), to predict the solar hydrogen production potential in Islamabad, Pakistan. A comparative analysis showed that the Prophet algorithm achieved an R² score of 0.983, outperforming SGD and SARIMAX, which scored 0.969 and 0.966, respectively. This indicates that the Prophet model demonstrated superior data-fitting capabilities compared to the other methods. The prediction results demonstrate a significant daily average production and highlight the region’s potential for green energy through a photovoltaic–electrolytic (PV-E) system. Kabir et al. [
9] highlighted significant challenges in scaling up green hydrogen production (GHP) technologies, particularly in yield prediction and process optimization. They applied various machine learning algorithms to predict and optimize GHP using the PEM technology. The results show that the K-nearest neighbor (KNN) model is the best-performing approach, achieving a high regression coefficient of 0.948, a low root mean squared error (RMSE) of 0.038, and a minimal MAE of 0.161. The integration of machine learning into the field of hydrogen production is a testament to the versatility and adaptability of these algorithms. However, the nonlinear transformation and feature representation capabilities of traditional machine learning models are relatively weak, and these models often fail to achieve satisfactory prediction accuracy.
Deep learning models have become the preferred method for complex modeling tasks due to their powerful nonlinear transformation capabilities. Over the past few years, the thriving advancement of deep learning has provided efficient solutions for various industrial applications. Recurrent neural networks (RNNs) have become prevalent in sequence modeling tasks due to their capability to grasp contextual information. Adeli et al. [
10] used an RNN model to forecast Morocco’s prospective energy production capacity and hydrogen production potential in the next decade. Akhter et al. [
11] developed and evaluated a hybrid deep learning approach (SSA-RNN-LSTM) for short-term power yield prediction in three distinct PV systems, demonstrating superior precision and robustness compared to other models over a four-year data period. Javaid et al. [
12] assessed the feasibility of harnessing wind power for hydrogen generation using an LSTM model. Ruhani et al. [
13] used the LSTM model to achieve optimized scheduling of hydrogen energy systems. Kazi and Eljack [
14] accurately predicted the future hydrogen demand of the maritime sector using the LSTM model. These applications underscore the versatility and effectiveness of deep learning models in tackling complex industrial problems. As deep learning continues to evolve, its integration into energy prediction and optimization promises to drive significant advancements in efficiency and sustainability across various sectors.
Photovoltaic hydrogen production, which combines photovoltaic power generation with water electrolysis, has emerged as a key technology for achieving energy sustainability. However, photovoltaic power generation is highly susceptible to various meteorological and environmental factors and exhibits significant intermittency and variability. These characteristics can lead to the unstable efficiency of water electrolysis for hydrogen production, reduce the overall equipment utilization rate, and increase the unit hydrogen production cost. Therefore, a comprehensive understanding of the patterns of solar radiation reaching the ground, combined with timely and accurate forecasts, can facilitate efficient power scheduling, prevent electricity waste or shortages, and enhance the efficiency and output of hydrogen production. From a broader perspective, predicting the capacity of photovoltaic hydrogen production systems can help identify the advantages and limitations of different regions for PV hydrogen production. This, in turn, enables the prioritization of regions with greater hydrogen production potential. Furthermore, such insights can guide policymakers and industry stakeholders in optimizing resource allocation and promoting sustainable economic development in the renewable energy sector.
The objective of this study is to develop a photovoltaic hydrogen production capacity prediction model using the LSTM neural networks to address the challenges posed by the effects of the temporal variability of weather conditions on photovoltaic systems. The model aims to accurately predict daily solar radiation intensity by integrating key meteorological data, such as temperature, wind speed, precipitation, and humidity, and thereby estimate regional hydrogen production capacity. The ultimate goal is to provide an economical, efficient, and widely applicable tool for predicting photovoltaic hydrogen production capacity. This tool will not only mitigate the instability of photovoltaic power systems and optimize system operations but also promote the use of hydrogen as a sustainable energy carrier and enhance energy utilization efficiency. These objectives hold significant theoretical and practical importance for advancing renewable energy technologies and achieving green energy transition goals.
In this study, an LSTM model is used to predict photovoltaic hydrogen production, leveraging its superior ability to capture long-term dependencies in time series data, particularly for meteorological variables. Unlike traditional machine learning models, such as SVM and ANN, LSTM excels at nonlinear transformations and the feature representation of time series data, enabling more accurate predictions. The present approach incorporates solar irradiance and other meteorological parameters, including temperature, wind speed, humidity, and precipitation, to construct a multivariate prediction model. This comprehensive input framework more effectively reflects the impact of meteorological conditions on hydrogen production, significantly enhancing prediction accuracy. Furthermore, the present model is trained and tested on a decade-long dataset (2013–2022). This extended temporal coverage distinguishes this study from those relying on short-term data, as it captures a wider range of climatic variations, seasonal patterns, and long-term trends. The inclusion of such a comprehensive dataset enhances the model’s generalization capabilities, enabling it to perform robustly across diverse and unpredictable environmental conditions. These aspects collectively underscore the novelty and robustness of the present work in advancing the field of photovoltaic hydrogen production predictions.
3. Results
The photovoltaic hydrogen production capacity of Lanzhou city was selected as the prediction target. The LSTM network model was used to predict solar irradiance, and the daily hydrogen production was then obtained by combining the predicted irradiance with the hydrogen production model. The loss of the LSTM network model on the training and testing sets during training is depicted in Figure 7. Although all the parameters are randomly initialized, the training loss decreases rapidly and then reaches a stable state, which indicates that the quality of the preprocessed dataset is high. After 66 epochs, the loss no longer decreases and remains stable at 3.64 × 10⁻³. The trend of decreasing loss values reflects the progression of the model from a randomly initialized state to one that has learned the salient features of the data. This trend not only validates the efficacy of the selected model but also underscores the importance of carefully configuring training strategies and hyperparameters to enhance model performance. The eventual stabilization of the loss value at a low level indicates that the model has successfully learned the operational patterns of the photovoltaic hydrogen production system and possesses robust predictive capabilities. The loss curves obtained on the training and validation sets are highly consistent, suggesting that the model is well trained and shows no signs of underfitting or overfitting.
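The plateau in the loss curve described above can be detected programmatically. The following is a minimal sketch of a patience-based plateau check, not the training procedure used in this study; the function name `plateau_epoch` and the `patience` and `min_delta` values are assumptions chosen for illustration.

```python
def plateau_epoch(losses, patience=5, min_delta=1e-5):
    """Return the 1-based epoch of the last meaningful improvement.

    An epoch counts as an improvement when the loss falls below the best
    value so far by more than `min_delta`. If `patience` epochs pass
    without such an improvement, the epoch of the best loss is returned.
    Returns None when the loss is still improving at the end of `losses`.
    `patience` and `min_delta` are illustrative assumptions.
    """
    best = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(losses, start=1):
        if best - loss > min_delta:
            # Meaningful improvement: remember it and keep training.
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: the curve has flattened.
            return best_epoch
    return None


# Synthetic loss curve that flattens at 3.64e-3 (values are made up).
history = [0.5, 0.1, 0.02, 0.005, 0.00364] + [0.00364] * 5
print(plateau_epoch(history))
```

A check of this kind is commonly used as an early-stopping criterion, so that training ends once further epochs no longer reduce the loss.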
As shown in Figure 8, the LSTM multivariate prediction model forecasts solar irradiance over the specified timeframe with relatively high precision. The predicted values are periodic, peaking in late summer and early autumn and reaching their lowest values in winter. In winter, when solar irradiance varies little, the forecasted values align closely with the observed values. In summer, however, the significant fluctuations in solar irradiance increase the prediction error.
The present work used the following metrics to evaluate the quality of the forecast results:
(1) The root mean square error (RMSE) is the square root of the sum of the squared differences between the predicted values $\hat{y}_i$ and the actual values $y_i$, divided by the number of observations $N$:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}$$
(2) Mean absolute error (MAE) is calculated by taking the mean of the absolute differences between the forecasted and actual outcomes.
(3) The mean absolute percentage error (MAPE) is an indicator that measures the average percentage deviation between the forecast and the actual value. It quantifies how much the forecast deviates from the actual value, with a scale ranging from 0 to infinity. An ideal forecasting model would have a MAPE of 0%, indicating no error. Conversely, a MAPE exceeding 100% suggests significant inaccuracies in the model’s predictions.
(4) The coefficient of determination, denoted as R² or the goodness of fit, indicates the association between the dependent and independent variables and assesses the overall fit of the regression model. Despite the notation, R² is not the square of a number, so it can take negative values; its range is (−∞, 1].
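The four metrics defined above can be computed directly from paired lists of actual and predicted values. The sketch below uses the standard textbook definitions in plain Python; the function names are illustrative assumptions, MAPE is returned in percent, and the coefficient of determination is computed as one minus the ratio of the residual sum of squares to the total sum of squares.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero (division by zero).
    """
    n = len(actual)
    return 100.0 * sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / n


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only (not data from this study).
observed = [100.0, 200.0, 300.0, 400.0]
forecast = [110.0, 190.0, 310.0, 390.0]
print(rmse(observed, forecast), mae(observed, forecast))
print(mape(observed, forecast), r2(observed, forecast))
```

The last definition also makes the negative range concrete: for actual values [1, 2, 3] and a constant forecast of 3, the residual sum of squares (5) exceeds the total sum of squares (2), so `r2` returns −1.5.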
For the multivariate model, the prediction RMSE is 33.64, the MAE is 4.59, and the MAPE is 0.001; these values are acceptable relative to the range of solar irradiance values. The R² value is 0.99, indicating that the model explains the variability of the target variable well; that is, the model fits the observed data closely. The difference between the model predictions and the actual observations is therefore minimal, and the model effectively captures the underlying patterns and trends in the data.
Using the solar irradiance predicted by the model in combination with the photovoltaic hydrogen production prediction model, it is possible to estimate changes in hydrogen production over a period of time. To compare the changes in photovoltaic hydrogen production capacity over different seasons, a two-month observation period is selected here to compare the actual and predicted hydrogen production changes during the summer (July–August) and winter (November–December) in the Lanzhou area, with error bars plotted based on the absolute error between the forecasted and actual results, as shown in
Figure 9. The results indicate that daily hydrogen production in summer is significantly higher and more variable compared to that in winter. This difference is primarily due to variations in solar radiation intensity between the two seasons. Solar radiation intensity exhibits significant seasonal fluctuations throughout the year. In summer, the solar elevation angle is higher, and the duration of sunlight is longer, resulting in a substantial increase in the amount of solar radiation received per unit area. In contrast, in winter, the solar elevation angle is lower, and the daylight hours are shorter, leading to a notable reduction in the solar radiation reaching the ground. This pronounced cyclical variation directly contributes to the differences in the production capacity of photovoltaic hydrogen generation systems across seasons. Furthermore, meteorological conditions are another factor responsible for the variations. Although summer experiences more frequent rainfall, it typically occurs in the form of short-duration showers, after which the sky clears rapidly, allowing solar radiation to recover quickly. Short-duration rainfall has a minimal impact on photovoltaic power generation and may even enhance efficiency by cleaning dust off the surface of photovoltaic panels. Conversely, in winter, especially after snowfall, accumulated snow may freeze and cover the photovoltaic panels, obstructing the entry of solar radiation for a long time. If snow is not promptly cleared, the photovoltaic panels will be unable to effectively absorb solar energy, severely impairing power generation efficiency.
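The dependence of hydrogen output on solar radiation discussed above can be illustrated with a simple energy balance. The sketch below is not the hydrogen production model used in this study; it assumes a fixed PV conversion efficiency of 20% (the upper bound cited in the Introduction), an electrolyzer efficiency of 70% defined on a higher-heating-value basis (within the 60–80% range quoted for AWE), and a higher heating value for hydrogen of about 39.4 kWh/kg. The function name, the array area, and the input units (daily irradiation in kWh/m²) are likewise assumptions.

```python
# Higher heating value of hydrogen, approximately 39.4 kWh per kg.
HHV_H2_KWH_PER_KG = 39.4


def daily_hydrogen_kg(irradiation_kwh_per_m2, area_m2,
                      pv_efficiency=0.20, electrolyzer_efficiency=0.70):
    """Estimate daily hydrogen mass (kg) from daily solar irradiation.

    All default coefficients are illustrative assumptions, not values
    taken from the study. Efficiencies are treated as constants, so the
    result scales linearly with irradiation.
    """
    if irradiation_kwh_per_m2 < 0 or area_m2 < 0:
        raise ValueError("irradiation and area must be non-negative")
    # Electrical energy delivered by the PV array over the day (kWh).
    electricity_kwh = irradiation_kwh_per_m2 * area_m2 * pv_efficiency
    # Share of that energy stored as chemical energy in hydrogen (kWh),
    # converted to mass using the higher heating value.
    return electricity_kwh * electrolyzer_efficiency / HHV_H2_KWH_PER_KG


# Hypothetical 1000 m2 array on a summer day and a winter day.
print(daily_hydrogen_kg(6.0, 1000.0))
print(daily_hydrogen_kg(2.5, 1000.0))
```

Because the efficiencies are held constant, the estimate is proportional to the irradiation, which is why seasonal differences in solar radiation carry over directly to hydrogen production in such a model.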
The observed prediction error is greater in summer than in winter. The higher R² scores and lower MAPE, RMSE, and MAE in winter indicate more accurate and stable forecasts. Although the quantitative evaluation metrics for summer predictions are slightly worse than those for winter, they remain at a reliable level and provide valuable guidance for the operation of photovoltaic hydrogen production systems. The primary cause of forecast deviations in summer is the fluctuation of solar irradiance caused by short-term extreme weather events. The current prediction model takes data from the preceding 5 days as input, so drastic fluctuations within that window lower the prediction accuracy. In winter, by contrast, the daily solar irradiance levels are relatively stable, resulting in smaller prediction errors.
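The 5-day input window mentioned above corresponds to a sliding-window arrangement of the daily records. The following minimal sketch shows one way to build such input and target pairs; the function name `make_windows`, the column order, and the sample values are assumptions, and the actual preprocessing used in this study may differ.

```python
def make_windows(rows, lookback=5, target_index=0):
    """Build (input window, target) pairs from daily records.

    `rows` is a chronologically ordered list of per-day feature tuples,
    e.g. (irradiance, temperature, wind speed, humidity, precipitation).
    Each input holds the `lookback` preceding days, and the target is
    the value at `target_index` on the following day. The column layout
    is an illustrative assumption.
    """
    inputs, targets = [], []
    for t in range(lookback, len(rows)):
        inputs.append([list(day) for day in rows[t - lookback:t]])
        targets.append(rows[t][target_index])
    return inputs, targets


# Eight made-up days with two features each (irradiance, temperature).
days = [(float(d), 20.0 + d) for d in range(8)]
windows, next_day = make_windows(days)
print(len(windows), next_day)
```

With this layout, a single day of abrupt weather appears in five consecutive input windows, which is consistent with the larger summer errors described above.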
Table 1 summarizes the key differences between the present work and recent related studies on photovoltaic hydrogen production prediction. It lists the methodologies, models, datasets, and performance metrics of each study, enabling a direct comparison with the present results. The comparison shows that the present model achieves competitive prediction accuracy. The use of LSTM for time series prediction, combined with multivariable input (temperature, wind speed, humidity, and precipitation), allows the model to capture the complex dependencies in meteorological data, resulting in accurate and reliable predictions.
4. Conclusions
This study established an LSTM-based method for accurately predicting the capacity of photovoltaic hydrogen production systems. First, the LSTM network model was constructed. Second, a long-term time series of high-resolution ground meteorological element-driving datasets from the Third Pole region was selected as the data source for training the solar irradiance prediction model, and meteorological factors, such as wind speed, temperature, humidity, and precipitation, were chosen to constitute the dataset for the subsequent capacity prediction model. Finally, a multivariate photovoltaic hydrogen production capacity prediction model was constructed based on the predicted solar irradiance and these meteorological factors. The results show that the predicted hydrogen production agrees well with the actual values, with a low MAPE and a high R². The predicted hydrogen production in winter has a MAPE of 0.55% and an R² of 0.985, while the predicted hydrogen production in summer has a slightly higher MAPE of 0.61% and a lower R² of 0.968, due to higher irradiance levels and weather fluctuations. These metrics underscore the high accuracy and feasibility of the LSTM-based method for predicting photovoltaic hydrogen production capacity. The model effectively captures long-term dependencies in time series data, significantly enhancing prediction accuracy compared to conventional methods.
While the current model demonstrated effective predictive capabilities, notable errors remain during periods of significant fluctuation, such as sudden changes in solar irradiance or abrupt weather shifts. To improve performance under fluctuating conditions, future research will focus on incorporating additional meteorological parameters and higher-resolution training data to better capture rapid weather changes. Hybrid models that combine machine learning techniques (e.g., LSTM) with traditional statistical methods (e.g., ARIMA or SARIMA) or with attention mechanisms will be explored to address seasonal complexities and enhance prediction robustness. Additionally, fixed coefficients will be replaced with dynamic parameters, such as real-time data on dust accumulation, snow cover, PV panel aging, and electrolyzer efficiency. High-resolution meteorological data and advanced sensor technologies will be leveraged to simulate complex real-world interactions more accurately. These enhancements are intended to significantly improve prediction accuracy and robustness, optimizing the design and operation of photovoltaic hydrogen production systems and advancing the technology towards higher efficiency and broader applicability.