1. Introduction
Aquaculture, as a crucial economic industry globally, provides an abundance of food resources for humanity. China is the world’s largest aquaculture producer, accounting for more than 60% of the total [1]. In recent years, the discharge of contaminated water from Japanese nuclear facilities has led to severe marine pollution, sparking deep concerns among consumers about the safety of seafood. Against this backdrop, consumers have actively sought out seafood from reliable sources with guaranteed quality, which has promoted the development of freshwater aquaculture markets. Compared to marine aquaculture, freshwater aquaculture is easier to control and regulate, thus better ensuring the safety of the aquaculture environment and the quality of aquatic products. Meanwhile, the “land-based seafood farming” model in Xinjiang, China, has gradually garnered attention. This model shifts seafood farming to land, utilizing saline/alkali soil to simulate a marine environment for stable farming. It not only enriches the seafood market but also alleviates the resource and environmental pressures on marine aquaculture. Furthermore, by leveraging Xinjiang’s geographical advantages and scientific farming techniques, this model provides a new avenue for food safety and the sustainable development of aquaculture [2].
Dissolved oxygen (DO) is one of the crucial water quality parameters used for assessing water body quality, whether in freshwater aquaculture or land-based marine aquaculture [3,4]. An appropriate DO content (above 5.0 mg/L) ensures the growth and development of aquatic organisms, while a low DO content (below 3.0 mg/L) can hinder growth, resulting in considerable economic losses and even mortality [5]. Therefore, maintaining the DO content above this minimum is essential. Currently, most aquaculture farms regulate the DO content through periodic oxygenation. This approach not only allows the DO content to fall too low when oxygenation is untimely but also wastes energy when oxygenation is not halted promptly after DO saturation.
Accurate prediction of the DO content provides a basis for water quality monitoring and control, achieving early warning and scientific water quality management objectives [6], reducing fish disease outbreaks, and improving aquaculture efficiency, all of which have significant economic value. Simultaneously, anticipating the DO content makes it possible to regulate the operation time of oxygenation equipment precisely and to prevent continued oxygenation after DO saturation, thus conserving energy and reducing carbon emissions. This is critical for sustainable development and environmental protection, especially given the current societal emphasis on ecological balance and resource conservation, and it has significant academic and practical value.
However, DO exhibits nonlinearity, strong coupling, and temporal variability, making simple modeling challenging [7]. Scholars around the world have used various methods to predict DO concentrations. Traditional prediction models such as linear regression, the autoregressive integrated moving average (ARIMA), and the Markov model are widely used. For example, Li et al. used a discrete hidden Markov model and K-means clustering to predict DO, turbidity, and saturation [8]. The results showed that the model possessed a simple structure and strong interpretability, and it achieved satisfactory predictive accuracy. However, water quality parameters are often jointly influenced by various nonlinear factors such as temperature, pH level, and organic matter content, and traditional statistical models therefore have inherent limitations in predicting and analyzing these parameters [9].
Machine learning models such as the support vector machine (SVM), extreme learning machine (ELM), and random forest (RF) have been extensively utilized for DO prediction in different environments. For example, Feng et al. proposed a hybrid model, WTD-GWO-SVR, that combines wavelet threshold denoising (WTD), grey wolf optimization (GWO), and support vector regression (SVR) to accurately predict the DO content in aquaculture environments [10]. Nong et al. proposed an SVR model that combines multiple intelligent technologies to accurately predict DO in complex environments [11]. Shi et al. used an optimized regularized extreme learning machine (RELM) model, factor extraction, and K-medoids clustering to accurately predict DO in black bass culture ponds. They improved the prediction accuracy of DO variations under different weather conditions by using day/night cycle segmentation and clustering mechanisms [12]. Although machine learning models have been successfully applied in DO prediction, they have some limitations. The SVM model is efficient but challenging to train with large-scale samples and sensitive to kernel function selection [13]. The ELM model is limited in its ability to handle nonlinear data, while the RF model is susceptible to overfitting [14].
In recent years, deep learning has gained significant popularity in various fields as a widely applied method. Neural network models effectively correlate previous information with current tasks by mimicking the memory function of the human brain. Thus, they have received extensive attention and application in dealing with complex temporal problems. Huan et al. utilized a combination of the gradient boosting decision tree (GBDT) and long short-term memory (LSTM) to establish a DO prediction model for ponds, which exhibited high prediction accuracy and generalization ability [15]. However, its model parameters must be modified based on subjective experience, which introduces a certain degree of uncertainty into the process. Guo et al. proposed a PCA-PFA-GRU hybrid prediction model for DO prediction in black bass aquaculture water bodies [16]. This model reduced prediction error fluctuations and improved prediction accuracy by extracting key factors through principal component analysis (PCA) and optimizing critical parameters of the GRU through a pathfinder algorithm (PFA). However, PCA-PFA requires the pre-determination of parameters such as the number of principal components, which limits its ability to deal with nonlinear data.
Barzegar et al. established LSTM and convolutional neural network (CNN) models as well as their hybrid model, CNN-LSTM, to predict DO concentration. The experimental findings showed that the hybrid model, combining the advantages of LSTM and CNN, effectively captured water quality variables and performed well in prediction [17]. Similarly, Hu et al. developed a CNN-LSTM hybrid model to predict the daily minimum DO concentration in rivers, which showed greater stability than the LSTM model alone [18]. However, despite the superior performance of the CNN-LSTM model in DO prediction, it still faces challenges such as difficulty in hyperparameter tuning and long training times.
In recent years, besides the integration of neural network models, attention mechanisms have also been increasingly incorporated into time series forecasting. For instance, Liu et al. developed an attention-based recurrent neural network (RNN) for both short-term and long-term predictions of dissolved oxygen levels. The experimental findings demonstrated that integrating attention mechanisms with RNNs enhances the model’s prediction accuracy [19]. Wang et al. integrated spatial and temporal attention with LSTM to develop the STA-LSTM hybrid model. Specifically, the temporal attention mechanism emphasizes relevant information within the temporal dimension by assigning weights to information extracted at each time step, thereby enhancing prediction accuracy and improving model interpretability [20]. However, the spatio-temporal attention mechanism is relatively complex, and there may be issues with timeliness when applied to dissolved oxygen prediction.
Liu et al. optimized the parameters of a BP neural network prediction model using an improved whale optimization algorithm (IWOA), thereby establishing a hybrid prediction model known as CEEMDAN-IWOA-BP. Their research findings indicate that the IWOA outperforms the standard WOA, the genetic algorithm (GA), and particle swarm optimization (PSO) in optimizing BP neural network parameters for water quality characteristic prediction [21]. However, that study did not compare the IWOA with improved particle swarm optimization (IPSO), leaving the relative merits of the two undetermined. Ji et al. proposed a hybrid model combining an IPSO algorithm with an LSTM network for stock price prediction. By introducing an adaptive mutation factor and adopting a nonlinear inertia weight adjustment strategy, the study effectively optimized the hyperparameters of the LSTM network, significantly boosting the model’s prediction performance [22]. Sun et al. constructed a predictive model integrating fish behavior and aquatic environmental factors, based on a BP neural network enhanced with an IPSO algorithm incorporating an inverse sine decreasing inertia weight strategy (IPSO-BPNN), to accurately predict the number of carp suffering from hypoxia [23]. IPSO demonstrates superior parameter selection performance and strong adaptability across various domains, yet its application in dissolved oxygen prediction remains limited.
To address the above issues, this study proposes the IPSO-CNN-GRU-TAM model, which integrates improved particle swarm optimization (IPSO), a convolutional neural network (CNN), a gated recurrent unit (GRU), and a temporal attention mechanism (TAM) to model and predict the DO content in aquaculture. The aim is to increase prediction accuracy, verify the reliability and accuracy of the model, and provide a decision-making basis for DO regulation.
3. Data Modeling Methodology
Deep learning models, in contrast to traditional statistical methods, exhibit automatic feature extraction and autonomous learning characteristics. They are not influenced by the assumption of data normality. Therefore, deep learning models have been widely applied in dealing with nonlinear and non-stationary data. In this section, we introduce the deep learning methods employed in this study, including CNN, GRU, TAM, IPSO, and the comprehensive prediction model (IPSO-CNN-GRU-TAM).
3.1. Convolutional Neural Network (CNN)
Due to the multitude of factors affecting DO and the intricate coupling relationships between them, this study employs a one-dimensional convolutional neural network (1D-CNN) for feature extraction. Compared to a 2D convolutional neural network (2D-CNN) or a traditional RNN, the 1D-CNN offers higher efficiency and better performance when dealing with low-dimensional data with temporal characteristics and is more suitable for uncovering potential information among input variables. This method explores the intrinsic connections of the data through local perception and weight sharing, effectively extracting deep features from the data [28]. The feature extraction formula is as follows:

$$Y = \sigma(W * X + b)$$

where $Y$ is the result after feature extraction; $\sigma$ is the activation function; $W$ is the weight; $*$ denotes the convolution operation; $X$ is the input sequence; and $b$ is the bias term.
The CNN mainly consists of convolutional layers and pooling layers. The convolutional layer is responsible for extracting parameter features of the input sequence, which are then passed to the pooling layer, and finally, the results are obtained through the fully connected layer.
Figure 2 illustrates the structure of the CNN.
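To make the layer stack concrete, the following minimal Keras sketch assembles a 1D-CNN feature extractor of this kind; the filter count, kernel size, and input shape are illustrative assumptions rather than the tuned values used in this study.

```python
# Illustrative 1D-CNN feature extractor (layer sizes are assumptions,
# not the tuned values from this study).
from tensorflow.keras import layers, models

def build_cnn_extractor(time_steps: int, n_features: int) -> models.Model:
    """Conv1D -> pooling feature extractor implementing Y = sigma(W * X + b)."""
    inputs = layers.Input(shape=(time_steps, n_features))
    # Convolutional layer: local perception with shared weights
    x = layers.Conv1D(filters=32, kernel_size=3, padding="same",
                      activation="relu")(inputs)
    # Pooling layer: keep the dominant local features, halve the sequence length
    x = layers.MaxPooling1D(pool_size=2)(x)
    return models.Model(inputs, x, name="cnn_extractor")

model = build_cnn_extractor(time_steps=24, n_features=6)
model.summary()
```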
3.2. Gated Recurrent Unit (GRU)
The GRU, described by Chung et al. in 2014, is a streamlined variant of the LSTM within the RNN family [29]. It addresses the problems of gradient explosion and vanishing gradients when dealing with long-term dependencies by introducing gate structures. Compared to LSTM, the GRU network has only two gate structures: the reset gate and the update gate. The reset gate combines the current input information with past information, while the update gate retains memory information by setting time steps. Its structure is illustrated in Figure 3.
Although LSTM and GRU exhibit similar prediction accuracy, the GRU model demonstrates faster convergence and lower computational costs. The computational formulas are as follows:

$$z_t = \sigma(W_z x_t + U_z h_{t-1})$$
$$r_t = \sigma(W_r x_t + U_r h_{t-1})$$
$$\tilde{h}_t = \tanh\big(W_h x_t + U_h (r_t \odot h_{t-1})\big)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

where $z_t$ is the update gate, $r_t$ is the reset gate, $x_t$ is the input to the hidden layer, $\tilde{h}_t$ represents the aggregation of input $x_t$ and past hidden layer state $h_{t-1}$, $h_t$ is the hidden state output, $W_z$, $W_r$, $W_h$, $U_z$, $U_r$, and $U_h$ are trainable parameter matrices, and $\sigma$ and $\tanh$ represent the sigmoid activation function and the tanh activation function, respectively.
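The gate computations above can be verified with a minimal NumPy sketch of a single GRU step; the dimensions and random initialization are illustrative assumptions.

```python
# A minimal NumPy sketch of one GRU step, mirroring the equations above.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    z_t = sigmoid(Wz @ x_t + Uz @ h_prev)             # update gate
    r_t = sigmoid(Wr @ x_t + Ur @ h_prev)             # reset gate
    h_cand = np.tanh(Wh @ x_t + Uh @ (r_t * h_prev))  # candidate state
    return (1.0 - z_t) * h_prev + z_t * h_cand        # new hidden state

rng = np.random.default_rng(0)
d_in, d_hid = 6, 8                                    # illustrative sizes
W = [rng.normal(size=(d_hid, d_in)) for _ in range(3)]
U = [rng.normal(size=(d_hid, d_hid)) for _ in range(3)]
h = gru_step(rng.normal(size=d_in), np.zeros(d_hid),
             W[0], U[0], W[1], U[1], W[2], U[2])
```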
3.3. Temporal Attention Mechanism (TAM)
Attention mechanisms are crucial tools in neural networks, playing a pivotal role in handling sequential data. They mimic the human process of allocating attention during information processing, enabling networks to dynamically focus on different parts of the input, thereby enhancing the model’s expressive and generalization capabilities [30]. Traditional neural networks treat all inputs uniformly and are unable to distinguish the importance of different elements within a sequence, whereas attention mechanisms can dynamically assign weights to inputs. By learning these allocation patterns, networks can focus on task-relevant information while ignoring irrelevant data.
In constructing predictive models, since the variation in the target variable is influenced by historical states and the impact of water environment data at different time points on prediction outcomes varies, we introduce the TAM combined with the CNN to enhance its performance, as illustrated in Figure 4. This mechanism adaptively captures critical information from historical states and strengthens the influence of relevant temporal state information on current prediction outcomes. Initially, the model applies convolutional operations to input features, followed by dimension reduction through average and max pooling layers, generating two new sets of features. Subsequently, these two sets of features are concatenated and further compressed via convolution to produce a single-channel feature. Next, a sigmoid activation function is used to generate and normalize temporal attention weights. Finally, these weights are multiplied by the corresponding elements of the input features to produce weighted temporal features. The operational process of the TAM is formulated as follows:
$$M_t = \sigma\Big(f\big(\big[\mathrm{AvgPool}(f(h_{t-1}));\ \mathrm{MaxPool}(f(h_{t-1}))\big]\big)\Big)$$
$$Y_t = M_t \otimes h_{t-1}$$

where $h_{t-1}$ represents the output from the previous time step; $f$ denotes convolutional operations; $\mathrm{AvgPool}$ and $\mathrm{MaxPool}$ are global average pooling and global max pooling operations on features in the channel dimension; $[\cdot\,;\,\cdot]$ denotes feature concatenation; $\sigma$ is the sigmoid function; $M_t$ is the weight vector of the TAM; $Y_t$ denotes the output of the TAM; and $\otimes$ represents element-wise multiplication.
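One possible Keras realization of this pooling-concatenation-convolution-sigmoid pipeline is sketched below; the kernel sizes and the width of the initial convolution are assumptions, not the settings used in this study.

```python
# A hedged Keras sketch of the TAM block described above.
import tensorflow as tf
from tensorflow.keras import layers

class TemporalAttention(layers.Layer):
    """Temporal attention: conv -> pool -> concat -> conv -> sigmoid -> reweight."""
    def __init__(self, conv_filters=16, kernel_size=3, **kwargs):
        super().__init__(**kwargs)
        self.feat_conv = layers.Conv1D(conv_filters, kernel_size,
                                       padding="same", activation="relu")
        self.weight_conv = layers.Conv1D(1, kernel_size, padding="same")

    def call(self, inputs):                               # inputs: (batch, T, C)
        f = self.feat_conv(inputs)                        # convolve input features
        avg = tf.reduce_mean(f, axis=-1, keepdims=True)   # channel-wise avg pool
        mx = tf.reduce_max(f, axis=-1, keepdims=True)     # channel-wise max pool
        cat = tf.concat([avg, mx], axis=-1)               # concatenate the two maps
        m = tf.sigmoid(self.weight_conv(cat))             # attention weights (batch, T, 1)
        return inputs * m                                 # weighted temporal features
```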
TAM performance relies heavily on the training data and model parameter selection. Insufficient training data or inappropriate parameter settings can lead to overfitting or underfitting. This affects the model’s generalization ability and stability.
3.4. Improved Particle Swarm Optimization (IPSO)
The final prediction results are directly influenced by the CNN-GRU-TAM model’s various parameters, including the number of hidden layer neurons, learning rate, and training epochs. Typically, it requires a considerable amount of experimentation to find suitable parameter settings, which can involve a high degree of subjectivity and uncertainty. To solve this problem, we introduce the PSO algorithm for parameter optimization. Kennedy and Eberhart first proposed the PSO algorithm in their research [31]. PSO is a heuristic optimization algorithm inspired by observations of collective behavior in natural groups, such as birds or fish. Its basic procedure involves the following steps, with a compact sketch given after the formulas below:
- (1) Population initialization (with a population size of N), including random positions and velocities;
- (2) Computing the fitness of each particle;
- (3) Comparing the fitness of each particle with its individual best position (pbest). If the fitness is improved, the particle’s best position (pbest) is updated to the current position;
- (4) Comparing the fitness of each particle with the swarm’s best position (gbest). If the fitness is improved, the swarm’s best position (gbest) is updated to the current position;
- (5) Adjusting the velocity and position of each particle according to Formulas (15) and (16);
- (6) Iterating from step (2) until the termination condition is met.
$$v_i^{t+1} = w\, v_i^{t} + c_1 r_1 \big(p_{\mathrm{best},i} - x_i^{t}\big) + c_2 r_2 \big(g_{\mathrm{best}} - x_i^{t}\big) \tag{15}$$
$$x_i^{t+1} = x_i^{t} + v_i^{t+1} \tag{16}$$

In these formulas, $t$ denotes the current iteration, $w$ represents the inertia weight, $c_1$ and $c_2$ are learning factors, $r_1$ and $r_2$ are random constants within the range [0, 1], $p_{\mathrm{best},i}$ represents the individual best position, and $g_{\mathrm{best}}$ denotes the swarm’s best position.
The termination condition is typically set as the maximum number of iterations or achieving the desired prediction accuracy.
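A compact Python sketch of steps (1)-(6) follows; the sphere function stands in for the real fitness function, and the population size, iteration count, and coefficient values are illustrative.

```python
# A compact sketch of the standard PSO loop in steps (1)-(6).
import numpy as np

def pso(fitness, dim, n_particles=30, iters=100, w=0.9, c1=2.0, c2=2.0):
    rng = np.random.default_rng(1)
    x = rng.uniform(-1, 1, (n_particles, dim))       # (1) random positions
    v = rng.uniform(-1, 1, (n_particles, dim))       #     and velocities
    pbest = x.copy()
    pbest_f = np.apply_along_axis(fitness, 1, x)     # (2) initial fitness
    gbest = pbest[np.argmin(pbest_f)]
    for _ in range(iters):                           # (6) iterate
        r1, r2 = rng.random((2, n_particles, 1))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # (5)
        x = x + v
        f = np.apply_along_axis(fitness, 1, x)
        improved = f < pbest_f                       # (3) update pbest
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[np.argmin(pbest_f)]            # (4) update gbest
    return gbest, pbest_f.min()

best, best_f = pso(lambda p: np.sum(p ** 2), dim=3)  # sphere test function
```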
In researching complex optimization problems, the PSO algorithm shows significant effects. However, it often lacks effective parameter control, leading to slow convergence, a tendency to become stuck in local optima, and low precision in later iterations [32]. To address these issues, we adjust the inertia weight and learning factors to balance exploration and exploitation. Exploration helps the algorithm find new solutions in the search space, while exploitation focuses on refining the high-quality solutions already discovered. This approach improves the overall performance of the algorithm.
3.4.1. Improved Inertia Weight
Based on previous research findings, the inertia weight is considered one of the most critical adjustable parameters in the PSO model [33]. Proper selection of the inertia weight helps balance local and global search, reduce the number of iterations, and enhance the performance of the particle swarm optimization algorithm. As the number of iterations increases, the inertia weight exhibits a nonlinear decreasing trend, which significantly enhances particle convergence. Therefore, to achieve a better balance between local and global search, an improved inertia weight can be adjusted using a nonlinear hyperbolic tangent function, as shown in the following expression:

$$w_i = \frac{w_{\max} + w_{\min}}{2} + \frac{w_{\max} - w_{\min}}{2} \tanh\!\left(-4 + \frac{8(T - t)}{T}\right)$$

where $w_i$ is the improved inertia weight for each particle, $w_{\min}$ is the minimum inertia weight, $w_{\max}$ is the maximum inertia weight, $t$ is the current iteration, and $T$ is the maximum number of iterations. We replace $w$ in Equation (15) with $w_i$. Each particle’s inertia weight ($w_i$) is independently updated according to the above equation.
3.4.2. Improved Learning Factor
In the particle swarm optimization algorithm, the learning factors $c_1$ and $c_2$ are used to adjust the step size of particles moving toward the individual best and global best positions. As the iteration progresses, it is typically necessary to adjust the values of $c_1$ and $c_2$ to achieve better search performance at different stages. In general, a larger $c_1$ is needed in the initial stage to accelerate the search speed and enhance the global search capability, whereas a larger $c_2$ is needed in the later iterations for the local refinement search to improve accuracy [34]. However, the standard particle swarm optimization setting ($c_1$ = $c_2$ = 2) often fails to meet practical application requirements. Therefore, to improve the computation of the learning factors, a sine function is introduced to adjust the values of $c_1$ and $c_2$. The improved formulas for computing the acceleration coefficients are as follows:

$$c_1 = c_{\max} - (c_{\max} - c_{\min}) \sin\!\left(\frac{\pi t}{2T}\right), \qquad c_2 = c_{\min} + (c_{\max} - c_{\min}) \sin\!\left(\frac{\pi t}{2T}\right)$$

where $c_{\max}$ and $c_{\min}$ are the maximum and minimum values of the learning factors, respectively.
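As a minimal sketch, assuming the tanh and sine schedules given above with illustrative bounds for the learning factors, the two adjustments could be coded as follows:

```python
# Sketch of the nonlinear schedules above; bound values are illustrative.
import numpy as np

def inertia_weight(t, T, w_min=0.3, w_max=0.95):
    # Hyperbolic-tangent decrease from roughly w_max (t = 0) to w_min (t = T)
    return (w_max + w_min) / 2 + (w_max - w_min) / 2 * np.tanh(-4 + 8 * (T - t) / T)

def learning_factors(t, T, c_min=0.5, c_max=2.5):
    # Sine schedules: c1 shrinks (weaker individual pull, strong early global
    # search), c2 grows (stronger swarm pull, late local refinement)
    s = np.sin(np.pi * t / (2 * T))
    return c_max - (c_max - c_min) * s, c_min + (c_max - c_min) * s
```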
3.5. Evaluation Indicators
The quality of prediction results for different models can be difficult to determine with the naked eye alone, so specific parameter indicators are necessary to measure their performance. This study selects the root mean square error (RMSE), mean square error (MSE), mean absolute error (MAE), and coefficient of determination (R²) to analyze and evaluate the prediction accuracy and generalization ability of various models [35]. These metrics objectively reflect the performance of models in prediction tasks and are closely related to the field of time series prediction.
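These four indicators can be computed directly with scikit-learn and NumPy, as in the short sketch below; the sample values are illustrative.

```python
# Computing the four evaluation indicators with scikit-learn/NumPy.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true, y_pred):
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "R2": r2_score(y_true, y_pred),
    }

print(evaluate([5.1, 5.3, 4.9], [5.0, 5.4, 5.0]))  # toy DO values in mg/L
```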
3.6. IPSO-CNN-GRU-TAM Model
3.6.1. The Structure of the Model
In time series prediction, CNNs can effectively extract local features from sequential data and uncover latent information. However, due to their local attention characteristics, they may overlook global information, limiting their effectiveness in capturing long-term dependencies. In contrast, the recurrent neural network structure of GRU is appropriate for detecting long-term dependencies in time series data, but it may not be as effective in feature extraction as the CNN.
This study overcomes their respective limitations by combining the CNN and GRU, fully leveraging the CNN’s sensitivity to local features and the GRU’s sequence modeling capabilities. First, the CNN receives various parameters influencing the DO content in water quality and extracts high-level features of input sequence data using convolutional layers. Then, through pooling layers, the main data features are filtered to reduce computational complexity. Subsequently, the GRU receives the important data features processed by the CNN in chronological order and utilizes their gate structures to capture the temporal information and long-term dependencies of sequential data. Thus, the advantages of the CNN and GRU complement each other, improving the accuracy and generalization ability of time series prediction.
On the basis of the CNN-GRU structure, attention mechanisms are introduced to further extract key information. The temporal attention mechanism updates the weight parameter matrix by assigning weights to different feature vectors, improving the model’s prediction accuracy. To address potential overfitting, we added a dropout layer before the fully connected layer, which reduces the model’s excessive reliance on the training data and improves its generalization ability. Finally, the IPSO algorithm is employed to optimize the critical parameters of the model, further improving its performance and robustness. The overall structure of the IPSO-CNN-GRU-TAM model is illustrated in Figure 5.
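A hedged Keras sketch of this stack is given below; it reuses the TemporalAttention layer sketched in Section 3.3, and the layer widths are placeholders for the values later tuned by IPSO.

```python
# A hedged sketch of the CNN-GRU-TAM stack; layer widths are placeholders.
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam

def build_cnn_gru_tam(time_steps=24, n_features=6,
                      hidden_size=64, learning_rate=0.01):
    inputs = layers.Input(shape=(time_steps, n_features))
    x = layers.Conv1D(32, 3, padding="same", activation="relu")(inputs)  # CNN features
    x = layers.MaxPooling1D(2)(x)                           # filter main features
    x = layers.GRU(hidden_size, return_sequences=True)(x)   # temporal dependencies
    x = TemporalAttention()(x)                              # TAM sketch from Section 3.3
    x = layers.GlobalAveragePooling1D()(x)
    x = layers.Dropout(0.2)(x)                              # dropout before dense layer
    outputs = layers.Dense(1)(x)                            # predicted DO content
    model = models.Model(inputs, outputs, name="cnn_gru_tam")
    model.compile(optimizer=Adam(learning_rate), loss="mae")
    return model
```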
3.6.2. Algorithm Flow of the Model
The model algorithm flow is depicted in Figure 6 and is described in detail as follows:
Step 1: Data preprocessing (cleaning, normalization) and division into training and testing sets at a ratio of 8:2;
Step 2: Initialize model parameters. Set the number of population particles, the maximum iteration count T, and basic parameters such as the fitness function threshold, and initialize the basic parameters of the CNN-GRU-TAM model;
Step 3: Particle initialization, randomly assign initial velocities and positions to each particle;
Step 4: Map particle parameters (number of hidden layers, learning rate, iteration count) to the CNN-GRU-TAM model and perform training;
Step 5: Compute the fitness of particles using the RMSE function as the fitness function;
Step 6: Select individual and global best particles;
Step 7: Update the current velocities and positions of particles;
Step 8: Check if the maximum iteration count is reached. If the condition is met, stop iteration; otherwise, return to Step 5;
Step 9: Convert the best solution of particles into the optimal parameters of the model;
Step 10: Construct the CNN-GRU-TAM model using the computed optimal parameters and predict the output results.
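Steps 4 and 5 can be sketched as a single fitness function, assuming particles are encoded in [0, 1]³ and using the hypothetical build_cnn_gru_tam constructor sketched in Section 3.6.1; the search ranges are those later listed in Section 3.7.

```python
# Hedged sketch of Steps 4-5: decode a particle into hyperparameters, train
# the network, and score it by RMSE. `build_cnn_gru_tam` is a hypothetical
# constructor standing in for the full model of Figure 5.
import numpy as np

SEARCH_SPACE = {                      # search ranges from Section 3.7
    "hidden_size": (16, 256),
    "learning_rate": (0.001, 0.2),
    "num_epochs": (200, 1500),
}

def decode(particle):
    """Map a particle in [0, 1]^3 onto the three hyperparameter ranges."""
    return {k: lo + p * (hi - lo)
            for (k, (lo, hi)), p in zip(SEARCH_SPACE.items(), particle)}

def fitness(particle, x_tr, y_tr, x_val, y_val):
    hp = decode(particle)
    model = build_cnn_gru_tam(hidden_size=round(hp["hidden_size"]),  # hypothetical
                              learning_rate=hp["learning_rate"])
    model.fit(x_tr, y_tr, epochs=round(hp["num_epochs"]), verbose=0)
    pred = model.predict(x_val, verbose=0).ravel()
    return float(np.sqrt(np.mean((pred - y_val.ravel()) ** 2)))     # RMSE fitness
```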
3.7. Model Parameter Initialization
In this experiment, to simulate scenarios of application control on an embedded system (Raspberry Pi 5) in subsequent work, all models were built and trained using the Keras framework. The environment used Python 3.9 and TensorFlow 2.10.0. The hardware was an AMD Ryzen 7 7735HS CPU with Radeon Graphics @ 3.20 GHz, 16 GB of RAM, and a 64-bit operating system. We used the CPU for the experiment to make the performance closer to that of actual embedded systems. To ensure fairness, except for the IPSO-CNN-GRU-TAM and Transformer models, the common hyperparameters for all models were set as follows: 128 neurons in the hidden layers, a learning rate of 0.01, 1200 iterations, and MAE as the loss function. For the Transformer model, the hyperparameters were set based on previous research findings, specifically, a batch size of 32, 100 iterations, a learning rate of 0.002, 2 attention heads, 1 encoder layer, and MAE as the loss function [36].
The initialization parameters for IPSO were as follows: population size P = 50, maximum inertia weight $w_{\max}$ = 0.95, minimum inertia weight $w_{\min}$ = 0.3, maximum iteration count T = 200, and a fitness function threshold of 0.005 for particles in the population. The root mean square error (RMSE) was selected as the fitness function to evaluate the performance of the CNN-GRU-TAM model. The selection of IPSO parameters is based on extensive experiments and references [37], achieving minimal error and optimal computational efficiency on the training dataset. The search ranges for the parameters were set as follows: hidden layer neuron count (hidden_size) in the range [16, 256], learning rate (learning_rate) in the range [0.001, 0.2], and number of epochs (num_epochs) in the range [200, 1500]. Through the IPSO algorithm optimization, the optimal parameter combination for the CNN-GRU-TAM model was determined as hidden_size = 17, learning_rate = 0.001, and num_epochs = 1493.
4. Results and Discussion
4.1. Analysis of the Ablation Experiment
To validate the effectiveness of the CNN, TAM, and IPSO components in the model proposed in this study, four comparative models were designed:
- (1) GRU: A baseline model that excludes the CNN, TAM, and IPSO components;
- (2) CNN-GRU: A model that excludes TAM and IPSO;
- (3) CNN-GRU-TAM: A model that excludes IPSO;
- (4) IPSO-CNN-GRU-TAM: The comprehensive model proposed in this study.
Figure 7 and Figure 8 demonstrate that the model presented in this study surpasses the GRU, CNN-GRU, and CNN-GRU-TAM models in terms of prediction accuracy and curve tracking, with notably lower prediction errors. Additionally, it is evident that the CNN and TAM significantly influence the model’s predictive performance; their removal results in a substantial shift in the model’s prediction curve away from the actual values.
From Table 2 and Figure 9, it is evident that the RMSE of the CNN-GRU model, compared to the GRU model, decreased significantly by 32.36% and the R² increased markedly by 19.77%. This significant improvement mainly results from the introduction of the CNN architecture. Through convolution operations, CNNs efficiently extract key features from input data. This enhancement strengthens the model’s ability to capture local patterns in time series, leading to a substantial increase in prediction accuracy. Compared to the CNN-GRU model, the CNN-GRU-TAM model demonstrated a 34.09% reduction in RMSE and a 7.83% increase in R², indicating that the inclusion of the TAM effectively captures intricate temporal dynamics and extracts dependency structures. Compared to the CNN-GRU-TAM model, the IPSO-CNN-GRU-TAM model reduced RMSE by 22.43%, enhanced R² by 2.22%, and decreased the running time by 93 s. This demonstrates that the IPSO algorithm precisely optimizes model parameters to better align with data characteristics, thus enhancing the model’s prediction accuracy, generalization capacity, and convergence speed while significantly reducing the running time. In conclusion, the integration of the CNN, TAM, and IPSO significantly enhances the model’s predictive performance.
4.2. Comparative Analysis of Different Models in DO Prediction Performance
To validate the predictive performance of the proposed model, this study employed traditional models (RNN, LSTM), hybrid neural network models (BiLSTM, CNN-LSTM), RNN and LSTM models integrated with attention mechanisms (RNN-AM, LSTM-AM), the Transformer model, and a model combining the IWOA with a CNN, GRU, and TAM (IWOA-CNN-GRU-TAM) to forecast sequences for the same time period. Through the IWOA optimization, the optimal parameter combination for the CNN-GRU-TAM model was determined as hidden_size = 46, learning_rate = 0.01, and num_epochs = 1258.
The DO content prediction results are depicted in Figure 10. The RNN, LSTM, and Transformer models exhibit relatively large prediction errors. In contrast, the prediction curves for the BiLSTM, CNN-LSTM, RNN-AM, LSTM-AM, IWOA-CNN-GRU-TAM, and IPSO-CNN-GRU-TAM models are relatively similar. Upon closer inspection of the curve details, it is evident that the fitting curves of the other models exhibit significant deviations at peak and valley points, whereas the fitting curve of the IPSO-CNN-GRU-TAM model more closely aligns with the true values, demonstrating superior accuracy and tracking performance in predicting the DO content.
However, curve fitting can only macroscopically assess the direct proximity between actual and predicted values, failing to provide a quantitative mathematical analysis.
Table 3 consolidates the performance metrics for the other models mentioned above and the IPSO-CNN-GRU-TAM model introduced in this study, while Figure 11 presents visual representations of these metrics, namely, MAE, RMSE, MSE, and R². The comparison of these evaluation metrics clearly shows that the IPSO-CNN-GRU-TAM model significantly outperforms the other models in all evaluation metrics, with MAE, RMSE, MSE, and R² at 0.0158, 0.0249, 0.0006, and 0.9682, respectively.
Compared to the other eight models, the MAE of this model was reduced by 78.91%, 77.23%, 74.56%, 65.87%, 62.56%, 62.20%, 59.38%, and 21%, and the R² increased by 74.99%, 46.25%, 38.95%, 19.37%, 15.41%, 12.03%, 11.99%, and 2.21%. Although the model takes slightly longer to run than the RNN, LSTM, BiLSTM, LSTM-AM, and IWOA-CNN-GRU-TAM models, it significantly enhances prediction accuracy, a crucial metric. Furthermore, its running time of 30 s fully satisfies efficiency demands in practical applications. A comparison of the IWOA-CNN-GRU-TAM and IPSO-CNN-GRU-TAM models shows that both significantly reduce the runtime of the CNN-GRU-TAM model; however, the IWOA does not enhance the prediction accuracy of the CNN-GRU-TAM model. Therefore, the IPSO algorithm proposed in this paper outperforms the IWOA algorithm.
Figure 12 illustrates the prediction errors of various models in dissolved oxygen prediction. The IPSO-CNN-GRU-TAM model proposed in this study exhibits the smallest fluctuations in its error curve, consistently hovering near 0. This demonstrates that the model, through deep exploration of the dataset, effectively captures hidden information that might be overlooked in correlation statistical analyses due to their relatively low correlation coefficients. Examples include the potential indirect or nonlinear impacts of temperature on dissolved oxygen dynamics. Consequently, the model offers more precise predictions of dissolved oxygen content in aquaculture environments. It provides a more reliable basis for the subsequent application in aquaculture.
4.3. Study on Model Generalization Ability
4.3.1. Cross-Validation Experiments
To validate the model’s applicability to different datasets, this study employed a fivefold cross-validation method. In this cross-validation, the dataset was randomly divided into five equal subsets. Each time, one subset served as the test set, and the remaining four subsets served as the training set, with five independent training and testing rounds. The final evaluation of the model’s performance relied on the average of the five test results. This method not only provided a comprehensive and objective assessment of the model’s performance but also helped detect whether the model overfitted or underfitted specific types of data, thereby validating the model’s stability and generalization ability.
Table 4 shows the cross-validation results, from which the model’s performance across different subsets is observable. The model achieved an average MAE of 0.0200, an average RMSE of 0.0287, an average MSE of 0.0008, and an average R² of 0.9607 in the fivefold cross-validation. These metrics indicate that the model performed consistently across different data subsets, demonstrating good generalization ability.
4.3.2. Validation on Datasets of Varying Depths
To further validate the model’s generalization ability, we predicted dissolved oxygen content at different depths. In aquaculture, lake enclosure farming is another common method, with water depths typically ranging from 1 to 6 m. Therefore, we selected 4-m and 6-m depths for dissolved oxygen content prediction to verify the model’s accuracy. The prediction method remained consistent with the previous sections.
Figure 13 and Figure 14 show the prediction results. They demonstrate that the IPSO-CNN-GRU-TAM model’s prediction curves closely match the actual measurement curves, indicating good prediction accuracy and tracking capability.
According to Figure 15, Table 5, and Table 6, the IPSO-CNN-GRU-TAM model performs excellently on the datasets at 4-m and 6-m depths. Specifically, on the 4-m depth dataset, compared to the CNN-GRU and CNN-GRU-TAM models, the RMSE of the IPSO-CNN-GRU-TAM model decreased by 44.04% and 21.02%, respectively. On the 6-m depth dataset, the RMSE decreased by 55.05% and 35.51%, respectively. Additionally, the R² value of the IPSO-CNN-GRU-TAM model improved.
In summary, the IPSO-CNN-GRU-TAM model not only excels on datasets at different depths but also outperforms other models in terms of error and fit. This fully demonstrates the model’s strong generalization ability, effectively addressing challenges from various farming modes and maintaining stable prediction performance under different water depths. The model is suitable for dissolved oxygen content prediction in multiple aquaculture scenarios.
For more complex datasets, the model constructs a sophisticated framework to deeply mine hidden information within the data, demonstrating robust capabilities when handling intricate datasets. Furthermore, the model adopts the IPSO algorithm, leading to significant improvements in model parameter optimization and strong adaptability. The experimental results show that the model exhibits exceptional generalization across different sections of the same dataset and across datasets of varying depths. Therefore, it is reasonable to anticipate that the model will maintain good predictive performance over longer training periods and with even more complex datasets.
4.4. Future Work
The IPSO algorithm employed in this study only optimized three hyperparameters of the neural network. In future research, to enhance model performance, we intend to investigate a wider range of hyperparameters, such as the dropout rate and the choice of activation functions, and explore more efficient optimization strategies.
In future applications, we aim to conduct an in-depth exploration of the practical application effect of the model in aquaculture. We intend to deploy water quality sensors for real-time data collection and transmission to a Raspberry Pi 5 system. Utilizing the IPSO-CNN-GRU-TAM model, we will predict the DO content on the Raspberry Pi 5. If the predictions indicate that the content will fall below 5 mg/L, the system will automatically activate the oxygenator; when saturation (DOsat) is reached, it will shut the oxygenator off. This strategy enables precise regulation of DO, optimizing the aquaculture environment, enhancing cultivation efficiency, and effectively addressing the issue of energy wastage associated with the current periodic oxygenation practices in the aquaculture industry, thereby achieving energy conservation and carbon emission reduction.
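A minimal sketch of this control strategy, assuming hypothetical read_do_sensor and set_aerator I/O hooks and an assumed saturation threshold of 8.0 mg/L:

```python
# Hedged sketch of the threshold-based aerator control described above;
# read_do_sensor / set_aerator are hypothetical Raspberry Pi I/O hooks.
import time

DO_MIN, DO_SAT = 5.0, 8.0          # mg/L; DO_SAT is an assumed saturation value

def control_loop(predict_do, poll_seconds=60):
    aerator_on = False
    while True:
        do_forecast = predict_do(read_do_sensor())   # model inference on new reading
        if do_forecast < DO_MIN and not aerator_on:
            set_aerator(True)                        # start oxygenation early
            aerator_on = True
        elif do_forecast >= DO_SAT and aerator_on:
            set_aerator(False)                       # stop once saturation is reached
            aerator_on = False
        time.sleep(poll_seconds)
```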
Furthermore, we plan to apply this established embedded system and data acquisition platform to local aquaculture ponds for extensive data collection and DO prediction. Our objective is to further validate the model’s practicality and generalization capabilities, ensuring its excellent performance across various aquaculture environments and conditions. Through practical verification, we aim to comprehensively demonstrate the application value and extensive development potential of this model in aquaculture, contributing to the advancement of intelligent and efficient aquaculture industry development.