Article

Research on Maneuvering Motion Prediction for Intelligent Ships Based on LSTM-Multi-Head Attention Model

College of Ships and Oceanography, Naval University of Engineering, Wuhan 430033, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2025, 13(3), 503; https://doi.org/10.3390/jmse13030503
Submission received: 10 February 2025 / Revised: 26 February 2025 / Accepted: 2 March 2025 / Published: 5 March 2025

Abstract

In complex marine environments, accurate prediction of maneuvering motion is crucial for the precise control of intelligent ships. This study aims to enhance the predictive capabilities of maneuvering motion for intelligent ships in such environments. We propose a novel maneuvering motion prediction method based on Long Short-Term Memory (LSTM) and Multi-Head Attention Mechanisms (MHAM). To construct a foundational dataset, we integrate Computational Fluid Dynamics (CFD) numerical simulation technology to develop a mathematical model of actual ship maneuvering motions influenced by wind, waves, and currents. We simulate typical operating conditions to acquire relevant data. To emulate real marine environmental noise and data loss phenomena, we introduce Ornstein–Uhlenbeck (OU) noise and random occlusion noise into the data and apply the MaxAbsScaler method for dataset normalization. Subsequently, we develop a black-box model for intelligent ship maneuvering motion prediction based on LSTM networks and Multi-Head Attention Mechanisms. We conduct a comprehensive analysis and discussion of the model structure and hyperparameters, iteratively optimize the model, and compare the optimized model with standalone LSTM and MHAM approaches. Finally, we perform generalization testing on the optimized motion prediction model using test sets for zigzag and turning conditions. The results demonstrate that our proposed model significantly improves the accuracy of ship maneuvering predictions compared to standalone LSTM and MHAM algorithms and exhibits superior generalization performance.

1. Introduction

The rapid evolution of computer technology and artificial intelligence has led to the extensive deployment of intelligent ships across various sectors, including industry, defense, and scientific research. The pivotal capabilities of intelligent ships encompass perception, positioning, decision-making, planning, and motion control. Notably, motion control stands out as a critical technology for intelligent navigation, exerting a direct influence on the intelligence level, safety, stability, and economic performance of intelligent ships. However, when navigating in the real maritime environment, ships face considerable challenges from environmental factors such as wind, waves, and currents, which significantly impact motion control [1]. Therefore, the accurate real-time prediction of ship motion responses is crucial for enhancing the efficacy of control algorithms. Currently, methods for forecasting ship motion can be broadly categorized into white-box, black-box, and gray-box models.
White-box models, also referred to as mechanistic models, are mathematical constructs predicated on classical kinematics and dynamics that characterize ship motion, including the response model, Abkowitz model, and MMG model [2]. In 1957, Nomoto et al. introduced a response model by establishing a functional relationship between the yaw rate and the rudder angle [3], expressed through first-order and second-order response models. In 1967, Abkowitz [4] formulated a comprehensive nonlinear equation that regarded the ship–propeller–rudder system as an integrated whole and expanded the hydrodynamic forces via a Taylor series expansion. Following extensive research, the Maneuvering Modeling Group (MMG) model was proposed, which considers the hydrodynamic forces on the hull, propeller, and rudder separately and represents the mutual influence among them with interaction coefficients [5]. MMG models possess a well-defined structure and clear principles. The unknown hydrodynamic parameters can be determined via empirical formulas, captive model tests, and system identification methods. For instance, Araki et al. [6] employed data from Captive Model Tests (CMT) and Computational Fluid Dynamics (CFD) simulations to estimate the unknown coefficients in the model using the least squares (LS) method. Åström [7] employed the Kalman Filter (KF) to identify unknown parameters in the Abkowitz mathematical model. Notably, both LS and KF methods share a common limitation: the algorithm parameters must be adjusted for different ships or environments. With the rapid advancement of artificial intelligence, machine learning techniques have been increasingly utilized in the field of ship model parameter identification. Luo et al. and Zhang et al. [8,9,10,11] employed Support Vector Machines (SVM) and their variants to identify parameters of ship maneuvering models. Wang et al. [12] proposed a ν-Support Vector Regression method with a Gaussian kernel (ν-SVR) to establish a robust ship motion model. Specifically, they designed a parameter tuning scheme that combines cross-validation and dynamic process simulation to mitigate the impact of parameter drift and overfitting on the reliability of model identification. Zhang [13] introduced a Chebyshev orthogonal basis function neural network and trained and validated the model using ship simulation data. Analysis of these research methods highlights that mechanistic modeling requires a reasonable mathematical model to be assumed in advance, and the identified parameters are only applicable within a limited range of conditions. Moreover, mechanistic modeling relies on a quasi-steady-state assumption, neglecting the influence of the ship’s historical motion state. Therefore, mechanistic modeling methods have certain limitations in practical applications.
To reduce the complexity of the forecasting process and the dependence on prior knowledge, and to expand the applicable range of forecasting models, black-box model-based forecasting methods have become a popular research direction. These methods aim to improve forecasting efficiency and adaptability by leveraging data-driven approaches without requiring a detailed understanding of the underlying mechanisms. Black-box models, which are based on data-driven machine learning algorithms, do not rely on traditional physical laws or deterministic mathematical equations. Instead, they directly learn and extract patterns, relationships, and features from large amounts of data to establish the optimal mapping between inputs and outputs. These models are termed “black-box” because their internal mechanisms and decision-making processes are often opaque and complex, making it difficult to interpret how they arrive at specific outputs [14,15]. Due to the coupled interaction between fluid and hull, the historical motion of a ship significantly influences its future motion trend. Time series models based on Recurrent Neural Networks (RNN) therefore excel in processing sequential data and effectively capture temporal dependencies, owing to their unique structure and memory mechanism. Moreira [16] and Hao [17] used simulation data and model test data, respectively, to verify ship maneuvering motion forecasting methods based on RNN models. However, standard RNNs suffer from vanishing or exploding gradients when processing long sequences, leading to poor prediction results. To address this issue, variant models such as Long Short-Term Memory networks (LSTM), Gated Recurrent Units (GRU), and the Transformer have been proposed [18,19,20]. Woo et al. [21] proposed a differential maneuvering motion model for Unmanned Surface Vehicles (USVs) based on the LSTM architecture. Furthermore, Guo et al. [22] utilized the LSTM model to predict the heave and surge motions of a semi-submersible and examined the impact of input sequence length. Jiang [23] employed an LSTM neural network to learn and train on the standard maneuvering experimental data of the KVLCC2 model. The training and validation results indicated that the LSTM deep neural network can accurately identify the mapping relationship between ship motion states and control inputs. However, while LSTM is capable of extracting temporal features from data, it may not fully capture and leverage the complex interrelationships among the input variables and their respective impacts on the output in ship motion prediction tasks. Moreover, due to the “forget gate” and “parameter sharing” mechanisms inherent in the LSTM model, the information within the input sequence tends to decay gradually during transmission [24]. Wang [25] proposed a ship trajectory prediction model (STPGL) that integrates a spatiotemporal-perception graph attention network (GAT) with the LSTM network. STPGL was employed to predict ship motion trajectories in complex scenarios, demonstrating its effectiveness in enhancing the prediction accuracy of short-term, medium-term, and long-term ship trajectories. Jia [26] introduced an attention mechanism into the Bidirectional Long Short-Term Memory (BiLSTM) model to increase the weight of crucial information, proposing the Attention-BiLSTM model to predict ship motion trajectories and to address the model’s complex hyperparameter design problem.
Furthermore, the Whale Optimization Algorithm (WOA) was employed to optimize the network’s hyperparameters. Wang [27] combined a self-attention-weighted bidirectional LSTM network with a one-dimensional convolutional network to develop an ultra-short-term deep learning predictor named “SeaBil”. That study designed a one-dimensional convolutional algorithm to extract coupled feature mappings from multidimensional inputs, utilized Bi-LSTM to learn the forward and reverse feature mappings of ship maneuvering time series data, and employed a sequence-level cascaded self-attention mechanism to dynamically weight the hidden states. Comparisons with typical methods, such as the DMD, SVR, GRU, and LSTM models, demonstrated the predictor’s superiority. Dong [28] proposed a ship maneuvering prediction model using Multi-Head Attention Mechanisms, in which different attention layers explore various time-response characteristics of the input motion states and control signals, and compared the prediction effects of iterative multi-step prediction and direct multi-step prediction.
Current research has made significant progress in ship motion prediction, with improvements in the forecasting accuracy of black-box models. However, challenges such as long training times, complex parameter adjustments, and poor model interpretability pose significant obstacles for practical applications. Motivated by these challenges, this paper proposes a non-parametric ship dynamics modeling method based on the LSTM network and Multi-Head Attention Mechanisms. The LSTM-Multi-Head Attention model presented in this paper employs the LSTM network to capture the temporal local features of ship motion, while the MHAM is responsible for extracting multi-dimensional dynamic features, thereby enhancing the modeling capability for temporal nonlinear patterns. Through the decoupling of spatiotemporal features and dynamic weight allocation, the model achieves a good balance between prediction accuracy and computational efficiency.
The proposed architecture fully captures the historical information of ship motion time series data and the impact of the variables related to changes in ship motion, thereby improving the accuracy of ship motion forecasting. The remainder of this paper is organized as follows. Section 2 introduces the construction and processing methods of the ship motion dataset. Section 3 describes the non-parametric ship dynamics modeling method based on the LSTM-Multi-Head Attention model. Section 4 analyzes the model structure parameters and compares the proposed model against other algorithms, such as LSTM, Multi-Head Attention, and GRU, to further validate its forecasting performance. Finally, the optimized model is applied to predict the ship’s turning and zigzag motions.

2. Construction and Data Processing of Ship Motion Dataset

2.1. Maneuvering Mathematical Model of USV

The maneuvering motion of the ship is described using two right-handed Cartesian coordinate systems, as illustrated in Figure 1. The coordinate system $o_1x_1y_1z_1$ is fixed on the Earth’s surface and represents a geodetic coordinate system. The $o_1x_1y_1$ plane is aligned with the still water surface, with the $o_1z_1$ axis perpendicular to the water surface and pointing downward. This fixed coordinate system is used to represent the ship’s navigation trajectory and direction. The moving coordinate system $Gxyz$ is attached to the ship’s hull with the ship’s center of gravity as the origin. The $Gx$ axis is parallel to the base plane and points toward the bow (positive direction). The $Gy$ axis is perpendicular to the longitudinal section and points to the starboard side (positive direction). The $Gz$ axis is aligned with the $o_1z_1$ axis. This moving coordinate system describes the forces and moments acting on the ship. The surge velocity, sway velocity, and yaw rate are represented by $u$, $v$, and $r$, respectively.
Under the influence of the marine environment, ships exhibit six degrees of freedom (6-DOF) in maneuvering motions. To simplify the mathematical model, this paper considers the ship’s 3-DOF motion, specifically surge, sway, and yaw. According to the expression of the centroid motion theorem in the coordinate system of motion, when only the horizontal motion is considered, the ship motion equation can be described as Equation (1). According to the 3-DOF MMG model [2], ship motion force under environmental disturbance is decomposed into hull force, propeller force, rudder force, and environmental force, and the ship’s motion equations are shown in Equation (2).
$$ m(\dot{u} - vr) = X, \qquad m(\dot{v} + ur) = Y, \qquad I_z \dot{r} = N \tag{1} $$
$$ X = X_H + X_P + X_R + X_E, \qquad Y = Y_H + Y_P + Y_R + Y_E, \qquad N = N_H + N_P + N_R + N_E \tag{2} $$
where $m$ represents the mass of the ship; $\dot{u}$, $\dot{v}$, and $\dot{r}$ are the surge acceleration, sway acceleration, and yaw angular acceleration around the $o_1z_1$ axis, respectively. $I_z$ represents the moment of inertia of the ship’s mass around the $o_1z_1$ axis, and $X$, $Y$, and $N$ denote the longitudinal force, transverse force, and yawing moment acting on the ship, respectively. $X_H$, $Y_H$, and $N_H$ represent the longitudinal force, transverse force, and yawing moment acting on the hull; $X_P$, $Y_P$, and $N_P$ those acting on the propeller; $X_R$, $Y_R$, and $N_R$ those acting on the rudder; and $X_E$, $Y_E$, and $N_E$ the forces and moments caused by external environmental factors. The viscous hydrodynamic forces acting on the hull depend on the ship’s shape and motion. Under minor perturbation conditions, the viscous hydrodynamic forces acting on the hull can be linearly represented by a first-order model as follows [2]:
$$ X_H = X_0 + X_u \Delta u, \qquad Y_H = Y_v v + Y_r r + Y_\delta \delta, \qquad N_H = N_v v + N_r r + N_\delta \delta \tag{3} $$
When studying the maneuvering motion of a ship at large rudder angles, the non-linear terms of the motion parameters should be included in the mathematical formulation, represented by a third-order model as follows [2]:
$$ \begin{aligned} X_H &= X_0 + X_{vv} v^2 + X_{vr} vr + X_{rr} r^2 \\ Y_H &= Y_v v + Y_r r + Y_{vvv} v^3 + Y_{vvr} v^2 r + Y_{vrr} v r^2 + Y_{rrr} r^3 \\ N_H &= N_v v + N_r r + N_{vvv} v^3 + N_{vvr} v^2 r + N_{vrr} v r^2 + N_{rrr} r^3 \end{aligned} \tag{4} $$
where $X_0$ represents the resistance experienced by the ship during straight-line navigation; $Y_v$, $Y_r$, $N_v$, and $N_r$ are the linear hydrodynamic derivatives; $Y_{vvv}$, $Y_{rrr}$, $N_{vvv}$, and $N_{rrr}$ are the higher-order hydrodynamic derivatives; and $Y_{vvr}$, $Y_{vrr}$, $N_{vvr}$, and $N_{vrr}$ are the higher-order coupled hydrodynamic derivatives. To solve the above hydrodynamic derivatives accurately, CFD technology is used to conduct simulation tests of three planar motion mechanisms: oblique navigation, pure sway, and pure yaw. The least squares method is used to fit the simulation results and calculate the corresponding hydrodynamic derivatives [29].
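To make the simulation step concrete, the sketch below advances Equations (1) and (2) with a simple forward-Euler integrator in Python. It is a minimal sketch only: the mass, inertia, time step, and force values are illustrative placeholders, not the identified coefficients of the catamaran studied here.

```python
import numpy as np

def euler_step(state, forces, m, Iz, dt):
    """One forward-Euler step of Equation (1); state = [u, v, r],
    forces = (X, Y, N) already summed over hull, propeller, rudder,
    and environment per Equation (2)."""
    u, v, r = state
    X, Y, N = forces
    du = X / m + v * r    # from m(du/dt - v r) = X
    dv = Y / m - u * r    # from m(dv/dt + u r) = Y
    dr = N / Iz           # from Iz dr/dt = N
    return np.array([u + du * dt, v + dv * dt, r + dr * dt])

# Illustrative values only (not the identified ship coefficients).
state = np.array([1.0, 0.0, 0.0])   # [u (m/s), v (m/s), r (rad/s)]
state = euler_step(state, (5.0, 0.5, 0.1), m=200.0, Iz=150.0, dt=0.05)
```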

2.2. Sample Data Collection

Simulations of the ship performing turning and zigzag maneuvers under calm water and wave conditions were conducted using Equations (1)–(4). Table 1 and Table 2 summarize the ship’s dimensions and the simulation environment. To enhance the generalization of the black-box model, turning tests were performed at rudder angles of ±5°, ±8°, ±10°, and ±15°, while zigzag maneuvers were executed at rudder angles of 5°/5°, 8°/8°, 10°/10°, and 15°/15°. These tests were conducted at speeds of 3 kn, 4 kn, 5 kn, 6 kn, 7 kn, and 8 kn. The data collection frequency for the simulation experiments was 100 Hz. Additionally, to improve the model training speed, we designed datasets with a sampling frequency of 20 Hz. The corresponding data are depicted in Figure 2 and Figure 3. Figure 2 illustrates the longitudinal velocity u, transverse velocity v, and yaw rate r of the ship during turning maneuvers in both calm water and wave environments, while Figure 3 shows the same characteristics during zigzag maneuvers.
To improve model accuracy while preventing overfitting, the dataset is divided into training, validation, and testing sets in the ratios of 60%, 20%, and 20%, respectively, as shown in Figure 4. Table 3 lists the specific conditions for these divisions.

2.3. Data Preprocessing

In actual environments, ship sensor equipment is often subject to external interference, which can lead to data loss, transmission delays, or inaccurate data collection. To address these challenges and ensure the model’s robustness in practical applications, this paper employs data augmentation techniques on the simulation data. Specifically, we introduce Ornstein–Uhlenbeck (OU) noise and Random Occlusion Noise. OU noise is a mean-reverting stochastic process, where the current noise value is influenced by random disturbances and the previous state, effectively simulating noise with temporal correlation. The mathematical formulation of OU noise is provided in Equation (5), and the noise update formula is given in Equation (6) [30]. Random Occlusion Noise simulates partial damage or occlusion by randomly setting some time series data points to zero, mimicking real-world scenarios where data may be partially lost or obscured [31]. The final processed data are described in Equation (7).
$$ dX_t = \theta(\mu - X_t)\,dt + \sigma \varepsilon \sqrt{\Delta t} \tag{5} $$
$$ X_{t+1}^{(OU)} = X_t + \theta(\mu - X_t)\,\Delta t + \sigma \varepsilon \sqrt{\Delta t} \tag{6} $$
$$ data^{(noisy)} = data^{(ideal)} + X^{(OU)} \tag{7} $$
Here, $\theta$ is the rate parameter controlling the regression of the noise toward the mean, with larger values resulting in less interference; $\mu$ is the mean of the noise; $\sigma$ represents the noise volatility; $\varepsilon\sqrt{\Delta t}$ is a random increment generated by standard Brownian motion; and $\varepsilon$ follows a normal distribution with mean 0 and variance 1.
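As an illustration of this augmentation pipeline, the Python sketch below generates discrete OU noise per Equation (6) and applies random occlusion before corrupting an ideal channel per Equation (7). All parameter values ($\theta$, $\mu$, $\sigma$, $\Delta t$, and the drop probability) are assumed for demonstration and are not the settings used in this study.

```python
import numpy as np

rng = np.random.default_rng(0)

def ou_noise(n, theta=0.15, mu=0.0, sigma=0.02, dt=0.05):
    """Discrete OU process per Equation (6); parameter values are illustrative."""
    x = np.zeros(n)
    for t in range(n - 1):
        x[t + 1] = x[t] + theta * (mu - x[t]) * dt \
                   + sigma * np.sqrt(dt) * rng.standard_normal()
    return x

def random_occlusion(series, drop_prob=0.05):
    """Zero out random samples to mimic partial data loss (drop_prob assumed)."""
    mask = rng.random(len(series)) > drop_prob
    return series * mask

# Corrupt an ideal simulated channel as in Equation (7).
ideal = np.sin(np.linspace(0.0, 10.0, 200))
noisy = random_occlusion(ideal + ou_noise(200))
```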
To improve training speed and enhance model robustness, the data are normalized to the interval [−1, 1] before model training. The normalization process is presented in Equation (8), where $x_{max}$ and $x_{min}$ represent the maximum and minimum values in the dataset, respectively. After obtaining the prediction results, the predicted values are denormalized to their actual values; the denormalization formula is shown in Equation (9) [32].
$$ x' = \frac{2\,(x - x_{min})}{x_{max} - x_{min}} - 1 \tag{8} $$
$$ y = \frac{(y' + 1)(x_{max} - x_{min})}{2} + x_{min} \tag{9} $$
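A minimal sketch of Equations (8) and (9) follows, with a round-trip check confirming that denormalization recovers the original values; the sample data are placeholders.

```python
import numpy as np

def normalize(x, x_min, x_max):
    """Map x into [-1, 1] per Equation (8)."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def denormalize(y, x_min, x_max):
    """Invert the mapping per Equation (9)."""
    return (y + 1.0) * (x_max - x_min) / 2.0 + x_min

data = np.array([3.0, 4.5, 8.0])   # placeholder samples
x_min, x_max = data.min(), data.max()
assert np.allclose(denormalize(normalize(data, x_min, x_max), x_min, x_max), data)
```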

3. Black-Box Prediction Model of USV Motion Based on LSTM and Multi-Head Attention Mechanism

3.1. LSTM Algorithm

Recurrent Neural Networks (RNN) are neural network models designed for processing sequential data. They feature an internal loop structure that enables them to capture dynamic information in time series. However, traditional RNNs often encounter issues with vanishing or exploding gradients when dealing with long-term dependencies. To address these challenges, Long Short-Term Memory Networks (LSTM) introduce memory cells and three gating mechanisms: the input gate, forget gate, and output gate. These mechanisms control the network’s memory capabilities, effectively mitigating gradient vanishing while capturing and retaining long-term dependencies. Ship motion is characterized by nonlinearity, time-varying behavior, coupling, and response lag. Therefore, ship dynamics modeling involves analyzing the ship’s motion response under the influence of complex factors, such as control commands, marine environment, and historical motion states. Given that ship motion states exhibit time series dependencies, the LSTM model is well-suited to extract their time-varying characteristics for ship dynamics modeling in waves. Figure 5 illustrates the LSTM model [18].
The inputs to the model unit are the ship’s motion state sequence $x_t^l = \{u, v, r, x, y, \phi\}$ (where $l$ represents the length of the sliding window), the model’s memory cell state $C_{t-1}$, the hidden state $h_{t-1}$, and the control quantity $U_t$. Accordingly, the outputs are the memory cell $C_t$, the hidden state $h_t$, and the ship’s motion state at time $t$, $x_t = \{u, v, r, x, y, \phi\}$. According to reference [18], the model comprises three gating units: the forget gate, the input gate, and the output gate. The forget gate controls the extent to which the network retains historical information through a neural network layer using a sigmoid activation function. Its output $f_t$ takes a value between 0 and 1, indicating the retention degree of each unit in the cell state. Values close to 1 signify full retention, while values close to 0 signify full forgetting. The calculation formula for the forget gate $f_t$ is as follows:
$$ f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{10} $$
where $\sigma$ is the sigmoid activation function, $W_f$ is the weight matrix between the layer and the forget gate, and $b_f$ is the bias coefficient. The input gate determines the new information to be remembered, comprising two parts: a sigmoid layer determining the information to be updated and a tanh layer that creates a new candidate value vector $\tilde{C}_t$ representing the values to be updated:
$$ i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad \tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{11} $$
where $W_i$ and $W_c$ are the weight matrices, and $b_i$ and $b_c$ are the bias coefficients. The final memory cell state $C_t$ is determined by the forget and input gates, as follows:
$$ C_t = f_t \, C_{t-1} + i_t \, \tilde{C}_t \tag{12} $$
The output gate determines which part of the cell state is output to the next time step by regulating the influence of the current state on the output (Equation (13)). The resulting values are then propagated to the next time step.
$$ O_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad h_t = O_t \cdot \tanh(C_t) \tag{13} $$
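For concreteness, the numpy sketch below performs one LSTM cell update following Equations (10)–(13). The dimensions and randomly initialized weights are assumed placeholders, not trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One cell update per Equations (10)-(13); W and b hold the four
    gate parameter sets keyed 'f', 'i', 'c', 'o'."""
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(W['f'] @ z + b['f'])         # forget gate, Eq. (10)
    i_t = sigmoid(W['i'] @ z + b['i'])         # input gate, Eq. (11)
    c_tilde = np.tanh(W['c'] @ z + b['c'])     # candidate state, Eq. (11)
    c_t = f_t * c_prev + i_t * c_tilde         # cell state, Eq. (12)
    o_t = sigmoid(W['o'] @ z + b['o'])         # output gate, Eq. (13)
    h_t = o_t * np.tanh(c_t)                   # hidden state, Eq. (13)
    return h_t, c_t

# Assumed sizes: 8 input features, 16 hidden units; random (untrained) weights.
n_in, n_h = 8, 16
rng = np.random.default_rng(0)
W = {k: 0.1 * rng.standard_normal((n_h, n_h + n_in)) for k in 'fico'}
b = {k: np.zeros(n_h) for k in 'fico'}
h_t, c_t = lstm_step(rng.standard_normal(n_in), np.zeros(n_h), np.zeros(n_h), W, b)
```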
In practical applications of ship dynamics non-parametric modeling, LSTM demonstrates strong nonlinear fitting and long-term memory capabilities, which can predict the complex nonlinear states of ship motion well. However, LSTM still suffers from the following limitations. First, the model input data typically includes state and control information, and LSTM cannot sufficiently distinguish them. Second, different dimensions of the dataset have various degrees of impact on modeling accuracy. LSTM cannot automatically differentiate its attention, leading to certain deficiencies in forecasting ship dynamics modeling with nonlinear and strongly coupled motion characteristics. Thus, in 2017, Vaswani et al. [20] proposed the Multi-Head Attention Mechanism (MHAM), which allows the model to learn information in different subspaces in parallel, simultaneously focusing on various dimensions of the input sequence.

3.2. Multi-Head Attention Mechanism

When processing a data sequence, the Multi-Head Attention Mechanism can allocate weights to improve the model’s focus on crucial information, enabling each position in the input sequence to attend to other positions within the same sequence and capture the dependencies between them. Figure 6 illustrates the structure of the Multi-Head Attention Mechanism [33].
The MHAM, based on self-attention, adds multiple attention heads to parallelize attention allocation over different dimensions of the input information. It comprises three key steps: linear transformation, attention calculation, and head merging and transformation. (1) Linear Transformation: the input sequence is passed through three different linear transformations (Linear) to obtain the query matrix $Q$, key matrix $K$, and value matrix $V$. In the Multi-Head Attention Mechanism, this step is performed $h$ times (where $h$ is the number of heads), with each head having independent weight matrices, thus dividing the input vector into $h$ different subspaces. (2) Attention Calculation: the self-attention mechanism calculates the attention weights for each subspace. The calculation formula is presented in Equation (14), where $i$ denotes the index of the attention head, $d_k$ is the dimension of the key matrix, and softmax is the normalization function presented in Equation (15). (3) Head Merging and Linear Transformation: the outputs of all heads are concatenated to form a new matrix, which is then linearly transformed to obtain the final output, as presented in Equation (16), where $W^O$ is the output linear transformation matrix and “Concat” represents the concatenation operation.
$$ \mathrm{Head}_i = \mathrm{Attention}_i(Q_i, K_i, V_i) = \mathrm{softmax}\!\left(\frac{Q_i K_i^{T}}{\sqrt{d_{k_i}}}\right) V_i \tag{14} $$
$$ \mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} \tag{15} $$
$$ \mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h)\, W^O \tag{16} $$
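These three steps translate directly into code. In the numpy sketch below, the projection matrices are assumed, randomly initialized placeholders, and each head attends over its own column slice of the projected sequence:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))   # stabilized Eq. (15)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, h):
    """Equations (14)-(16) for one sequence X of shape (seq_len, d_model);
    d_model must be divisible by the head count h."""
    d_k = X.shape[1] // h
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                  # step (1): Linear
    heads = []
    for i in range(h):                                # step (2): per-subspace attention
        s = slice(i * d_k, (i + 1) * d_k)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_k)   # scaled dot product
        heads.append(softmax(scores) @ V[:, s])       # Eq. (14)
    return np.concatenate(heads, axis=1) @ Wo         # step (3): Concat + linear, Eq. (16)

# Assumed sizes: sequence length 30, model width 64, 8 heads; random weights.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 64))
Wq, Wk, Wv, Wo = (0.1 * rng.standard_normal((64, 64)) for _ in range(4))
out = multi_head_attention(X, Wq, Wk, Wv, Wo, h=8)    # shape (30, 64)
```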

3.3. LSTM-Multi-Head Attention Model Framework

LSTM and Multi-Head Attention Mechanisms each have advantages when dealing with long time series data. Specifically, LSTM excels at capturing long-term dependencies and is suitable for processing time series data with long-term correlations; it controls the information flow through gating mechanisms (forget gate, input gate, output gate), thereby reducing the interference of irrelevant information. The MHAM excels at the parallel processing of information in multiple subspaces, capturing various dependencies in the data, and dynamically focusing on crucial parts of the sequence through attention weights, which enhances the model’s sensitivity to important information. Combining the strengths of both, this paper integrates them to design three black-box models: LSTM-Multi-Head Attention-1, LSTM-Multi-Head Attention-2, and LSTM-Multi-Head Attention-3, as illustrated in Figure 7, Figure 8 and Figure 9. We expect to identify the optimal combination scheme by comparing these configurations. All three models combine LSTM and MHAM. Specifically, LSTM-Multi-Head Attention-1 and LSTM-Multi-Head Attention-2 first use LSTM layers to extract the temporal relationships in the data, then apply attention mechanisms to the output of the LSTM layers to focus on the impact of different data dimensions on the forecasting results, and finally use dense fully connected layers to output the ship motion forecasting results. LSTM-Multi-Head Attention-1 uses a single output layer to forecast the motion data of all six dimensions simultaneously, while LSTM-Multi-Head Attention-2 designs separate network layers for each dimension of the ship motion information. LSTM-Multi-Head Attention-3 replaces the fully connected layer in models 1 and 2 with an additional LSTM network layer, with the expectation that this network will fully extract complex motion characteristics, such as time lags in the ship’s motion process. A minimal sketch of the second layout appears below.
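The sketch below is one plausible Keras realization of the LSTM-Multi-Head Attention-2 layout: an LSTM front end, self-attention over its output sequence, and a separate dense head per motion dimension, with sizes taken from Table 5. The exact layer wiring of the authors’ model is not published here, so this is an assumed reconstruction rather than the reference implementation.

```python
import tensorflow as tf

window, n_features = 30, 8          # [u, v, r, x, y, phi] plus [delta, n_p]
inputs = tf.keras.Input(shape=(window, n_features))
seq = tf.keras.layers.LSTM(128, return_sequences=True)(inputs)                # temporal features
att = tf.keras.layers.MultiHeadAttention(num_heads=8, key_dim=16)(seq, seq)   # self-attention
att = tf.keras.layers.Dropout(0.4)(att)                                       # dropout per Table 5
flat = tf.keras.layers.GlobalAveragePooling1D()(att)
outputs = [tf.keras.layers.Dense(1, name=name)(flat)                          # one head per dimension
           for name in ('u', 'v', 'r', 'x', 'y', 'phi')]
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss='mse')
```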
The model contains a large number of hyperparameters. Currently, no optimization algorithm can directly determine the optimal model structure; commonly, the model’s hyperparameters and the number of network layers are set based on experience and experiments. We analyze the parameter optimization in detail in Section 4.

3.4. Model Training Process

Equations (17) and (18) present the inputs $x$ and outputs $y$ of the model, where $u$, $v$, and $r$ are the ship’s longitudinal velocity, transverse velocity, and yaw rate, respectively; $\delta$ denotes the control input rudder angle, $n_p$ is the control input propeller speed, $t$ is the length of the sliding window, and $p$ is the model prediction time series length, i.e., the motion information of $t$ historical time steps is used to predict the ship’s motion state for the following $p$ time steps.
$$ x = [u_i, v_i, r_i, x_i, y_i, \phi_i] \quad (i = 1, 2, \ldots, t), \qquad U = [\delta_t, \; n_{p,t}] \tag{17} $$
$$ y = [u_j, v_j, r_j, x_j, y_j, \phi_j] \quad (j = t+1, t+2, \ldots, t+p) \tag{18} $$
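Training pairs are then assembled with a sliding window over the simulated records. The sketch below follows Equations (17) and (18); the array shapes and placeholder data are assumptions for illustration:

```python
import numpy as np

def make_windows(states, controls, t, p):
    """Build (x, y) pairs per Equations (17)-(18): t history steps of
    state + control predict the next p state steps. Assumed shapes:
    states (n, 6) for [u, v, r, x, y, phi], controls (n, 2) for [delta, n_p]."""
    X, Y = [], []
    for k in range(len(states) - t - p + 1):
        X.append(np.hstack([states[k:k + t], controls[k:k + t]]))
        Y.append(states[k + t:k + t + p])
    return np.array(X), np.array(Y)

rng = np.random.default_rng(0)
states = rng.standard_normal((500, 6))     # placeholder motion states
controls = rng.standard_normal((500, 2))   # placeholder rudder angle and speed
X, Y = make_windows(states, controls, t=30, p=1)   # window width 30 per Table 5
```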
The model employs the Mean Squared Error (MSE) as the loss function and is trained using the Adam optimizer, with the specific formulations as follows:
$$ MSE = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \tag{19} $$
The model’s performance in predicting ship trajectories is evaluated using the Root Mean Square Error (RMSE) [34], which is formulated as follows:
$$ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2} \tag{20} $$
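Both criteria follow directly from Equations (19) and (20):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error, Equation (19)."""
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Root Mean Square Error, Equation (20)."""
    return float(np.sqrt(mse(y_true, y_pred)))
```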

4. Model Structure Analysis and Comparative Verification

This section applies the test set collected in Section 2 to compare and analyze the three LSTM-Multi-Head Attention models proposed. Then, the network parameters are optimized through simulation, including the number of neurons, regularization degree, training batch size, learning rate, the number of attention heads, sliding window length, and dataset sampling frequency.

4.1. Model Structure Analysis

4.1.1. Impact of Network Structure on Forecasting Accuracy

In this experiment, all models share the same hyperparameters to systematically analyze their training and forecasting performance, as reported in Table 4. The corresponding forecasting effects are depicted in Figure 10. The distribution of RMSE and the loss curves during the training process are illustrated in Figure 11. According to the forecasting results, using only one network layer to process the motion information yields an overly simplistic model; as a result, the training loss fails to decrease, leading to a poor forecasting effect. The loss curves of models 2 and 3 converge to small values. Additionally, from the forecasting curves and RMSE distribution, it is evident that model 3 has the best forecasting effect in the u and v dimensions, while model 2 performs best in the other dimensions.

4.1.2. Impact of the Regularization Degree

In model 2, the regularization degree is varied to investigate its impact. The corresponding forecasting effects of the model with different regularization methods are depicted in Figure 12. The distribution of forecasting RMSE errors and the training loss curves are illustrated in Figure 13. The RMSE distribution indicates that the model achieves the best forecasting performance when the dropout rate is 0.4, with an average RMSE of 0.001173. Conversely, when the dropout rate increases to 0.5, the average RMSE rises to 0.007092, resulting in a decline in forecasting performance. For dropout rates of 0.2 and 0.1, the average RMSE values are 0.003132 and 0.007092, respectively, with no significant improvement in forecasting performance observed. Notably, the training loss curve suggests that the training loss decreases as the dropout rate decreases. Specifically, when the dropout rate is 0.1, the model tends to overfit, thereby diminishing forecasting performance. In summary, a dropout rate of 0.5 leads to excessive regularization, causing the model to underperform on both the training and test sets. In contrast, a dropout rate of 0.4 strikes the optimal balance, yielding the best model performance.

4.1.3. Impact of the Number of Attention Heads

Increasing the number of attention heads allows the model to learn different features and dependencies in parallel, thereby enhancing its expressiveness. However, an excessive number of heads can lead to overfitting during model training. In this paper, we simulate the model’s forecasting performance with three, five, and eight attention heads. The forecasting performance of models with different numbers of attention heads is depicted in Figure 14. The distribution of forecasting RMSE errors and the training loss curves are illustrated in Figure 15. For three attention heads, the average RMSE value is 0.002162, and the training loss converges to 0.019041. For five and eight attention heads, the average RMSE values are 0.001492 and 0.001173, and the training loss converges to 0.011864 and 0.007766, respectively. Thus, the model performance improves as the number of attention heads increases, with the lowest average RMSE achieved for eight attention heads. Although increasing the number of attention heads decreases the training loss, it can also reduce the model’s forecasting ability, indicating potential overfitting. Moreover, when considering the model’s performance in individual dimensions of the motion state, specifically the u and v dimensions, increasing the number of attention heads has a negative impact.

4.1.4. Impact of the Number of Neurons

Increasing the number of neurons can enhance the model’s complexity and learning ability, enabling it to capture complex data features more effectively. However, when the data are limited or contain significant noise, excessively increasing the number of neurons can lead to model overfitting. Therefore, this study simulates and analyzes the model’s forecasting performance with 256, 128, and 64 neurons. The forecasting performance of models with different numbers of neurons is depicted in Figure 16. The distribution of forecasting RMSE errors and the training loss curves are illustrated in Figure 17. For 256 neurons, the model’s average RMSE value is 0.004325, and its training loss converges to 0.012267. For 128 and 64 neurons, the model’s average RMSE values are 0.002705 and 0.003251, respectively, and the training loss converges to 0.014912 and 0.013823, respectively. Thus, the model achieves the best forecasting performance with 128 neurons, as evidenced by the smallest RMSE value. When the number of neurons increases to 256, although the training loss is minimized, the forecasting performance deteriorates, indicating overfitting. From the perspective of individual motion state dimensions, in the u and v dimensions, fewer neurons lead to better forecasting performance. Conversely, in the x and heading dimensions, more neurons are associated with better performance.

4.1.5. Impact of Training Batch Size

A larger batch size can accelerate the overall training process and yield more stable gradient updates but may discard detailed information. Conversely, small-batch training can capture more detailed information but demands higher memory and computational resources. Therefore, the training batch size affects the model’s performance. This paper investigates the effects of batch sizes of 2048, 1024, and 512 on model training and prediction. The forecasting performance of the model under different batch sizes is shown in Figure 18, and the training loss curves and predicted RMSE distributions are depicted in Figure 19. The model’s average RMSE values are 0.003251, 0.003825, and 0.002346 for batch sizes of 2048, 1024, and 512, respectively. Correspondingly, the training loss converges to 0.013823, 0.011551, and 0.011413 for these batch sizes. Overall, the training loss decreases as the batch size decreases. The model achieves the best forecasting performance with a batch size of 512.

4.1.6. Impact of Sliding Window Width

The size of the sliding window determines the length of historical data that the model can refer to during each prediction. A larger window size can capture trends over a more extended period but may introduce more noise, making it challenging for the model to learn compelling features. Conversely, a smaller window size may ignore critical long-term dependencies. The sliding window size thus affects the model’s generalization performance, computational efficiency, and feature extraction capabilities. The research object of this paper is a 3 m-long catamaran. According to the CFD simulation results, the ship’s heading acceleration vanished 36.5 s after the rudder was applied, and the course tended to be stable. The sampling interval of the dataset in this paper was 0.5 s, so the corresponding time window length was 73. Therefore, to select the appropriate window size, this paper trains models with window sizes of 10, 20, 30, and 73. The predictive capabilities of these models are presented in Figure 20 and Figure 21. The corresponding average RMSE values are 0.001637 for a window size of 10, 0.002346 for a window size of 20 (demonstrating poorer model performance), 0.001173 for a window size of 30, and 0.001156 for a window size of 73 (delivering the best model forecasting performance). From these results, it can be inferred that when the window length matches the characteristic period of the ship’s motion, the model can learn richer ship motion features. Future work will exploit this characteristic in more depth.

4.2. Comparative Modeling Accuracy of LSTM, Multi-Head Attention, LSTM-Multi-Head Attention, Transformer and GRU

The previous section presented a comparative optimization analysis of the model parameters, revealing that the LSTM-Multi-Head Attention-2 model exhibited the best forecasting performance; the optimal hyperparameters are detailed in Table 5. This section compares the forecasting accuracy of the LSTM, Multi-Head Attention, LSTM-Multi-Head Attention-2, Transformer, and GRU models. To ensure a fair evaluation, all five models were trained and tested using the same dataset, hyperparameters, and loss function, with a sufficient number of iterations to guarantee convergence of the loss function. The models’ forecasting abilities were assessed using the RMSE error function, with the forecasting results of the five models illustrated in Figure 22 and Figure 23. Figure 22 indicates that the GRU model had the worst forecasting performance on the ship motion dataset used. In the u dimension, the LSTM-Multi-Head Attention-2 and Multi-Head Attention models demonstrated similar performance, and both coincide closely with the MMG data. In the v and r dimensions, LSTM-Multi-Head Attention-2 achieves the best performance. Regarding motion trajectory prediction, the forecasting results of both LSTM-Multi-Head Attention-2 and Multi-Head Attention coincide closely with the MMG data. The model training error curves show that all five models converge on the training set; the training errors of the Transformer model are the smallest, followed by the LSTM-Multi-Head Attention-2 and Multi-Head Attention models, although the Transformer converges slowly. The RMSE distribution map shows that the comprehensive forecasting error of LSTM-Multi-Head Attention-2 is the smallest, further indicating that combining LSTM with Multi-Head Attention improves on either model alone and outperforms the Transformer model.

4.3. Model Simulation

Based on the optimized LSTM-Multi-Head Attention-2 model structure, a motion prediction model was trained using the sample data. The trained model was then applied to the test set and compared with the simulation results of the MMG model. The test set includes turning motion data at rudder angles of 8° and 15° and zigzag motion data of 5°/5°, representing two different maneuvering characteristics of the ship. None of these three scenarios is included in the model’s training and validation sets, which verifies the model’s generalization ability. Figure 24, Figure 25, Figure 26 and Figure 27 present the simulation verification results, revealing that the forecasting results of the LSTM-Multi-Head Attention-2 model are highly consistent with the simulation results of the MMG model, demonstrating the excellent learning ability and generalization performance of the LSTM-Multi-Head Attention-2 network. Despite the OU noise and random loss added to the dataset, the data preprocessing steps effectively repair these defects, and the trained model learns to resist this interference. The simulation results also highlight that the model’s heading angle and motion trajectory forecasts for the 5°/5° zigzag test set deviate more from the MMG model results. The possible reasons are: (1) since the heading angle data, after normalization, have relatively small values compared with the data of other dimensions, the Multi-Head Attention Mechanism does not focus on this dimension; (2) compared with turning motion, the heading angle fluctuates within a small range during the zigzag motion, making it challenging for the model to learn its patterns effectively. Hence, to solve this problem, a weight coefficient is assigned to the heading angle data in the zigzag motion dataset to amplify the patterns in the data:
$$ \phi^{\prime} = \alpha \, \phi \tag{21} $$
where α is the amplification coefficient. The forecasting performance is further improved after training the model with the processed dataset, as depicted in Figure 28.

5. Discussion

Applying experimental data to identify ship motion patterns often requires a significant investment. Since it is uncertain which types of experimental data are suitable for model training, using simulated data to train the model in advance can reduce experimental costs. However, simulated data are often too perfect, lacking noise and interference loss issues compared to real-world data, degrading the generalization capabilities of the deep learning models and making them unsuitable for real environments. Therefore, this paper first establishes the MMG maneuvering motion mathematical model of a ship and uses the MMG model to generate simulated motion datasets. Then, this study adds OU and random occlusion noise to the datasets to simulate the characteristics of random noise and random data loss in experimental data. Subsequently, the data are normalized using MaxAbsScaler before training to improve training speed and enhance model robustness. On this basis, a three-degree-of-freedom black-box prediction model for ship motion based on LSTM networks and Multi-Head Attention Mechanisms is established.
Subsequently, this work conducts an in-depth comparative study of this model in terms of model structure, hyperparameter adjustment, and forecasting accuracy. A total of 23 models were trained, and each was compared and analyzed in depth regarding training error and the RMSE of motion forecasting in each dimension, ultimately determining an optimized model strategy. The optimal model was further compared with the LSTM, Multi-Head Attention, and GRU models, and the LSTM-Multi-Head Attention model proposed in this paper was found to have the best forecasting performance. To further verify the performance of the LSTM-Multi-Head Attention model, it was validated on turning motion data at rudder angles of 8° and 15° and zigzag motion data of 5°/5°. During the verification process, a significant deviation was found in the heading angle forecast for the zigzag motion data. The model was further modified to address this deviation, and after amplifying the data in this dimension, the forecasting performance was further improved. Finally, experimental data were used to verify the performance of the black-box model.
Future studies should investigate combining the proposed model with reinforcement learning algorithms to develop model-based deep reinforcement learning control systems for stable control of ships in complex environments.

Author Contributions

Methodology, D.L., X.G. and C.H.; Software, D.L. and W.S.; Validation, D.L.; Formal analysis, D.L.; Investigation, C.H.; Data curation, X.G.; Writing—original draft, D.L.; Writing—review & editing, D.L., X.G. and C.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Montewka, J.; Hinz, T.; Kujala, P.; Matusiak, J. Probability modelling of vessel collisions. Reliab. Eng. Syst. Saf. 2010, 95, 573–589. [Google Scholar] [CrossRef]
  2. Fan, S.Y. Ship Maneuverability; National Defence Industry Press: Beijing, China, 1988. [Google Scholar]
  3. Nomoto, K.; Taguchi, T.; Honda, K.; Hirano, S. On the steering qualities of ships. Int. Shipbuild. Prog. 1957, 4, 354–370. [Google Scholar] [CrossRef]
  4. Abkowitz, M.A. Measurement of Hydrodynamic Characteristics from Ship Maneuvering Trials by System Identification. 1980. Available online: https://trid.trb.org/View/157366 (accessed on 2 March 2025).
  5. Xu, H.; Soares, C.G. Hydrodynamic coefficient estimation for ship maneuvering in shallow water using an optimal truncated LS-SVM. Ocean Eng. 2019, 191, 106488. [Google Scholar] [CrossRef]
  6. Araki, M.; Sadat-Hosseini, H.; Sanada, Y.; Tanimoto, K.; Umeda, N.; Stern, F. Estimating coefficients using system identification methods with experimental, system-based, and CFD free-running trial data. Ocean Eng. 2012, 51, 63–84. [Google Scholar] [CrossRef]
  7. Åström, K.J. Maximum likelihood and prediction error methods. Automatica 1980, 16, 551–574. [Google Scholar] [CrossRef]
  8. Luo, W.L.; Zou, Z.J. Parametric Identification of Ship Models by Using Support Vector Machine. J. Ship Res. 2009, 53, 19–30. [Google Scholar] [CrossRef]
  9. Luo, W.L.; Moreira, L.; Soares, C.G. Maneuvering simulation of catamaran by using implicit models based on support vector machines. Ocean Eng. 2014, 82, 150–159. [Google Scholar] [CrossRef]
  10. Luo, W.L.; Soares, C.G.; Zou, Z.J. Parameter Identification of Ship Model Based on Support Vector Machines and Particle Swarm Optimization. J. Offshore. Mech. Arct. Eng. 2016, 138, 031101. [Google Scholar] [CrossRef]
  11. Zhang, X.; Zou, Z. Identification of Abkowitz model for ship manoeuvring motion using ε-support vector regression. J. Hydrodyn. 2011, 23, 353–360. [Google Scholar] [CrossRef]
  12. Wang, Z.; Xu, H.; Xia, L.; Zou, Z.; Soares, C.G. Kernel-based support vector regression for nonparametric modeling of ship maneuvering motion. Ocean Eng. 2020, 216, 107994. [Google Scholar] [CrossRef]
  13. Zhang, Y.-N.; Li, W.; Cai, B.; Li, K.N. Direct weight determination method of Chebyshev orthogonal basis neural networks. Comput. Simul. 2009, 26, 157–161. [Google Scholar]
  14. Silva, K.M.; Maki, K.J. Data-Driven system identification of 6-DoF ship motion in waves with neural networks. Appl. Ocean Res. 2022, 125, 103222. [Google Scholar] [CrossRef]
  15. Zhang, Z.; Ren, J.; Bai, W. MIMO non-parametric modeling of ship maneuvering motion for marine simulator using adaptive moment estimation locally weighted learning. Ocean Eng. 2022, 261, 112103. [Google Scholar] [CrossRef]
  16. Moreira, L.; Soares, C.G. Dynamic Model of Maneuverability Using Recursive Neural Networks. Ocean Eng. 2003, 30, 1669–1697. [Google Scholar] [CrossRef]
  17. Hao, L.; Han, Y.; Shi, C.; Pan, Z. Recurrent Neural Networks for Nonparametric Modeling of Ship Motion. Int. J. Nav. Archit. Ocean Eng. 2022, 14, 100436. [Google Scholar] [CrossRef]
  18. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  19. Cho, K. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar]
  20. Vaswani, A. Attention Is All You Need; Advances in Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2017. [Google Scholar]
  21. Woo, J.; Park, J.; Yu, C.; Kim, N. Dynamic Model Identification of Unmanned Surface Vehicles Using Deep Learning Network. Appl. Ocean Res. 2018, 78, 123–133. [Google Scholar] [CrossRef]
  22. Guo, X.; Zhang, X.; Tian, X.; Li, X.; Lu, W. Predicting heave and surge motions of a semi-submersible with neural networks. Appl. Ocean Res. 2021, 112, 102708. [Google Scholar] [CrossRef]
  23. Jiang, Y.; Hou, X.R.; Wang, X.G.; Wang, Z.-H.; Yang, Z.-L. Identification modeling and prediction of ship maneuvering motion based on LSTM deep neural network. J. Mar. Sci. Technol. 2022, 27, 125–137. [Google Scholar] [CrossRef]
  24. Kaadoud, I.C.; Rougier, N.P.; Alexandre, F. Knowledge extraction from the learning of sequences in a long short-term memory (LSTM) architecture. Knowl.-Based Syst. 2022, 235, 107657. [Google Scholar] [CrossRef]
  25. Wang, S.; Li, Y.; Xing, H. A novel method for ship trajectory prediction in complex scenarios based on spatio-temporal features extraction of AIS data. Ocean Eng. 2023, 281, 114846. [Google Scholar] [CrossRef]
  26. Jia, H.; Yang, Y.; An, J.; Fu, R. A ship trajectory prediction model based on attention-BILSTM optimized by the whale optimization algorithm. Appl. Sci. 2023, 13, 4907. [Google Scholar] [CrossRef]
  27. Wang, N.; Kong, X.; Ren, B.; Hao, L.; Han, B. SeaBil: Self-attention-weighted ultrashort-term deep learning prediction of ship maneuvering motion. Ocean Eng. 2023, 287, 115890. [Google Scholar] [CrossRef]
  28. Dong, L.; Wang, H.; Lou, J. A temporal prediction model for ship maneuvering motion based on multi-head attention mechanism. Ocean Eng. 2024, 309, 118464. [Google Scholar] [CrossRef]
  29. Liu, Y.; Zou, L.; Zou, Z.; Guo, H. Predictions of Ship Maneuverability Based on Virtual Captive Model Tests. Eng. Appl. Comput. Fluid Mech. 2018, 12, 334–353. [Google Scholar]
  30. Maller, R.A.; Müller, G.; Szimayer, A. Ornstein–Uhlenbeck Processes and Extensions; Handbook of Financial Time Series; Springer: Berlin/Heidelberg, Germany, 2009; pp. 421–437. [Google Scholar]
  31. Babaee, M.; Li, Z.; Rigoll, G. Occlusion handling in tracking multiple people using RNN. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 2715–2719. [Google Scholar]
  32. Goonko, A.V.; Seroklinov, G.V.; Devyatkin, E.S.; Soloveichik, Y.G.; Yakimenko, A.A.; Mengliev, D.B.; Karimov, M.K.; Barakhnin, V.B.; Markov, S.I.; Serov, A.N.; et al. Application of data mining technologies for processing results of experimental studies. In Proceedings of the 2023 IEEE XVI International Scientific and Technical Conference Actual Problems of Electronic Instrument Engineering (APEIE), Novosibirsk, Russia, 10–12 November 2023; pp. 860–863. [Google Scholar]
  33. Tao, C.; Gao, S.; Shang, M.; Wu, W.; Zhao, D.; Yan, R. Get the Point of My Utterance! Learning Towards Effective Responses with Multi-Head Attention Mechanism. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 4418–4424. [Google Scholar]
  34. Hodson, T.O. Root mean square error (RMSE) or mean absolute error (MAE): When to use them or not. Geosci. Model Dev. Discuss. 2022, 15, 5481–5487. [Google Scholar] [CrossRef]
Figure 1. Definition of the ship’s coordinate system of motion.
Figure 2. Turning motion data collection: (a) u in still water; (b) v in still water; (c) r in still water; (d) u in wave environment; (e) v in wave environment; and (f) r in wave environment.
Figure 3. Zigzag motion data collection: (a) u in still water; (b) v in still water; (c) r in still water; (d) u in wave environment; (e) v in wave environment; and (f) r in wave environment.
Figure 4. Training, validation, and testing sets.
Figure 5. LSTM model unit structure.
Figure 6. Multi-Head Attention Mechanism structure.
Figure 7. LSTM-Multi-Head Attention-1 model framework.
Figure 8. LSTM-Multi-Head Attention-2 model framework.
Figure 9. LSTM-Multi-Head Attention-3 model framework.
Figure 10. Forecasting effects of the proposed models.
Figure 11. RMSE and loss curves of the proposed models.
Figure 12. Forecasting effects of models with different regularization methods.
Figure 13. RMSE and loss curves with different regularization methods.
Figure 14. Forecasting effects of models with different numbers of heads.
Figure 15. RMSE and loss curves with different numbers of heads.
Figure 16. Analysis of the impact of the number of neurons on model performance.
Figure 17. RMSE and loss curves with different numbers of neurons.
Figure 18. Forecasting effects of models with different training batch sizes.
Figure 19. RMSE and loss curves with different training batch sizes.
Figure 20. Analysis of the impact of sliding window size.
Figure 21. RMSE and loss curves with different sliding window sizes.
Figure 22. Comparison of prediction effects among the LSTM, GRU, Multi-Head Attention, Transformer, and LSTM-Multi-Head Attention-2 models.
Figure 23. RMSE and loss curves of the LSTM, GRU, Multi-Head Attention, Transformer, and LSTM-Multi-Head Attention-2 models.
Figure 24. Prediction of u, v, r, and heading for an 8-degree turning movement.
Figure 25. Prediction of u, v, r, and heading for a 15-degree turning movement.
Figure 26. Prediction of trajectory for 8-degree and 15-degree turning movements.
Figure 27. Prediction of u, v, r, and heading for the 5°/5° zigzag.
Figure 28. The optimized forecasting effect.
Table 1. Ship dimensions.

Description           Parameter   Unit   Value
Ship Length           L           m      3
Ship Breadth          B           m      1.5
Ship Depth            D           m      0.44
Design Draft          t           m      0.28
Design Displacement   Δ           m³     0.666
Table 2. Environmental information.

Wave Height (m)   Wind Speed (m/s)   Wind Direction
0.3               2                  Southwest
Table 3. Condition division.

Dataset          Conditions
Training Set     Turning: −10°, −8°, −5°, 10°, 13°, 14°; Zigzag: 10°/10°, 15°/15°
Validation Set   Turning: −13°, 5°; Zigzag: 20°/20°
Testing Set      Turning: −14°, 8°; Zigzag: 5°/5°
Table 4. Hyperparameter setup.

Hyperparameter Name     Value
Iteration Steps         500
Number of Neurons       128
Training Batch          2048
Heads Number            8
Learning Rate           0.0001
Regularization Degree   0.2
Table 5. Optimal hyperparameters of the model.

Parameter Name         Parameter Value
Dropout Rate           0.4
Heads Number           8
Number of Neurons      128
Training Batch         512
Sliding Window Width   30
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
