Open AccessArticle

LSAF-LSTM-Based Self-Adaptive Multi-Sensor Fusion for Robust UAV State Estimation in Challenging Environments^†

Mahammad Irfan

^1,*

Sagar Dalai

Petar Trslic

²,

James Riordan

³ and

Gerard Dooly

CRIS Research Group, Department of Electronic and Computer Engineering, University of Limerick, V94 T9PX Limerick, Ireland

School of Engineering, University of Limerick, The Lonsdale Building, V94 T9PX Limerick, Ireland

Drone Systems Lab, School of Computing Engineering and Physical Science, University of the West of Scotland, Glasgow G72 0LH, UK

Author to whom correspondence should be addressed.

^†

This paper is an extended version of our paper published in Irfan, M.; Dalai, S.; Vishwakarma, K.; Trslic, P.; Riordan, J.; Dooly, G. Multi-Sensor Fusion for Efficient and Robust UAV State Estimation. In Proceedings of the 2024 12th International Conference on Control, Mechatronics and Automation (ICCMA), London, UK, 11–13 November 2024.

Machines 2025, 13(2), 130; https://doi.org/10.3390/machines13020130

Submission received: 15 January 2025 / Revised: 31 January 2025 / Accepted: 7 February 2025 / Published: 9 February 2025

(This article belongs to the Special Issue Selected Papers from the 12th International Conference on Control, Mechatronics and Automation (ICCMA 2024))

Download

Browse Figures

Figure 1
Proposed architecture for LSTM-based self-adaptive multi-sensor fusion (LSAF). "> Figure 2
An illustration of the proposed LSAF framework. The global estimator combines local estimations from various global sensors to achieve precise local accuracy and globally drift free pose estimation, which builds upon our previous work [<a href="#B28-machines-13-00130" class="html-bibr">28</a>]. "> Figure 3
Proposed LSTM-based multi-sensor fusion architecture for UAV state estimation. "> Figure 4
LSTM cell architecture for adaptive multi-sensor fusion. "> Figure 5
Training and validation loss of the proposed LSTM-based self-adaptive multi-sensor fusion (LSAF) framework over 1000 epochs. "> Figure 6
Training and validation MAE of the proposed LSTM-based self-adaptive multi-sensor fusion (LSAF) framework over 1000 epochs. "> Figure 7
Proposed block diagram for LSTM-based self-adaptive multi-sensor fusion (LSAF). "> Figure 8
The experimental environment in different scenarios during the data collection. Panel (a,b) represent the UAV hardware along with sensor integration and panel (c,d) are the open-field dataset environment view from stereo and LiDAR sensors, respectively, which build upon our previous work [<a href="#B28-machines-13-00130" class="html-bibr">28</a>]. "> Figure 9
Trajectory plots of the proposed LSAF method and comparison with FASTLIO2 and VINS-Fusion. "> Figure 10
Box plots showing the overall APE of each strategy. "> Figure 11
Absolute estimated position of x, y, and z axes showing plots of various methods on the UAV car parking dataset. "> Figure 12
Absolute position error of roll, yaw, and pitch showing the plots of various methods on the UAV car parking dataset. "> Figure 13
Trajectory plots of the proposed LSAF method and comparison with FASTLIO2 and VINS-Fusion on the UL outdoor handheld dataset. "> Figure 14
Box plots showing the overall APE of each strategy. "> Figure 15
Absolute estimated position of x, y, and z axes showing the plots of various methods on the UL outdoor handheld dataset. "> Figure 16
Absolute position error of roll, yaw, and pitch showing the plots of various methods on the UL outdoor handheld dataset. "> Figure 17
Trajectory plots of the proposed LSAF method and comparison with FASTLIO2 and VINS-Fusion. "> Figure 18
Absolute estimated position of the x, y, and z axes showing the plots of various methods on the UAV car bridge dataset. "> Figure 19
Absolute position error of roll, yaw, and pitch showing plots of various methods on the UAV car bridge dataset. "> Figure 20
Box plots showing the overall APE of each strategy. ">

Versions Notes

Abstract

Unmanned aerial vehicle (UAV) state estimation is fundamental across applications like robot navigation, autonomous driving, virtual reality (VR), and augmented reality (AR). This research highlights the critical role of robust state estimation in ensuring safe and efficient autonomous UAV navigation, particularly in challenging environments. We propose a deep learning-based adaptive sensor fusion framework for UAV state estimation, integrating multi-sensor data from stereo cameras, an IMU, two 3D LiDAR’s, and GPS. The framework dynamically adjusts fusion weights in real time using a long short-term memory (LSTM) model, enhancing robustness under diverse conditions such as illumination changes, structureless environments, degraded GPS signals, or complete signal loss where traditional single-sensor SLAM methods often fail. Validated on an in-house integrated UAV platform and evaluated against high-precision RTK ground truth, the algorithm incorporates deep learning-predicted fusion weights into an optimization-based odometry pipeline. The system delivers robust, consistent, and accurate state estimation, outperforming state-of-the-art techniques. Experimental results demonstrate its adaptability and effectiveness across challenging scenarios, showcasing significant advancements in UAV autonomy and reliability through the synergistic integration of deep learning and sensor fusion.

Keywords:

ROS; LSTM; adaptive fusion; multi-sensor fusion; state estimation; UAV; LiDAR-visual-inertial odometry; MSCKF

1. Introduction

Recent advances in computing power, sensor technologies, and machine learning have significantly fueled interest in autonomous unmanned aerial vehicles (UAVs), also known as drones. These systems have become indispensable across a wide range of applications, including robot navigation, autonomous driving, virtual reality (VR), augmented reality (AR), environmental monitoring, delivery services, and disaster response. In such contexts, navigation and positioning are essential to ensuring the UAV’s operational accuracy, safety, and efficiency. Modern UAVs heavily rely on sensor fusion techniques to provide robust state estimation that enables them to operate autonomously, even in complex or dynamic environments. Beyond UAVs, sensor fusion plays a vital role in the Internet of Vehicles (IoV), autonomous robots, and other emerging technologies [1,2].

The field of state estimation in navigation and control systems for autonomous robots has evolved significantly over the years, driven by technological advancements in sensor hardware and computational algorithms. State estimation involves deriving accurate information about a system’s position, velocity, and orientation based on sensor data. While single-sensor solutions have been extensively studied, their limitations have increasingly motivated research into multi-sensor fusion approaches. These approaches leverage the complementary characteristics of diverse sensors to overcome the constraints of individual sensors and enhance the accuracy, robustness, and resilience of state estimation systems [3].

Despite the progress made, achieving robust, accurate, and seamless navigation and positioning solutions remains a major challenge when relying solely on single-sensor systems. For example, the inertial navigation system (INS), which relies on accelerometers and gyroscopes to compute relative positions, is highly accurate only for short durations. Over time, the accumulation of sensor noise and integration errors causes significant drift. Similarly, GPS, while offering absolute positioning data, is effective primarily in open sky environments but is prone to signal blockage, multipath interference, and degraded performance in urban canyons, dense forests, or indoor environments. These limitations demand the integration of additional sensor types, such as cameras, LiDAR, and IMU, to ensure robust state estimation with enhanced spatial and temporal coverage.

Visual inertial navigation systems (VINS) [4] have emerged as a cost effective and practical solution for state estimation in UAVs, combining visual and inertial data to achieve higher accuracy. However, VINS performance in complex environments is often hindered by its susceptibility to changing illumination, low texture regions, and dynamic obstacles. LiDAR, on the other hand, provides accurate distance measurements and operates independently of lighting conditions. Its growing affordability and precision have made it a popular choice for UAVs. Nonetheless, LiDAR systems face challenges related to sparse data and difficulty in extracting semantic information. Similarly, vision-based approaches using monocular or stereo cameras struggle with initialization, sensitivity to illumination changes, and distance variability. These challenges highlight the need for multi-sensor fusion, where the strengths of different sensors are combined to overcome individual shortcomings.

In recent years, multi sensor fusion approaches have advanced significantly, enabling UAVs to achieve real-time, high-precision positioning and mapping. For example, integrating GPS with IMU data mitigates inertial navigation drift and improves noise filtering in complex environments. Incorporating LiDAR and visual data further enhances accuracy by providing rich spatial and semantic information. However, traditional sensor fusion methods often rely on static weighting of sensor inputs, which can lead to suboptimal performance in dynamic or degraded scenarios. These limitations have driven research toward adaptive sensor fusion techniques that dynamically adjust sensor contributions based on real-time environmental conditions and sensor reliability [5,6].

Recent advancements in deep learning have introduced a powerful paradigm for adaptive sensor fusion. Deep learning models, such as long short-term memory (LSTM) networks, can effectively learn temporal dependencies in sensor data and adaptively compute fusion weights based on real-time input. This capability allows UAVs to dynamically prioritize reliable sensors and minimize the impact of degraded or faulty sensor data. Such adaptability is particularly valuable in scenarios involving sudden illumination changes, feature deprived environments, degraded GPS signals, or complete signal loss where traditional single-sensor systems and static-weight fusion approaches often fail.

This paper presents a novel, deep learning-based adaptive multi-sensor fusion framework for UAV state estimation. The proposed framework integrates stereo cameras, IMU, LiDAR sensors, and GPS-RTK data into a unified system, which is depicted in Figure 1. A long short-term memory (LSTM) model is used to dynamically compute sensor fusion weights in real time, ensuring robust, accurate, and consistent state estimation under diverse conditions. Unlike conventional methods that rely on fixed sensor weights, our approach leverages the real-time adaptability of deep learning to optimize sensor contributions based on environmental and operational factors.

Our approach is validated on an in-house UAV platform equipped with an internally integrated and calibrated sensor suite. The system is evaluated against high-precision RTK ground truth, demonstrating its ability to maintain robust state estimation in both GPS-enabled and GPS-denied scenarios. The algorithms autonomously determine relevant sensor data, leveraging stereo inertial or LiDAR inertial odometry outputs to ensure global positioning in the absence of GPS.

The major contributions of this research are as follows:

We propose an innovative multi-sensor fusion system integrating a VGA stereo camera, two 3D LiDAR sensors, a nine-degree-of-freedom IMU, and optimized GPS-RTK networking to achieve precise UAV state estimation.
A deep learning-based adaptive weighting mechanism is implemented using LSTM to dynamically adjust sensor contributions, ensuring robust state estimation across diverse and challenging environments.
A commercial UAV equipped with an internally integrated and calibrated sensor platform is used to collect complex datasets, enabling robust evaluation of the proposed method.
Extensive evaluations confirm the efficacy and performance of the stereo-visual-LiDAR fusion framework, demonstrating high efficiency, robustness, consistency, and accuracy in challenging scenarios.

By addressing the limitations of traditional methods and introducing dynamic adaptability through deep learning, this work significantly advances the field of UAV state estimation, paving the way for more reliable autonomous navigation systems.

2. Related Work

In recent decades, many innovative approaches for UAV state estimation have been proposed, leveraging different types of sensors. Among these, vision-based and LiDAR-based methods have gained substantial attention due to their ability to provide rich environmental data for accurate localization and mapping. Researchers have extensively explored the fusion of visual and inertial sensors, given their complementary properties in addressing UAV navigation challenges [7].

For state estimation, sensors such as IMUs are frequently used in fusion designs that can be broadly categorized into loosely coupled and tightly coupled approaches. In loosely coupled systems, sensor outputs are independently processed and subsequently fused, offering simplicity and flexibility when integrating diverse sensors. However, tightly coupled systems have gained increasing preference due to their ability to process raw sensor data directly, such as utilizing raw IMU measurements in pose estimation. This allows for more accurate state estimation, especially in scenarios with high dynamic motion or challenging environmental conditions. Papers [8,9] propose tightly coupled methods that integrate visual and inertial data for efficient and robust state estimation. By exploiting the raw data from IMU and cameras, these methods address issues like drift and improve system robustness compared to loosely coupled alternatives.

2.1. Multi-Sensor Fusion Approaches

Current multi-sensor fusion methods can be broadly classified into filtering-based, optimization-based, and deep learning-based approaches [10].

2.1.1. Filtering-Based Methods

Filtering-based methods, such as the extended Kalman filter (EKF) and unscented Kalman filter (UKF), have been widely adopted for sensor fusion due to their computational efficiency and ability to handle real-time applications. These methods assume Gaussian noise and rely on linearization techniques to model system dynamics. However, their performance deteriorates in the presence of nonlinear models or non-Gaussian noise distributions. Furthermore, their reliance on static sensor weightings can result in suboptimal performance in dynamic and unpredictable environments.

2.1.2. Optimization-Based Methods

Optimization-based approaches address the limitations of filtering methods by formulating the state estimation problem as an optimization task. These methods, such as bundle adjustment (BA) and factor graph optimization (FGO), are well suited for handling nonlinearities and non-Gaussian noise. Although optimization methods are computationally more demanding, they provide higher precision and robustness, making them popular for applications requiring high accuracy, such as simultaneous localization and mapping (SLAM). For example, techniques that combine visual, inertial, and LiDAR data in optimization frameworks have demonstrated significant improvements in state estimation accuracy in diverse scenarios.

2.1.3. Deep Learning-Based Methods

With the rapid advancements in deep learning, researchers have increasingly explored neural network-based algorithms for sensor fusion and state estimation [11]. These methods leverage the ability of neural networks to learn complex, nonlinear relationships directly from data. For instance, networks designed for depth estimation and motion representation from image sequences have shown promise in improving pose estimation accuracy and robustness. Furthermore, neural networks can dynamically adapt sensor fusion weights based on real-time sensor reliability, enabling more robust state estimation in dynamic environments. However, the high computational cost and the need for extensive training data remain significant challenges for deploying deep learning-based methods in real-time UAV applications.

2.2. Sensor-Specific Contributions

2.2.1. Vision-Based SLAM

Vision based approaches, such as monocular or stereo visual SLAM, utilize cameras to map the environment and estimate the UAV’s pose. These methods offer a cost-effective solution but are highly sensitive to illumination changes, feature poor environments, and dynamic objects. Moreover, challenges such as scale ambiguity in monocular systems and computational overhead in stereo systems limit their widespread application.

2.2.2. LiDAR-Based SLAM

LiDAR systems generate dense 3D point clouds of the environment, providing high-precision spatial information that is resilient to lighting variations. Compared to vision-based SLAM, LiDAR-based SLAM demonstrates superior performance in feature-poor or dynamic environments [12]. However, LiDAR data are inherently sparse and lacks semantic information, necessitating integration with other sensors such as cameras and IMUs for robust state estimation.

2.2.3. Multi-Sensor Fusion for SLAM

Recent studies highlight the importance of integrating complementary sensor types, such as cameras, LiDAR, IMU, and GNSS, to achieve robust and efficient SLAM-based navigation [13,14]. For instance, adding visual, LiDAR, or inertial factors enhances SLAM systems by improving robustness and state estimation accuracy [15,16]. Combining LiDAR and visual data mitigates the limitations of each sensor, while IMUs provide continuous data for motion prediction and noise filtering. The integration of GPS and GNSS further ensures resilience against environmental variability and provides accurate global positioning to address drift in large-scale environments.

2.3. Challenges and Opportunities

While current state estimation techniques show significant promise, challenges such as accumulated drift, sensitivity to environmental factors, and limited adaptability in dynamic scenarios persist. To address these issues, adaptive multi-sensor fusion techniques that dynamically adjust sensor weights based on environmental and operational factors have emerged as a promising solution. For example, learning-based frameworks leverage the adaptability of neural networks to dynamically compute sensor fusion weights, improving resilience and robustness in degraded conditions. Table 1. represents the summary of the comparison between LSAF and other state-of-the-art methods.

The proposed (LSAF) LSTM-based dynamic weight adjustment differs from existing methods by integrating LSTM-derived adaptive weights into MSCKF for real-time UAV state estimation, rather than just optimizing fusion weights offline. Unlike prior works, our approach employs an attention-based mechanism within LSTM to dynamically prioritize sensor reliability at each time step, ensuring robustness in SLAM-based pose estimation. Additionally, our hierarchical fusion strategy combines LSTM, SLAM, and MSCKF, making it more adaptable to real-world UAV applications, especially in GPS-denied and dynamic environments. These innovations differentiate our work from conventional LSTM-based fusion techniques.

This paper builds on this body of work by proposing a novel framework that combines the strengths of optimization-based and deep learning-based approaches. Using long-short-term memory (LSTM) networks, our method dynamically computes sensor fusion weights in real time, adapting to environmental conditions and sensor reliability. This framework integrates stereo cameras, LiDAR, IMU, and GPS-RTK data into a unified system, achieving superior performance in both GPS-enabled and GPS-denied scenarios.

3. Methodology

This research aims to achieve robust and accurate UAV state estimation by integrating measurements from multiple sensors, including GPS, stereo cameras, LiDARs, and IMUs, into a unified framework. The proposed system combines a multi-state constraint Kalman filter (MSCKF) [27] with a long short-term memory (LSTM)-based self-adaptive sensor fusion mechanism. This hybrid framework dynamically adjusts sensor fusion weights based on real-time environmental conditions and sensor reliability, ensuring consistent performance in challenging scenarios, such as GPS-degraded environments, rapid motion, and feature-deprived areas.

3.1. Coordinate Systems and Sensor Calibration

To ensure consistency across multi-sensor measurements, the system defines two primary coordinate systems—the world frame (W) and the UAV body frame (B)—as can be seen in the Figure 2. These systems represent the proposed LSAF framework. The body frame is aligned with the IMU frame for simplicity, as the IMU serves as the central reference for state propagation. Local sensors, such as stereo cameras, LiDARs, and IMUs, measure relative motion and require initialization of their reference frames. Initialization is typically performed by setting the UAV’s first pose as the origin. Global sensors, such as GPS, operate in an Earth-centered global coordinate frame and provide absolute positioning measurements. GPS data, expressed as latitude, longitude, and altitude, are converted into Cartesian coordinates (

x, y, z

) for consistency with local sensor measurements.

Offline calibration of all sensors is performed to reduce measurement biases, align coordinate frames, and ensure accurate fusion of data. This calibration accounts for sensor-specific offsets, such as biases in IMU accelerometers and gyroscopes, misalignment of LiDAR and camera frames, and GPS inaccuracies due to multipath effects or environmental interference. The calibration process ensures that measurements from all sensors are consistent and directly comparable within the fusion framework.

3.2. State Representation and Propagation

The UAV’s motion is modeled using a six-degree-of-freedom (6-DOF) representation, including position, velocity, orientation, and sensor biases. The state vector x is defined as follows:

x = [\begin{matrix} p \\ v \\ q \\ b_{a} \\ b_{g} \end{matrix}],

(1)

where

p \in R^{3}

is the position of the UAV,

v \in R^{3}

is the velocity,

q \in R^{4}

is the orientation represented as a quaternion,

b_{a} \in R^{3}

is the accelerometer bias, and

b_{g} \in R^{3}

is the gyroscope bias. The state is propagated forward in time using IMU measurements of linear acceleration (

a_{m}

) and angular velocity (

ω_{m}

) as follows:

\begin{matrix} \dot{p} & = v, \end{matrix}

(2)

\begin{matrix} \dot{v} & = R (q) (a_{m} - b_{a} - n_{a}) + g, \end{matrix}

(3)

\begin{matrix} \dot{q} & = \frac{1}{2} q \otimes (ω_{m} - b_{g} - n_{g}), \end{matrix}

(4)

\begin{matrix} {\dot{b}}_{a} & = n_{b a}, \end{matrix}

(5)

\begin{matrix} {\dot{b}}_{g} & = n_{b g} . \end{matrix}

(6)

Here,

R (q)

represents the rotation matrix derived from the quaternion

q

, and

g

is the gravity vector. The terms

n_{a}

n_{g}

n_{b a}

, and

n_{b g}

denote process noise, modeled as zero-mean Gaussian distributions.

3.3. Measurement Models for Multi-Sensor Integration

Each sensor provides measurements that are incorporated into the fusion framework through dedicated measurement models. These models relate sensor observations to the UAV’s state, ensuring accurate integration. The key measurement models are as described below.

1. IMU measurements provide linear acceleration and angular velocity. These are modeled as follows:

\begin{matrix} a_{m} & = R {(q)}^{T} (\dot{v} - g) + b_{a} + n_{a}, \end{matrix}

(7)

\begin{matrix} ω_{m} & = ω + b_{g} + n_{g} . \end{matrix}

(8)

2. GPS provides absolute position measurements in the global frame:

z_{GPS} = p + n_{GPS},

(9)

where

n_{GPS}

denotes measurement noise.

3. LiDAR generates 3D point clouds, providing precise spatial measurements:

p_{L} = R_{I L} (p - p_{I}) + n_{LiDAR},

(10)

where

R_{I L}

is the rotation matrix from the IMU to LiDAR frame, and

p_{I}

is the IMU’s position.

4. Stereo Camera provide 2D projections of 3D feature points:

\begin{matrix} u & = f_{x} \frac{x}{z} + c_{x}, v = f_{y} \frac{y}{z} + c_{y}, \end{matrix}

(11)

where

(x, y, z)

are the 3D coordinates of a feature in the camera frame, and

(u, v)

are the corresponding pixel coordinates.

3.4. Self-Adaptive Fusion with LSTM

Accurate state estimation for autonomous UAVs in dynamic and uncertain environments remains a critical challenge. Traditional sensor fusion methods such as the multi-state constraint Kalman filter (MSCKF) assume fixed measurement noise covariance (

R

), which limits their ability to adapt to varying sensor reliability. To address this limitation, this work introduces a long short-term memory (LSTM)-based self-adaptive fusion framework, which dynamically adjusts the measurement noise covariance for each sensor based on real-time reliability assessments. By leveraging temporal dependencies in sensor data, the proposed approach improves robustness to environmental variations, sensor degradation, and measurement inconsistencies.

The LSTM model takes the following characteristics as input key features indicative of sensor reliability: GPS signal strength, visual feature density, LiDAR point cloud density, and IMU noise levels. These features are processed over time to generate adaptive fusion weights, which are used to modify the sensor measurement models dynamically. The LSTM network is trained offline on a dataset comprising diverse environmental conditions, including urban landscapes, forested areas, and GPS-denied spaces, with ground truth obtained from GPS-RTK and high-accuracy SLAM systems. The ability of the LSTM to learn and generalize from these varied conditions enables it to adjust sensor fusion parameters optimally in real time, improving the overall accuracy and robustness of UAV state estimation.

Table 2 summarizes the LSTM-based self-adaptive multi-sensor fusion (LSAF) framework, which enhances UAV state estimation by dynamically weighting multi-sensor inputs. The framework integrates data from GPS, IMU, stereo cameras, and LiDAR, leveraging an LSTM model to extract temporal dependencies and compute adaptive sensor reliability scores. These weights dynamically adjust sensor contributions to SLAM-based pose estimation and MSCKF-based state correction, improving accuracy and robustness. The model architecture comprises two LSTM layers followed by a time-distributed dense layer, trained using the mean squared error (MSE) loss function and optimized via the Adam optimizer over 1000 epochs. Unlike traditional fusion techniques, the LSTM updates sensor weights at each time step, allowing for real-time adaptation to environmental variations. By assigning higher weightage to more reliable sensors, the system ensures precise state estimation, particularly in GPS-denied environments, high-speed maneuvers, and featureless conditions, ultimately enhancing UAV navigation and autonomous flight performance. Algorithm 1 represent the proposed LSAF process training phase steps.

Algorithm 1 Proposed LSTM-based self-adaptive multi-sensor fusion (LSAF) training phase

1:

Input:

$S_{t} = {s_{t 1}, s_{t 2}, \dots, s_{t S N}}$ : Multi-sensor measurements
G: Ground truth values
$ϕ_{LSTM}$ : LSTM model parameters
${\hat{x}}_{t - 1 | t - 1}, P_{t - 1 | t - 1}$ : Initial state estimate and covariance
$R, Q$ : Measurement and process noise covariance
$η$ : Convergence threshold

2:

Output:

Final estimated state: ${\hat{x}}_{t | t}$
Final covariance: $P_{t | t}$

3:

Step 1: Initialization

Initialize LSTM model parameters: $ϕ_{LSTM}$
Set noise covariances: $R, Q$
Define training parameters: number of epochs N, learning rate $α$

4:

Step 2: Training Phase (For each epoch N)

5:

for

e p o c h = 1

to N do

Encode Sensor Data: Compute hidden states using LSTM:

$h_{t} = LSTM (S_{t}; ϕ_{LSTM})$
Compute Adaptive Weights: Use attention mechanism:

$w_{t, i} = Attention (h_{t}, s_{t, i})$
Update Noise Covariance: Adjust sensor uncertainty:

$R_{t} = \sum_{i = 1}^{N} w_{t, i} \cdot R_{i}$
Predict Next State: Using motion model:

$x_{t | t - 1} = f (x_{t - 1}), P_{t | t - 1} = F_{t - 1} P_{t - 1} F_{t - 1}^{T} + Q$
Compute Kalman Gain: Optimize state estimation:

$K_{t} = P_{t | t - 1} H_{t}^{T} {(H_{t} P_{t | t - 1} H_{t}^{T} + R_{t})}^{- 1}$
State and Covariance Update: Using Kalman filter:

$x_{t | t} = x_{t | t - 1} + K_{t} (z_{t} - h (x_{t | t - 1}))$

$P_{t | t} = (I - K_{t} H_{t}) P_{t | t - 1}$
Compute Loss: Evaluate using Mean Squared Error (MSE):

$L ({\hat{x}}_{t}, g_{t}) = \frac{1}{N} \sum_{t = 1}^{N} {({\hat{x}}_{t} - g_{t})}^{2}$
Update Model Parameters: Adjust LSTM weights using Adam optimizer:

$ϕ_{LSTM} \leftarrow ϕ_{LSTM} - α \cdot \nabla L$

6:

end for

7:

Step 3: Output the Final State Estimation

Return final estimated state ${\hat{x}}_{t | t}$ and covariance $P_{t | t}$

3.5. Proposed LSTM-Based Multi-Sensor Fusion Architecture

The proposed LSTM-based multi-sensor fusion framework is designed to effectively integrate long-term temporal dependencies into sensor data, enabling robust and adaptive fusion. The architecture, illustrated in Figure 3, consists of two sequential LSTM layers followed by a time-distributed dense layer, ensuring optimal processing of time-series sensor inputs.

The proposed architecture is designed to efficiently process sequential multi-sensor data for adaptive state estimation in UAV applications. At the core of this framework is the multi-sensor input layer, which aggregates data from various sources, including inertial measurement units (IMU), LiDAR, GPS, and stereo cameras. This structured representation ensures that the model can effectively capture variations in sensor reliability over time, providing a robust foundation for subsequent processing. By concatenating information from different sensor modalities, the input layer creates a time-series feature space that allows the network to analyze both spatial and temporal correlations in sensor data.

The first LSTM layer, comprising 128 units, plays a crucial role in capturing long-term dependencies in sensor reliability. Since real-world sensor data exhibit complex temporal dynamics, this layer enables the model to recognize patterns related to sensor degradation, noise fluctuations, and environmental interference. By leveraging its ability to retain past information through memory cells, the LSTM network ensures that historical context is incorporated into the state estimation process, allowing for more informed predictions. This is particularly valuable in scenarios where certain sensors intermittently provide unreliable measurements due to external disturbances or occlusions.

Following this, the second LSTM layer, consisting of 64 units, is responsible for refining the temporal features extracted by the first layer. This secondary processing stage reduces the dimensionality of the extracted feature set while preserving the most relevant sequential information. By compressing high-dimensional sensor data into a more compact representation, the network becomes more efficient in distinguishing meaningful trends from noise. The stacking of LSTM layers further enhances the model’s ability to discern complex dependencies between different sensor modalities, leading to improved estimation accuracy. To maintain temporal consistency in the output, the architecture incorporates a time-distributed dense layer. Unlike conventional fully connected layers, which process entire input sequences at once, this layer applies dense transformations independently to each time step. This ensures that the predicted UAV states remain aligned with the corresponding sensor measurements, preserving the sequential integrity of the data. The time-distributed nature of this layer allows the model to generate real-time predictions without disrupting the temporal structure of the input.

The final output layer provides the estimated UAV state by incorporating adaptive fusion weights derived from past sensor behavior. These weights are dynamically adjusted based on the learned temporal dependencies, allowing the system to prioritize the most reliable sensors under varying operational conditions. The model continuously refines its predictions by leveraging historical patterns of sensor accuracy, leading to more robust and adaptive state estimation. This approach is particularly beneficial in GPS-denied environments, highly dynamic conditions, and scenarios where individual sensors experience intermittent failures. Through this structured design, the architecture effectively integrates sequential information to enhance UAV navigation and state estimation accuracy in challenging environments. The proposed model is optimized using the mean squared error (MSE) loss function, which is well suited for regression tasks as it minimizes the squared differences between predicted and actual values. This approach ensures that larger errors are penalized more heavily, leading to more precise predictions. For optimization, the Adam optimizer is utilized due to its adaptive learning rate and ability to efficiently handle complex datasets. Adam’s combination of momentum and adaptive gradient-based optimization contributes to faster and more stable convergence. To evaluate the model’s predictive performance, the mean absolute error (MAE) metric is employed, as it provides a straightforward measure of the average prediction error magnitude. The training process spans 1000 epochs with a batch size of 32, ensuring effective learning without overfitting. Additionally, techniques such as early stopping or validation loss monitoring can be incorporated to enhance model robustness and prevent unnecessary overtraining.

3.6. LSTM Cell Mechanism for Self-Adaptive Fusion

To achieve real time adaptation of sensor fusion weights, the LSTM cell operates at each time step to adjust the measurement noise covariance matrix dynamically. Figure 4 illustrates the internal mechanism of the LSTM cell, detailing its role in self-adaptive sensor fusion.

3.6.1. LSTM Training and Validation Loss

The training and validation loss curves, as shown in Figure 5, display a steady and consistent decrease over the course of 1000 epochs. This behavior signifies the model’s effective learning of temporal patterns from the multi-sensor dataset. The training loss starts with a high initial value, reflecting the model’s early attempts to understand the complexities of the dataset. Over successive epochs, the loss steadily declines as the LSTM architecture refines its understanding of the data.

The minimal gap between the training and validation loss curves demonstrates effective generalization, indicating that the model avoids overfitting to the training data. This alignment underscores the robustness of the chosen hyperparameters, including the learning rate, batch size, and architecture depth, in achieving optimal learning performance. The observed convergence validates the model’s suitability for capturing sequential patterns in multi-sensor data, making it highly reliable for downstream applications.

3.6.2. Mean Absolute Error (MAE) Analysis

The MAE curves for training and validation, depicted in Figure 6, reveal a consistent decline over 1000 epochs, highlighting the model’s ability to minimize prediction errors. The MAE metric evaluates the absolute difference between the predicted and actual values, making it an effective measure for assessing prediction accuracy.

The training and validation MAE curves are closely aligned, indicating that the model generalizes well to unseen data without significant overfitting. The steady convergence of these curves suggests that the proposed LSTM-based framework is highly effective in learning the temporal dependencies in the multi-sensor dataset. This highlights the model’s ability to accurately predict sequential data, even in the presence of noise and variability in the sensor measurements.

3.6.3. Validation of the Proposed Framework

The results validate the efficacy of the proposed LSTM-based self-adaptive multi-sensor fusion (LSAF) framework. The combination of temporal pattern learning via the LSTM and its ability to minimize both loss and MAE ensures a comprehensive solution for dynamic system modeling. The ability of the LSAF framework to generalize to unseen data while maintaining precise predictions makes it highly reliable for complex applications such as autonomous navigation and simultaneous localization and mapping (SLAM).

The convergence of training and validation metrics highlights the robustness and adaptability of the system. These attributes make the proposed pipeline a reliable approach for handling real-world multi-sensor fusion challenges in dynamic environments.

3.7. Fusion Framework Using MSCKF

The proposed LSTM-based self-adaptive multi-sensor fusion (LSAF) framework is designed for real-time UAV state estimation by dynamically integrating data from GPS, IMU, stereo cameras, and LiDAR. As illustrated in Figure 7, the system employs multiple onboard sensors, including two Livox MID-360 LiDARs, a DJI front stereo camera, a DJI IMU, and a GPS-RTK system, ensuring a comprehensive perception of the environment. The IMU provides high-frequency motion tracking, while GPS-RTK offers precise global positioning. The LiDARs generate dense 3D environmental maps, and the stereo camera enhances spatial perception, particularly in visually rich environments. To efficiently process these multimodal sensor data, an LSTM network extracts temporal dependencies and evaluates sensor reliability. The attention-based mechanism within the LSTM model computes adaptive fusion weights, dynamically adjusting the measurement noise covariance to prioritize the most reliable sensors in real time. The weighted multi-sensor measurements are then passed to a SLAM-based pose estimation module, which fuses all available sensors’ stereo cameras, LiDAR, IMU, and GPS-RTK, ensuring robust localization. The proposed algorithm is based on the enhancement of VINS SLAM [4]. The LSTM-derived reliability scores influence SLAM by assigning higher weightage to more reliable sensors, thereby enhancing pose estimation accuracy. When all sensors are available, SLAM produces an optimal UAV state estimate. Following SLAM-based pose estimation, the UAV state is further refined using the multi-state constraint Kalman filter (MSCKF), which ensures consistency in state propagation and correction. The Kalman gain is computed dynamically, leveraging the LSTM-adapted fusion weights to optimally integrate new observations. This adaptive approach mitigates the effects of sensor degradation, noise, and environmental uncertainties, improving the UAV’s robustness in GPS-denied areas and high-speed motion conditions. Following SLAM-based pose estimation, the multi-state constraint Kalman filter (MSCKF) is employed to propagate the UAV state and refine it based on the LSTM-adapted fusion weights. The state propagation step follows the motion model:

\begin{matrix} x_{k | k - 1} & = f (x_{k - 1}), \end{matrix}

(12)

\begin{matrix} P_{k | k - 1} & = F_{k - 1} P_{k - 1} F_{k - 1}^{T} + Q . \end{matrix}

(13)

Once new sensor measurements are available, the Kalman gain is computed to optimally integrate observations:

\begin{matrix} K_{k} & = P_{k | k - 1} H_{k}^{T} {(H_{k} P_{k | k - 1} H_{k}^{T} + R)}^{- 1}, \end{matrix}

(14)

\begin{matrix} x_{k} & = x_{k | k - 1} + K_{k} (z_{k} - h (x_{k | k - 1})), \end{matrix}

(15)

\begin{matrix} P_{k} & = (I - K_{k} H_{k}) P_{k | k - 1} . \end{matrix}

(16)

By incorporating LSTM-based sensor reliability assessments into the MSCKF, the framework dynamically adapts to changing sensor conditions, enhancing robustness in GPS-denied environments and complex dynamic flight scenarios. The complete algorithmic workflow is detailed in Algorithm 2, outlining sensor preprocessing, adaptive sensor fusion, SLAM-based pose estimation, and MSCKF-based state correction.

Algorithm 2 LSTM-based self-adaptive multi-sensor fusion (LSAF) algorithm

Input:

$S_{t} = {s_{t 1}, s_{t 2}, . . ., s_{t N}}$ : Multi-sensor inputs (GPS-RTK, IMU, Stereo Camera, LiDAR)
$ϕ_{LSTM}$ : Pre-trained LSTM model for adaptive fusion
$x_{t - 1}$ : Previous UAV state estimate (position, velocity, orientation)
$R, Q$ : Measurement and process noise covariance

Output: Updated UAV state $x_{t} = (p_{t}, v_{t}, q_{t})$ .

Step 1: Sensor Data Preprocessing
- Synchronize, filter, and normalize multi-sensor inputs.
- Extract temporal dependencies and sensor reliability via LSTM:
  
  $h_{t} = LSTM (S_{t}; ϕ_{LSTM})$
Step 2: Adaptive Sensor Fusion
- Compute dynamic sensor reliability weights:
  
  $w_{t, i} = Attention (h_{t}, s_{t, i})$
- Adjust measurement noise covariance dynamically:
  
  $R_{t} = \sum_{i = 1}^{N} w_{t, i} \cdot R_{i}$
- Compute fused sensor measurement:
  
  $z_{t} = \sum_{i = 1}^{N} w_{t, i} \cdot s_{t, i}$
Step 3: UAV State Prediction
- Predict initial UAV state using IMU measurements:
  
  $x_{t | t - 1} = f (x_{t - 1}, {IMU}_{t})$
- Propagate state covariance:
  
  $P_{t | t - 1} = F_{t - 1} P_{t - 1} F_{t - 1}^{T} + Q$
Step 4: LSTM-Guided Multi-Sensor SLAM-Based Pose Estimation
- Fuse all available sensors (Stereo Camera, LiDAR, IMU, and GPS-RTK) for SLAM-based pose estimation.
- Assign higher weightage to more reliable sensors based on LSTM reliability scores:
  
  $w_{SLAM, i} = \frac{w_{t, i}}{\sum_{j = 1}^{N} w_{t, j}} \forall i \in {GPS - RTK, IMU, Stereo, LiDAR}$
- Compute weighted SLAM pose estimation:
  
  ${\hat{p}}_{t}, {\hat{q}}_{t} = \sum_{i = 1}^{N} w_{SLAM, i} \cdot SLAM (s_{t, i})$
- Ensure global consistency using GPS-RTK when available.
- If all sensors degrade, rely on IMU-based odometry:
  
  $p_{t} = p_{t - 1} + v_{t - 1} \cdot Δ t$
Step 5: State Correction using MSCKF
- Compute Kalman gain:
  
  $K_{t} = P_{t | t - 1} H_{t}^{T} {(H_{t} P_{t | t - 1} H_{t}^{T} + R_{t})}^{- 1}$
- Update UAV state:
  
  $x_{t} = x_{t | t - 1} + K_{t} (z_{t} - h (x_{t | t - 1}))$
- Update state covariance:
  
  $P_{t} = (I - K_{t} H_{t}) P_{t | t - 1}$

Algorithm 2 details the complete computational pipeline, outlining key steps such as sensor preprocessing, LSTM-based adaptive fusion, SLAM-based pose estimation, and MSCKF-based state correction. The proposed framework significantly improves UAV navigation accuracy by enabling real-time adaptation to sensor reliability, making it well suited for challenging flight conditions, including environments with limited GPS visibility, rapid motion dynamics, and feature deprived landscapes.

3.8. Advantages of the Proposed Framework

The proposed framework combines the strengths of traditional filtering methods with modern deep learning techniques, enabling robust UAV state estimation in real time. The LSTM-based self-adaptive fusion mechanism allows the system to dynamically prioritize sensor contributions based on their reliability, improving robustness in challenging environments. The integration of the MSCKF ensures computational efficiency and consistency, making the system suitable for real-time UAV operations in diverse scenarios.

3.9. Experimental Setup and Dataset

The experiments were carried out in an open-field outdoor environment, as shown in Figure 8. The dataset was collected in a wide open lawn area with minimal features, such as sparse distant trees and limited structural elements. The environment presented significant challenges for single-sensor SLAM approaches due to the lack of features and bright, sunny conditions that degraded stereo and LiDAR based odometry. The UAV platform was handheld during data collection to simulate various motion patterns, and the dataset included asynchronous measurements from all sensors. Sensor data were fused using event-based updates, where the state was propagated to the timestamp of each measurement. Calibration parameters, such as camera-LiDAR-IMU extrinsics, were estimated offline and incorporated into the extended state vector for accurate fusion.

The offline calibration of the proposed system consists of three key components: estimation of the stereo camera’s intrinsic and extrinsic parameters, determination of the IMU-camera extrinsic offset, and calibration of the LiDAR-IMU transformation. To estimate both intrinsic and extrinsic parameters of the stereo camera, we employ the well-established Kalibr calibration toolbox [29], ensuring precise alignment between the camera and IMU. For 3D LiDAR-IMU calibration, we utilize the state-of-the-art LI-Init toolbox [30], which provides a robust real-time initialization framework for LiDAR-inertial systems, compensating for temporal offsets and extrinsic misalignments. To evaluate the robustness of the proposed approach under diverse conditions, we collected multiple datasets across three scenarios, such as handheld and UAV mounted configurations. The datasets, referred to as UL Outdoor Car Parking Dataset, UL Outdoor Handheld Dataset, and UL Car Bridge Dataset, were recorded at the University of Limerick Campus within the CRIS Lab research group. Figure 8 illustrates the experimental environments, while Table 3 presents detailed a dataset of UAV hardware sensor specifications. To address the challenge of asynchronous sensor data, we employ first-order linear interpolation to estimate the IMU pose at each sensor’s measurement time, mitigating time bias without significantly increasing computational overhead. Instead of direct event-based updates, this method ensures that sensor data are aligned with a consistent reference frame, preventing oversampling of high-frequency IMU data or undersampling of low-frequency GPS and LiDAR measurements. Additionally, ROS-based timestamp synchronization of DJI-OSDK and Livox LiDAR nodes further minimizes timing inconsistencies, enhancing fusion accuracy and reducing drift in state estimation.

The proposed method was evaluated without loop closure mode to assess its consistency and robustness. Performance metrics, including the absolute pose error (APE) and the root mean square error (RMSE), were calculated to quantify the accuracy of the estimated trajectory [31]. The comparison focused on the ability to mitigate cumulative errors and maintain robust state estimation across large-scale environments. The details of the hardware used during the experiments are listed in Table 3.

4. Results and Comparison

The evaluation of the proposed LSTM-based self-adaptive multi-sensor fusion system was conducted on a collected dataset using a UAV equipped with state-of-the-art sensors, including two Livox Mid360 LiDARs (facing forward and downward), front-facing stereo cameras, an IMU, and GPS-RTK. The hardware configuration is summarized in Table 3. The UAV configuration and experimental setup are shown in Figure 8. These sensors provide complementary modalities that are dynamically fused using the proposed system, leveraging the LSTM-based approach to adaptively weigh sensor contributions based on their reliability and environmental conditions.

4.1. UL Outdoor Car Parking Dataset

The car parking dataset provides a complex testing environment with open air spaces, tree shadows, and dynamic illumination changes, as depicted in Figure 8. This experiment assessed the LSAF approach in a large-scale outdoor setting without loop closure to verify the robustness of the proposed methodology. The UAV operated in a vast, open lawn with minimal tree coverage and bright sunlight conditions that significantly challenge stereo-based odometry systems. These scenarios lead to frequent failures in vision-only or LiDAR-only methods.

During the UAV’s navigation over the parking area, most of the LiDAR-detected features were confined to the ground, resulting in degraded motion estimation. The trajectory plots, when compared to ground truth RTK data, showed that FASTLIO2 [17] suffered from substantial errors due to LiDAR degradation. Additionally, VINS-Fusion (stereo-inertial) [4] performed poorly, exhibiting the highest position drifts, while VINS-Fusion (stereo-IMU-GPS) [4] showed noticeable drifts under these conditions. Sparse LiDAR features in the dataset further impacted LiDAR-based methods like FASTLIO2. However, the proposed LSAF system, leveraging stereo, IMU, LiDAR, and GPS with a pre-trained LSTM-based deep learning model, provided enhanced UAV state estimation and consistently smoother trajectories in this challenging environment. Figure 9 displays the trajectories obtained using different methods, while Figure 10 highlights the box plots showing the overall APE of each strategy. Table 4 provides the RMSE values for each method and Figure 11 and Figure 12 represent absolute position errors (x, y, z) and orientation errors (roll, pitch, and yaw) for the UL outdoor car parking dataset.

4.2. UL Outdoor Handheld Dataset

In this experiment, we employed a custom-designed UAV sensor suite to evaluate the capabilities of our proposed framework. The RTK position was used as ground truth, leveraging the high-quality GPS signal recorded throughout the experiment. Data collection was performed using a handheld UAV method while navigating an outdoor environment. This setup presented challenges such as image degradation, structureless surroundings, dynamic targets, and unstable feature conditions that are particularly difficult for vision-based and LiDAR-based methods.

To validate the consistency of the proposed LSAF framework, the experiment was conducted without loop closure. The handheld mode eliminated the noise typically introduced during flight missions, providing a clean dataset to assess the proposed LSAF under challenging conditions. State estimation was performed using LSAF across various sensor combinations and compared with state-of-the-art (SOTA) methods such as VINS-Fusion [4] and FASTLIO2 [17]. Figure 13 displays the trajectories obtained using different methods, while Figure 14 highlights the box plots showing the overall APE of each of the strategies. Table 5 provides the RMSE values for each method, and Figure 15 and Figure 16 present the absolute position errors (x, y, z) and orientation errors (roll, pitch, and yaw) for the handheld UAV dataset.

The results demonstrate that significant position drifts occurred in the stereo IMU-only scenario. However, accuracy improved considerably when LiDAR, GPS, or their combination was integrated. VINS-Fusion exhibited growing errors due to accumulated drift, whereas LSAF maintained a smooth trajectory consistent with the ground truth. Unlike VINS-Fusion and FASTLIO2, which failed to align precisely with the reference data, LSAF achieved superior performance by leveraging the LSTM-based self-adaptive multi-sensor fusion (LSAF) framework and MSCKF fusion.

The system was compared with state-of-the-art algorithms, including VINS-Fusion [4] and FASTLIO2 [17]. VINS-Fusion integrates visual inertial odometry with/without GPS data, while FASTLIO2 employs LiDAR inertial odometry combined with global optimization, including loop closure. In comparison, the proposed method utilizes an LSTM-based adaptive weighting mechanism to enhance robustness against sensor degradation and environmental variability, ensuring accurate and reliable state estimation in dynamic conditions.

Quantitative Analysis

Table 4 summarizes the results of the accuracy evaluation for the proposed method and the benchmark algorithms on the outdoor dataset. The proposed system, which incorporates adaptive fusion based on LSTM, outperformed both VINS-Fusion and FASTLIO2 in terms of maximum, mean, and RMSE metrics. Specifically, the proposed method achieved the lowest RMSE of 0.328436 and 0.385019, significantly outperforming FASTLIO2 (0.385019), VINS-Fusion (S+I) 9.45291, and VINS-Fusion (S+I+G) 9.438278 on the UL outdoor car parking dataset. The maximum error for the proposed system was 0.889442, which was notably lower than that of the other methods, indicating better robustness to outliers and sensor degradation.

Table 5 presents the accuracy evaluation results for the proposed method and benchmark algorithms on the UL outdoor handheld dataset, highlighting the superior performance of the proposed system incorporating LSTM-based self-adaptive fusion (LSAF). The proposed method achieved the lowest RMSE of 0.598172, significantly outperforming benchmark methods such as FASTLIO2 (6.830505), VINS-Fusion (S+I+G) (6.846302), and VINS-Fusion (S+I) (7.18024). Additionally, the proposed system demonstrated a maximum error of 2.982927, which is substantially lower compared to the other methods, reflecting its robustness to outliers and sensor degradation.

The mean and median errors for the proposed method were also the lowest, at 0.525667 and 0.450802, respectively, showcasing its consistent accuracy. In contrast, methods like VINS-Fusion and FASTLIO2 exhibited significantly higher errors due to their limitations in handling dynamic environments and sensor noise. The results further emphasize the advantages of incorporating LSTM-based adaptive fusion for enhanced performance in challenging real-world scenarios. The performance gap between the proposed system with and without LSAF also highlights the critical role of adaptive fusion in reducing positional errors and ensuring reliable state estimation.

4.3. UL Car Bridge Dataset

The experimental evaluation was conducted at the University of Limerick’s Car Bridge, where the UAV was flown both above and beneath the bridge to assess the performance of the proposed LSAF framework under varying environmental conditions, as shown in Figure 17. Ground truth reference data for LSAF was obtained using an RTK system, while the UAV was manually operated by a trained pilot. To validate the global consistency of LSAF, the experiment was performed without loop closure. The test environment presented significant challenges, including rapid illumination changes, agile UAV maneuvers, and GPS-degraded conditions, as illustrated in the trajectory (Figure 17). These conditions are particularly demanding for visual-inertial odometry (VIO) and LiDAR odometry (LO) methods, where the proposed LSAF fusion approach demonstrated superior performance compared to state-of-the-art techniques. Throughout the experiment, the UAV maintained a strong RTK signal lock, ensuring fixed position accuracy for most of the flight. However, while navigating under the bridge, the number of visible GPS satellites temporarily dropped to 11, causing intermittent signal degradation in this challenging environment.

This experiment evaluates the global consistency of the proposed LSAF framework in a challenging environment with unstable and noisy GPS signals, particularly under a bridge, where localization accuracy and trajectory smoothness are significantly affected. The results demonstrate that LSAF effectively mitigates single sensor drift, maintaining global consistency and ensuring smooth local trajectory estimation despite degraded GPS conditions. Figure 18 and Figure 19 illustrate the absolute position errors in the x, y, and z coordinates and the roll, pitch, and yaw angles, comparing multiple methods on the UAV car bridge dataset, while Table 6 presents the corresponding RMSE values for each approach. Figure 20 shows a box plot of the overall relative pose error (RPE) for five different strategies, demonstrating that LSAF outperforms other state-of-the-art (SOTA) methods.

4.4. Qualitative Analysis and Trajectory Comparison

The trajectory plots illustrated in Figure 9, Figure 13, and Figure 17 compare the estimated trajectories of the proposed method, FASTLIO2, and VINS-Fusion. The proposed method demonstrates superior consistency and alignment with the ground truth provided by RTK, particularly in regions with sparse LiDAR and stereo features. In contrast, FASTLIO2 exhibits significant drift in regions with degraded LiDAR feature density, while VINS-Fusion suffers from cumulative errors due to visual degradation under high illumination.

The absolute pose error (APE) plots for each axis in Figure 11, Figure 15 and Figure 18 further highlight the advantages of the proposed system. The box plots in Figure 10, Figure 14, and Figure 20 compare the overall APE distributions for all methods, showing that the proposed method achieves the smallest error spread and highest accuracy across the dataset. The LSTM-based adaptive fusion mechanism effectively mitigates sensor-specific errors by dynamically adjusting sensor contributions in real time. For instance, in regions where LiDAR features are sparse, the LSTM assigns higher weights to IMU or GPS or stereo camera data, thereby maintaining accurate state estimation.

4.5. Analysis of LSTM-Based Adaptive Fusion

The inclusion of the LSTM-based adaptive fusion mechanism introduces several advantages over traditional fixed-weight fusion approaches. The dynamic weighting process enables the system to adapt to environmental changes and sensor degradation. For example, in bright outdoor conditions, the LSTM down-weights stereo camera data when visual feature density is low, prioritizing LiDAR and IMU measurements instead. Similarly, in areas with sparse LiDAR features, GPS data are weighted more heavily to mitigate drift. Figure 9, Figure 13, and Figure 17 demonstrate the proposed method’s ability to maintain trajectory accuracy despite varying sensor reliability. This is further supported by the quantitative metrics in Table 4, Table 5 and Table 6, which show that the proposed system consistently outperforms the benchmark methods in all scenarios. The adaptive nature of the LSTM allows the system to handle asynchronous and noisy sensor measurements more effectively than traditional Kalman filter-based fusion approaches.

4.6. System Robustness and Computational Efficiency

The proposed system is designed to be robust to individual sensor failures, ensuring continuous operation in challenging scenarios. For instance, temporary GPS signal loss or degraded LiDAR performance does not significantly impact the overall state estimation due to the self-adaptive fusion mechanism. This resilience is critical for real-world UAV applications, where sensor reliability can vary due to environmental factors.

The computational efficiency of the proposed system was validated on an Ubuntu Linux laptop equipped with an Intel Core(TM) i7-10750H CPU (3.70 GHz) and 32GB of memory. The implementation utilized C++ and ROS, ensuring real-time performance with minimal latency. The inclusion of the LSTM, while adding computational complexity, was optimized using hardware acceleration, ensuring that the system operates within real-time constraints.

5. Discussion

This paper builds upon the work presented at the 12th International Conference on Control, Mechatronics, and Automation (ICCMA 2024) [28] by introducing a significant enhancement that includes an LSTM-based self-adaptive fusion technique. This addition allows the system to dynamically adjust sensor contributions in real time, making it more robust to challenging environmental conditions and improving sensor reliability compared to the earlier fixed-weight fusion approach. The LSTM mechanism ensures that the most reliable sensors are prioritized during operation. For example, in GPS-degraded areas, the system gives more weight to LiDAR, IMU, GPS and stereo cameras. In contrast, in environments with sparse LiDAR features, GPS data becomes more influential. This flexibility ensures accurate UAV state estimation, even in challenging scenarios like bright outdoor conditions or feature-poor environments. Extensive testing on real-world datasets confirmed the effectiveness of the proposed system. The results showed that the LSTM-based fusion method outperformed state-of-the-art algorithms such as VINS-Fusion and FASTLIO2, as well as fusion approaches without LSAF, in terms of accuracy and resilience to sensor degradation. The system achieved lower trajectory errors and demonstrated its ability to handle complex environments with minimal cumulative errors. Overall, this work successfully combines traditional Kalman filtering with modern deep learning to improve UAV state estimation. The LSTM-based adaptive fusion framework sets a strong foundation for future research and practical UAV applications in complex and diverse environments.

6. Conclusions and Future Work

This study introduces a novel LSTM-based self-adaptive multi-sensor fusion framework aimed at improving UAV state estimation accuracy and robustness. The proposed approach dynamically adjusts sensor fusion weights in real time, leveraging an LSTM network to account for varying environmental conditions and sensor reliability. By integrating measurements from GPS, LiDAR, stereo cameras, and IMU, the system effectively addresses challenges posed by GPS-degraded environments, sparse feature areas, and high motion dynamics. The framework was validated on real-world datasets collected using a UAV platform in challenging outdoor environments. Experimental results demonstrate that the proposed fusion framework outperforms state-of-the-art methods such as VINS-Fusion (S+I), VINS-Fusion (S+I+G) and FASTLIO2 (L+I), as well as approaches without LSAF fusion, achieving superior trajectory accuracy and consistency. The incorporation of the LSTM-based adaptive weighting mechanism significantly enhances the system’s ability to handle sensor degradation and environmental variability. In scenarios where traditional fixed-weight fusion methods struggle, such as in bright, sunny conditions with degraded stereo or sparse LiDAR features, the LSTM dynamically prioritizes the most reliable sensors, ensuring robust and accurate state estimation. This adaptability is a key innovation that bridges the gap between traditional filtering techniques and modern deep learning approaches for UAV navigation.

Despite the system’s demonstrated success, there remain opportunities for further enhancement. Future work could focus on extending the proposed framework to broader and more diverse datasets, particularly in GPS-denied environments such as dense urban canyons, forested areas, or indoor spaces. Incorporating additional sensor modalities, such as radar or thermal cameras, could further enhance robustness in low-visibility conditions or adverse weather. Moreover, improving the LSTM model by incorporating uncertainty estimation techniques, such as Bayesian neural networks, could provide better confidence measures for the adaptive weighting process. Another promising direction lies in the optimization of computational efficiency to enable deployment on smaller, resource-constrained UAV platforms. Techniques such as model pruning, quantization, or the use of edge AI hardware could be explored to reduce the computational overhead of the LSTM while maintaining real-time performance. Additionally, investigating the integration of reinforcement learning into the fusion framework could enable the system to autonomously adapt to new environments during operation without the need for extensive retraining.

In conclusion, the proposed LSTM-based self-adaptive fusion framework represents a significant advancement in UAV state estimation, combining the strengths of traditional Kalman filtering with the flexibility of modern deep learning. The demonstrated robustness and adaptability of the system position it as a valuable contribution to the field of autonomous UAV navigation, with the potential for further enhancements and applications in diverse operational scenarios.

Author Contributions

Conceptualization, methodology, validation, format analysis, writing—original draft preparation, visualization: M.I. and S.D.; writing—review and editing, supervision: P.T., J.R. and G.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the European Commission’s Horizon 2020 Project RAPID under Grant 861211 (EU RAPD N° 861211), and in part by the Enterprise Ireland’s Disruptive Technologies Innovation Fund (DTIF) Project GUARD under Grant DT2020 0286B.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

UAV	Unmanned aerial vehicles
LiDAR	Light detection and ranging
CNN	Convolutional neural network
LSTM	Long short-term memory
GNSS	Global navigation satellite system
AVs	Autonomous vehicles
IMU	Inertial measurement unit
RTK	Real-time kinematic
GPS	Global positioning system

References

Ye, X.; Song, F.; Zhang, Z.; Zeng, Q. A Review of Small UAV Navigation System Based on Multisource Sensor Fusion. IEEE Sens. J. 2023, 23, 18926–18948. [Google Scholar] [CrossRef]
Irfan, M.; Kishore, K.; Chhabra, V.A. Smart Vehicle Management System Using Internet of Vehicles (IoV). In Proceedings of the International Conference on Advanced Computing Applications, Advances in Intelligent Systems and Computing, Virtually, 27–28 March 2021; Mandal, J.K., Buyya, R., De, D., Eds.; Springer: Singapore, 2022; Volume 1406. [Google Scholar]
Wang, Z.; Wu, Y.; Niu, Q. Multi-Sensor Fusion in Automated Driving: A Survey. IEEE Access 2020, 8, 2847–2868. [Google Scholar] [CrossRef]
Qin, T.; Cao, S.; Pan, J.; Shen, S. A general optimization-based framework for global pose estimation with multiple sensors. arXiv 2019, arXiv:1901.03642. [Google Scholar]
Lee, W.; Geneva, P.; Chen, C.; Huang, G. Mins: Efficient and robust multisensor-aided inertial navigation system. arXiv 2023, arXiv:2309.15390. [Google Scholar]
Irfan, M.; Dalai, S.; Trslic, P.; Santos, M.C.; Riordan, J.; Dooly, G. LGVINS: LiDAR-GPS-Visual and Inertial System Based Multi-Sensor Fusion for Smooth and Reliable UAV State Estimation. IEEE Trans. Intell. Veh. 2024. [Google Scholar] [CrossRef]
Zhu, J.; Li, H.; Zhang, T. Camera, LiDAR, and IMU Based Multi-Sensor Fusion SLAM: A Survey. Tsinghua Sci. Technol. 2024, 29, 415–429. [Google Scholar] [CrossRef]
Irfan, M.; Dalai, S.; Kishore, K.; Singh, S.; Akbar, S.A. Vision-based Guidance and Navigation for Autonomous MAV in Indoor Environment. In Proceedings of the 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 1–3 July 2020; pp. 1–5. [Google Scholar] [CrossRef]
Harun, M.H.; Abdullah, S.S.; Aras, M.S.M.; Bahar, M.B. Sensor Fusion Technology for Unmanned Autonomous Vehicles (UAV): A Review of Methods and Applications. In Proceedings of the 2022 IEEE 9th International Conference on Underwater System Technology: Theory and Applications (USYS), Kuala Lumpur, Malaysia, 5–6 December 2022; pp. 1–8. [Google Scholar] [CrossRef]
Geneva, P.; Eckenhoff, K.; Lee, W.; Yang, Y.; Huang, G. Openvins: A research platform for visual-inertial estimation. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020. [Google Scholar]
Fei, S.; Hassan, M.A.; Xiao, Y.; Su, X.; Chen, Z.; Cheng, Q.; Duan, F.; Chen, R.; Ma, Y. UAV-based multi-sensor data fusion and machine learning algorithm for yield prediction in wheat. Precis. Agric. 2023, 24, 187–212. [Google Scholar] [CrossRef] [PubMed]
Wu, Y.; Li, Y.; Li, W.; Li, H.; Lu, R. Robust LiDAR-based localization scheme for unmanned ground vehicle via multisensor fusion. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 5633–5643. [Google Scholar] [CrossRef]
Singh, S.; Kishore, K.; Dalai, S.; Irfan, M.; Singh, S.; Akbar, S.A.; Sachdeva, G.; Yechangunja, R. CACLA-Based Local Path Planner for Drones Navigating Unknown Indoor Corridors. IEEE Intell. Syst. 2022, 37, 32–41. [Google Scholar] [CrossRef]
Dalai, S.; O’Connell, E.; Newe, T.; Trslic, P.; Manduhu, M.; Irfan, M.; Riordan, J.; Dooly, G. CDDQN based efficient path planning for Aerial surveillance in high wind scenarios. In Proceedings of the OCEANS 2023—Limerick, Limerick, Ireland, 5–8 June 2023; pp. 1–7. [Google Scholar] [CrossRef]
Kazerouni, I.A.; Fitzgerald, L.; Dooly, G.; Toal, D. A survey of state-of-the-art on visual SLAM. Expert Syst. Appl. 2022, 205, 117734. [Google Scholar] [CrossRef]
O’Riordan, A.; Newe, T.; Dooly, G.; Toal, D. Stereo vision sensing: Review of existing systems. In Proceedings of the 2018 12th International Conference on Sensing Technology (ICST), Limerick, Ireland, 4–6 December 2018. [Google Scholar]
Xu, W.; Cai, Y.; He, D.; Lin, J.; Zhang, F. Fast-lio2: Fast direct lidar-inertial odometry. IEEE Trans. Robot. 2022, 38, 2053–2073. [Google Scholar] [CrossRef]
Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.; Tardós, J.D. Orb-slam3: An accurate open-source library for visual, visual–inertial, and multimap slam. IEEE Trans. Robot. 2021, 37, 1874–1890. [Google Scholar] [CrossRef]
Sun, K.; Mohta, K.; Pfrommer, B.; Watterson, M.; Liu, S.; Mulgaonkar, Y.; Taylor, C.J.; Kumar, V. Robust stereo visual inertial odometry for fast autonomous flight. IEEE Robot. Autom. Lett. 2018, 3, 965–972. [Google Scholar] [CrossRef]
Leutenegger, S.; Lynen, S.; Bosse, M.; Siegwart, R.; Furgale, P. Keyframe-based visual–inertial odometry using nonlinear optimization. Int. J. Robot. Res. 2015, 34, 314–334. [Google Scholar] [CrossRef]
Zhang, J.; Singh, S. LOAM: Lidar odometry and mapping in real-time. In Proceedings of the Robotics: Science and Systems, Berkeley, CA, USA, 12–16 July 2014; Volume 2. [Google Scholar]
Devarajan, H.; Zheng, H.; Kougkas, A.; Sun, X.H.; Vishwanath, V. Dlio: A data-centric benchmark for scientific deep learning applications. In Proceedings of the 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid), Melbourne, Australia, 10–13 May 2021. [Google Scholar]
Huai, Z.; Huang, G. Robocentric visual–inertial odometry. Int. J. Robot. Res. 2022, 41, 667–689. [Google Scholar] [CrossRef]
Shan, T.; Englot, B.; Ratti, C.; Rus, D. Lvi-sam: Tightly-coupled lidar-visual-inertial odometry via smoothing and mapping. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021. [Google Scholar]
Han, L.; Lin, Y.; Du, G.; Lian, S. DeepVIO: Self-supervised Deep Learning of Monocular Visual Inertial Odometry using 3D Geometric Constraints. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 6906–6913. [Google Scholar] [CrossRef]
Shan, T.; Englot, B.; Meyers, D.; Wang, W.; Ratti, C.; Rus, D. Lio-sam: Tightly-coupled lidar inertial odometry via smoothing and mapping. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October–24 January 2020. [Google Scholar]
Mourikis, A.I.; Roumeliotis, S.I. A multi-state constraint Kalman filter for vision-aided inertial navigation. In Proceedings of the 2007 IEEE International Conference on Robotics and Automation, Roma, Italy, 10–14 April 2007. [Google Scholar]
Irfan, M.; Dalai, S.; Vishwakarma, K.; Trslic, P.; Riordan, J.; Dooly, G. Multi-Sensor Fusion for Efficient and Robust UAV State Estimation. In Proceedings of the 2024 12th International Conference on Control, Mechatronics and Automation (ICCMA), London, UK, 11–13 November 2024. [Google Scholar]
Rehder, J.; Nikolic, J.; Schneider, T.; Hinzmann, T.; Siegwart, R. Extending kalibr: Calibrating the extrinsics of multiple IMUs and of individual axes. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016. [Google Scholar]
Zhu, F.; Ren, Y.; Zhang, F. Robust real-time lidar-inertial initialization. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022. [Google Scholar]
Grupp, M. evo: Python Package for the Evaluation of Odometry and Slam. 2017. Available online: https://github.com/MichaelGrupp/evo (accessed on 6 February 2025).

Figure 1. Proposed architecture for LSTM-based self-adaptive multi-sensor fusion (LSAF).

Figure 2. An illustration of the proposed LSAF framework. The global estimator combines local estimations from various global sensors to achieve precise local accuracy and globally drift free pose estimation, which builds upon our previous work [28].

Figure 3. Proposed LSTM-based multi-sensor fusion architecture for UAV state estimation.

Figure 4. LSTM cell architecture for adaptive multi-sensor fusion.

Figure 5. Training and validation loss of the proposed LSTM-based self-adaptive multi-sensor fusion (LSAF) framework over 1000 epochs.

Figure 6. Training and validation MAE of the proposed LSTM-based self-adaptive multi-sensor fusion (LSAF) framework over 1000 epochs.

Figure 7. Proposed block diagram for LSTM-based self-adaptive multi-sensor fusion (LSAF).

Figure 8. The experimental environment in different scenarios during the data collection. Panel (a,b) represent the UAV hardware along with sensor integration and panel (c,d) are the open-field dataset environment view from stereo and LiDAR sensors, respectively, which build upon our previous work [28].

Figure 9. Trajectory plots of the proposed LSAF method and comparison with FASTLIO2 and VINS-Fusion.

Figure 10. Box plots showing the overall APE of each strategy.

Figure 11. Absolute estimated position of x, y, and z axes showing plots of various methods on the UAV car parking dataset.

Figure 12. Absolute position error of roll, yaw, and pitch showing the plots of various methods on the UAV car parking dataset.

Figure 13. Trajectory plots of the proposed LSAF method and comparison with FASTLIO2 and VINS-Fusion on the UL outdoor handheld dataset.

Figure 14. Box plots showing the overall APE of each strategy.

Figure 15. Absolute estimated position of x, y, and z axes showing the plots of various methods on the UL outdoor handheld dataset.

Figure 16. Absolute position error of roll, yaw, and pitch showing the plots of various methods on the UL outdoor handheld dataset.

Figure 17. Trajectory plots of the proposed LSAF method and comparison with FASTLIO2 and VINS-Fusion.

Figure 18. Absolute estimated position of the x, y, and z axes showing the plots of various methods on the UAV car bridge dataset.

Figure 19. Absolute position error of roll, yaw, and pitch showing plots of various methods on the UAV car bridge dataset.

Figure 20. Box plots showing the overall APE of each strategy.

Table 1. Summary of the comparison between proposed LSAF vs. state-of-the-art methods.

Method	Sensors	Fusion Strategy	Dynamic Weighting	Adaptability	Key Strengths	Key Limitations
LSAF (Proposed)	Stereo, IMU, GPS, LiDAR	LSTM-based Adaptive Fusion + MSKF	Yes	High	Robust to sensor degradation and dynamic targets	Requires model training and tuning
VINS-Fusion (2018) [4]	Stereo, IMU, GPS	Predefined Weighting	No	Moderate	Efficient visual-inertial fusion	High drift in GPS-denied or dynamic environments
FASTLIO2 (2022) [17]	LiDAR, IMU	Predefined Weighting	No	Moderate	High accuracy in LiDAR-rich areas	Degrades in dynamic or sparse environments
ORB-SLAM3 (2020) [18]	Stereo, IMU	Visual SLAM with Fixed Weights	No	Low to Moderate	Loop-closure-enabled global optimization	Fails in low-texture or poor-light conditions
MSCKF (2017 [19])	Stereo, IMU	Multi-State Constraint Kalman Filter	No	Low to Moderate	Fast and computationally efficient	Limited accuracy; lacks adaptability
OKVIS (2015) [20]	Stereo, IMU	Probabilistic Fusion	No	Low to Moderate	Robust visual-inertial tracking	Poor performance in dynamic environments
LOAM (2014) [21]	LiDAR, IMU	LiDAR Odometry + Mapping	No	Moderate	Accurate LiDAR-based mapping	Computationally intensive
DLIO (2021) [22]	LiDAR, IMU	Deep Learning for LiDAR Odometry	Partial (Learned Weights)	Moderate	Data-driven LiDAR odometry	Requires extensive training data
R-VIO (2021) [23]	Stereo, IMU	Fixed Weights	No	Moderate	Accurate in small-scale scenarios	High drift in large-scale environments
GPS Only	GPS	Single-Sensor	No	Low	Accurate global positioning in open areas	Fails in GPS-denied or multipath environments
LVI-SAM (2021) [24]	LiDAR, IMU, Stereo	Joint Optimization	No	Moderate to High	Combines LiDAR and visual optimization	Requires high computational power
DeepVIO (2020) [25]	Stereo, IMU	Deep Learning-based Visual-Inertial	Yes (Learned Weights)	Moderate	Learns fusion weights; robust in some cases	Needs large datasets; computationally intensive
SC-LIO-SAM (2022) [26]	LiDAR, IMU	Joint LiDAR SLAM	No	High	Combines semantic segmentation with LiDAR SLAM	High dependency on LiDAR quality

Table 2. Summary of the LSTM-based self-adaptive multi-sensor fusion (LSAF) framework.

Component	Description
Input Sensors	GPS-RTK (latitude, longitude, altitude), IMU (linear acceleration, angular velocity), Stereo Cameras (left and right image illumination), LiDAR (point cloud density)
Input Shape	$(N, T, 12)$ (Number of samples, time steps, sensor features)
Output Shape	$(N, T, 4)$ (Sensor reliability weights for fusion)
LSTM Architecture	Two LSTM layers (128, 64 units) followed by a Time-Distributed Dense Layer
Loss Function	Mean Squared Error (MSE)
Optimizer	Adam
Training Epochs	1000
Batch Size	32
Weight Updates	At each time step (Real-time adjustment)

Table 3. Hardware configuration.

Sensor H/W	Type	Specifications	Frequency
GPS/RTK	DJI M300	GPS+GLONASS+BeiDou+Galileo	50 Hz
3D-LiDAR	Livox Mid360	Laser: 905 nm, FOV: 360° (H)/−7° to 52° (V), Pointrate: 200k pts/s, Range: 40 m (10%), 70 m (80%)	20 Hz
IMU	DJI M300	6-axis Mechanism	400 Hz
Stereo-Camera	DJI M300	Grayscale, 640 × 480 resolution, 10 fps	15 Hz

Table 4. Summary of accuracy evaluation of the UL outdoor car parking dataset (in metres).

Method	Max	Mean	Median	Min	RMSE
Proposed (LSAF)	0.889442	0.296818	0.280041	0.04252	0.328436
Proposed (without LSAF)	1.97031	0.345241	0.30052	0.043532	0.385019
FASTLIO2 (L+I)	3.241041	1.173322	1.097815	0.030088	1.321733
VINS-Fusion (S+I+G)	14.048131	8.664866	9.059936	1.096348	9.438278
VINS-Fusion (S+I)	14.043965	8.682303	9.079959	1.081238	9.45291

Table 5. Summary of accuracy evaluation of the UL outdoor handheld dataset (in metres).

Method	Max	Mean	Median	Min	RMSE
Proposed (LSAF)	2.982927	0.525667	0.450802	0.051566	0.598172
Proposed (without LSAF)	4.955625	0.232469	0.559698	0.064084	1.341954
FASTLIO2 (L+I)	11.488647	6.466611	5.564047	2.445243	6.830505
VINS-Fusion (S+I+G)	10.320782	6.479688	6.737983	1.811507	6.846302
VINS-Fusion (S+I)	14.391085	6.772864	6.961714	2.853421	7.18024

Table 6. Summary of accuracy evaluation of the UL car bridge dataset (in metres).

Method	Max	Mean	Median	Min	RMSE
Proposed (LSAF)	1.103132	0.283116	0.193321	0.002481	0.318363
Proposed (without LSAF)	1.296882	0.384646	0.291391	0.012473	0.473501
FASTLIO2 (L+I)	2.485825	0.630901	0.438535	0.032907	0.805311
VINS-Fusion (S+I+G)	2.284776	0.421503	0.253154	0.006963	0.683928
VINS-Fusion (S+I)	22.006355	7.392873	5.851291	1.614134	8.892644

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Irfan, M.; Dalai, S.; Trslic, P.; Riordan, J.; Dooly, G. LSAF-LSTM-Based Self-Adaptive Multi-Sensor Fusion for Robust UAV State Estimation in Challenging Environments. Machines 2025, 13, 130. https://doi.org/10.3390/machines13020130

AMA Style

Irfan M, Dalai S, Trslic P, Riordan J, Dooly G. LSAF-LSTM-Based Self-Adaptive Multi-Sensor Fusion for Robust UAV State Estimation in Challenging Environments. Machines. 2025; 13(2):130. https://doi.org/10.3390/machines13020130

Chicago/Turabian Style

Irfan, Mahammad, Sagar Dalai, Petar Trslic, James Riordan, and Gerard Dooly. 2025. "LSAF-LSTM-Based Self-Adaptive Multi-Sensor Fusion for Robust UAV State Estimation in Challenging Environments" Machines 13, no. 2: 130. https://doi.org/10.3390/machines13020130

APA Style

Irfan, M., Dalai, S., Trslic, P., Riordan, J., & Dooly, G. (2025). LSAF-LSTM-Based Self-Adaptive Multi-Sensor Fusion for Robust UAV State Estimation in Challenging Environments. Machines, 13(2), 130. https://doi.org/10.3390/machines13020130

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

LSAF-LSTM-Based Self-Adaptive Multi-Sensor Fusion for Robust UAV State Estimation in Challenging Environments †

Abstract

1. Introduction

2. Related Work

2.1. Multi-Sensor Fusion Approaches

2.1.1. Filtering-Based Methods

2.1.2. Optimization-Based Methods

2.1.3. Deep Learning-Based Methods

2.2. Sensor-Specific Contributions

2.2.1. Vision-Based SLAM

2.2.2. LiDAR-Based SLAM

2.2.3. Multi-Sensor Fusion for SLAM

2.3. Challenges and Opportunities

3. Methodology

3.1. Coordinate Systems and Sensor Calibration

3.2. State Representation and Propagation

3.3. Measurement Models for Multi-Sensor Integration

3.4. Self-Adaptive Fusion with LSTM

3.5. Proposed LSTM-Based Multi-Sensor Fusion Architecture

3.6. LSTM Cell Mechanism for Self-Adaptive Fusion

3.6.1. LSTM Training and Validation Loss

3.6.2. Mean Absolute Error (MAE) Analysis

3.6.3. Validation of the Proposed Framework

3.7. Fusion Framework Using MSCKF

3.8. Advantages of the Proposed Framework

3.9. Experimental Setup and Dataset

4. Results and Comparison

4.1. UL Outdoor Car Parking Dataset

4.2. UL Outdoor Handheld Dataset

Quantitative Analysis

4.3. UL Car Bridge Dataset

4.4. Qualitative Analysis and Trajectory Comparison

4.5. Analysis of LSTM-Based Adaptive Fusion

4.6. System Robustness and Computational Efficiency

5. Discussion

6. Conclusions and Future Work

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

LSAF-LSTM-Based Self-Adaptive Multi-Sensor Fusion for Robust UAV State Estimation in Challenging Environments^†