Open Access Article

Parameter-Efficient Vehicle Trajectory Prediction Based on Attention-Enhanced Liquid Structural Neural Model

Ruochen Wang ¹, Yue Chen ¹, Renkai Ding ²,* and Qing Ye

¹ School of Automotive and Traffic Engineering, Jiangsu University, Zhenjiang 212013, China

² Automotive Engineering Research Institute, Jiangsu University, Zhenjiang 212013, China

* Author to whom correspondence should be addressed.

World Electr. Veh. J. 2025, 16(1), 19; https://doi.org/10.3390/wevj16010019

Submission received: 16 December 2024 / Revised: 28 December 2024 / Accepted: 30 December 2024 / Published: 31 December 2024

(This article belongs to the Special Issue Deep Learning Applications for Electric Vehicles)


Abstract

Due to advances in sensor techniques and deep learning, autonomous vehicular technologies have become more reliable and practical. Trajectory prediction is a critical task to anticipate the future positions of surrounding vehicles. However, existing algorithms, such as LSTM-based and attention-based models, face challenges of high computational complexity, large parameter sizes, and limited ability to efficiently capture both temporal dependencies and spatial interactions in dynamic traffic scenarios. In this paper, we propose a parameter-efficient trajectory prediction model that integrates Liquid Time-Constant (LTC) networks with attention mechanisms, termed the Attn-LTC model. The key contributions of our work are threefold. First, we introduce a temporal attention-enhanced LTC encoder that effectively captures both long-term temporal dependencies and dynamic behaviors from historical trajectory data. Second, we incorporate a spatial attention-enhanced LTC decoder, which emphasizes the influence of neighboring vehicles and spatial interactions, thereby improving prediction accuracy. Third, we demonstrate the computational efficiency of the Attn-LTC model, which achieves high predictive accuracy with significantly fewer parameters compared to LSTM-based and Transformer-based counterparts. Extensive experiments conducted on the NGSIM dataset demonstrate the advantages of our proposed Attn-LTC model. Notably, it reduces computational complexity and model size while maintaining superior accuracy, making it well suited for deployment in resource-constrained systems. The results highlight the effectiveness of the Attn-LTC model in balancing precision and efficiency, paving the way for its application in real-time autonomous driving systems.

Keywords: trajectory prediction; ordinary differential equation; attention mechanism; autonomous driving; computational efficiency

1. Introduction

Trajectory prediction plays a critical role in various applications, including autonomous driving [1], intelligent transportation systems [2], path planning [3], and collision avoidance [4]. The ability to accurately forecast the future positions of vehicles enables systems to make informed and timely decisions. This property is critical for ensuring safety and efficiency. For example, in autonomous vehicles, predicting the trajectories of surrounding vehicles is essential for collision avoidance, lane-changing maneuvers, and route planning. Similarly, in robotics, trajectory prediction is fundamental for path planning and human–robot interaction.

Trajectory prediction tasks can be categorized into single-vehicle trajectory prediction (SVTP) and multi-vehicle trajectory prediction (MVTP). SVTP focuses on predicting the future trajectory of an isolated vehicle, which is effective in sparse traffic scenarios. In contrast, MVTP considers the interdependencies among multiple vehicles. This is more challenging because the motion of one vehicle can influence that of others, which is a critical design factor in crowded urban settings. The importance of trajectory prediction lies in its ability to model and anticipate the motion dynamics of agents in complex and interactive environments. As real-world traffic scenarios become increasingly crowded and dynamic, the need for precise, reliable, and real-time trajectory forecasts becomes paramount. Autonomous systems must process a large amount of spatial and temporal data to predict trajectories accurately while considering environmental factors and interactions between agents. The success of these applications relies heavily on the underlying prediction algorithms’ ability to capture both temporal dependencies (historical motion patterns) and spatial relationships (vehicle interactions) effectively.

Recent advancements in machine learning, particularly deep learning, along with high-quality sensor data, have significantly enhanced trajectory prediction models. Studies such as Biktairov et al. [5] use bird’s-eye views and CNNs to model trajectory distributions, emphasizing spatial patterns within specific scenes. IntentNet [6] expands on this by leveraging raw sensor data to infer vehicle intentions, capturing complex behavioral cues for improved decision-making. Multimodal frameworks [7] further enhance prediction accuracy by integrating diverse data modalities, such as visual and motion features. To address the uncertainty inherent in dynamic traffic scenarios, approaches like [8] incorporate uncertainty estimation into motion predictions, facilitating safer navigation. Graph-based techniques, exemplified by LaneRCNN [9], use distributed spatial reasoning for precise trajectory forecasting, highlighting the potential of graph neural networks in motion prediction tasks.

A key aspect of trajectory prediction lies in modeling dependencies, as vehicle motion is influenced by both historical trajectories and interactions with surrounding vehicles. Long Short-Term Memory (LSTM) networks [10] effectively capture long-term temporal dependencies, enabling models to learn sequential motion patterns. Variants like Social-LSTM [11] and social pooling LSTM [12] integrate spatial interactions among vehicles, allowing for the prediction of maneuver-specific trajectories in complex settings such as highways or crowded intersections. However, LSTMs face challenges with scalability and interpretability, particularly in dense traffic environments. To overcome these limitations, attention mechanisms [13,14] have been adopted to prioritize relevant temporal and spatial features, dynamically weighting important interactions for improved accuracy and explainability. Models like the Spatial–Temporal Attentive LSTM [13] effectively capture dynamic relationships among vehicles and historical motions. Recent advancements have also seen the adoption of Transformer-based models [15], which leverage multi-head self-attention to model complex spatial and temporal interactions in parallel. By eliminating the recurrent structures of traditional architectures, Transformers offer superior scalability, efficiently handling long sequences and higher-order interactions. The Spatial Interaction-aware Transformer [15] further extends this capability by incorporating attention masks, enabling the simultaneous capture of historical dependencies and future interactions.

Despite the significant progress in trajectory prediction, developing parameter-efficient algorithms remains a key challenge. Many existing models, such as LSTMs and Transformer-based architectures, require a substantial number of parameters to model complex dependencies, which can lead to high computational costs and memory demands. This limits their deployment in resource-constrained systems, such as embedded devices in autonomous vehicles or edge devices in robotics. Parameter-efficient trajectory prediction algorithms are critical for achieving the balance between prediction accuracy and computational feasibility, enabling real-time operation without sacrificing performance.

In this work, we address these challenges by proposing a parameter-efficient trajectory prediction model that integrates Liquid Time-Constant (LTC) networks [16] with attention mechanisms, offering a lightweight yet accurate solution. The key contributions are as follows:

A Temporal Attention-enhanced LTC Encoder is developed to effectively capture long-term temporal dependencies and dynamic behaviors from historical trajectory data.
A Spatial Attention-enhanced LTC Decoder is introduced, emphasizing the influence of neighboring vehicles and spatial interactions to improve prediction accuracy.
The model demonstrates significant computational efficiency, achieving superior prediction accuracy with a much smaller parameter size compared to traditional LSTM-based models [10,11,13]. Extensive experiments on the NGSIM dataset [17] validate the effectiveness of the Attn-LTC model, showcasing its suitability for real-time deployment in resource-constrained environments.

The rest of this paper is organized as follows:

Section 2 reviews the relevant literature on trajectory prediction, highlighting advancements in deep learning methods, attention mechanisms, and dependency modeling.
Section 3 introduces the proposed Attn-LTC model, detailing its framework, representation techniques, encoding and decoding modules, and training methodology.
Section 4 presents the experimental setup, evaluation metrics, baselines, and the results of performance comparisons and ablation studies.
Section 5 concludes the paper, summarizing the findings and outlining potential directions for future research.

2. Related Literature Study

2.1. Deep Learning Methods for Trajectory Prediction

A pivotal aspect of autonomous driving technology is the SVTP problem, a task that focuses on forecasting the future trajectory of a single entity within the vehicle’s vicinity. Several pioneering studies have made significant contributions to single-vehicle prediction. Biktairov et al. [5] present a method that uses bird’s-eye views and convolutional neural networks (CNNs) to predict an individual vehicle’s motion, emphasizing trajectory distribution within a specific scene. Complementing this, IntentNet [6] shifts the focus to understanding vehicle intentions directly from raw sensor data, leveraging deep learning techniques to interpret complex behavioral cues. The importance of considering multiple data modalities is further explored in [7], which demonstrates how integrating diverse data forms can enhance trajectory prediction accuracy for a single vehicle. Addressing the critical aspect of uncertainty, the work in [8] introduces a method to incorporate uncertainty estimates in motion predictions, a crucial factor for navigating unpredictable traffic scenarios safely. Finally, LaneRCNN [9] offers a graph-centric approach for motion prediction, using distributed representations to forecast an individual vehicle’s trajectory and showcasing the potential of graph-based neural networks in spatial reasoning.

As demonstrated in [18], eco-driving integrates queue prediction at signalized intersections to optimize speed profiles. Recent advancements incorporate vehicle connectivity and real-time data analysis, such as predictive frameworks leveraging vehicle-to-cloud communication for dynamic traffic scenarios [19]. Additionally, eco-cruising strategies [20] enhance energy efficiency on highways and intersections by integrating advanced predictive control methods, which balance driving dynamics with fuel or energy optimization. Together, these studies form a comprehensive foundation for advancing single-vehicle prediction technologies in autonomous driving, each addressing unique challenges and contributing to the development of more sophisticated, reliable prediction models.

2.2. Dependency Modeling in Trajectory Prediction

In trajectory prediction tasks, dependency modeling is crucial because the motion of a vehicle is inherently influenced by its past behavior and the actions of surrounding vehicles. Accurately capturing these dependencies enables models to understand the temporal evolution of a vehicle’s trajectory and the spatial interactions within the traffic scene. Several types of methods have been proposed to model spatial and temporal dependencies.

2.2.1. Long Short-Term Memory (LSTM)

Long Short-Term Memory (LSTM) networks [10] have emerged as a powerful tool for trajectory prediction due to their ability to model long-term temporal dependencies effectively. By using an encoder–decoder framework, LSTM-based models [11,12,21] can capture sequential motion patterns in vehicle trajectories. For instance, Deo and Trivedi’s convolutional social pooling LSTM [12] integrates spatial interactions among vehicles through social pooling layers, enhancing the prediction of maneuver-specific trajectories in highway scenarios. Similarly, the Social-LSTM architecture [11] connects LSTMs of neighboring entities using pooling layers, enabling the model to learn social interactions for trajectory prediction in crowded settings. Despite these advancements, traditional LSTM architectures often struggle with explainability and scalability when modeling interactions in dense traffic environments.

2.2.2. Attention Mechanism

Attention mechanisms have further improved trajectory prediction by enabling models to focus on relevant temporal or spatial features. For example, the Spatial–Temporal Attentive LSTM model [13] uses both spatial and temporal attention to capture the dynamic relationships among neighboring vehicles and historical motion features, respectively. This mechanism allows the model to prioritize important interactions, thereby improving accuracy and interpretability. Additionally, methods like the Dynamic and Static Context-Aware Network [14] leverage attention to dynamically identify the importance of surrounding vehicles, enhancing predictions in scenarios with static environmental constraints. These attention-based approaches bridge the gap between modeling individual vehicle trajectories and their dependencies on surrounding vehicles.

Another mainstream method that leverages multi-head self-attention has recently gained traction in trajectory prediction due to its ability to model complex spatial and temporal interactions in parallel. The Spatial Interaction-aware Transformer [15] extends traditional Transformer architectures by incorporating attention masks to capture both historical dependencies and future interactions among vehicles. This approach eliminates the need for recurrent structures like LSTMs, enabling efficient processing of long sequences and improved scalability. The adoption of Transformer-based models demonstrates a significant improvement in handling higher-order interactions and long-term dependencies.

In this work, we leverage Liquid Time-Constant (LTC) networks [16] to build more accurate and parameter-efficient trajectory prediction models. LTC represents a novel class of continuous-time recurrent neural networks (CT-RNNs) designed to enhance the modeling of complex time-series data. Unlike traditional recurrent networks that rely on static parameters to govern their dynamics, LTC networks introduce input-dependent time constants, enabling them to adapt their behavior dynamically based on incoming signals. This flexibility results in superior expressivity and stability, making them particularly effective for tasks requiring high precision in temporal modeling, such as trajectory prediction.

3. Proposed Attn-LTC Model for Trajectory Prediction

The core goal of this paper is to precisely predict vehicle trajectories in the observed scene based on attention and liquid time-constant (LTC) networks [16], with a particular focus on computing efficiency and prediction accuracy. The description of the proposed model is organized as follows. We first introduce the overall framework in Section 3.1. Section 3.2 shows how to effectively represent and preprocess the fused features of vehicles using a vectorized format. Section 3.3 introduces the proposed temporal attention-enhanced LTC encoder for trajectory fusion. Then, Section 3.4 shows how the encoded state information can be decoded and leveraged to generate future trajectories. Finally, Section 3.5 describes how the proposed model is trained to ensure stability.

3.1. Overall Framework

Trajectory prediction plays a critical role in autonomous driving systems, where understanding and forecasting the future positions of vehicles is essential for safe and reliable navigation. Traditional approaches often struggle to capture complex temporal dependencies and spatial interactions among vehicles. To address this challenge, we propose the Attn-LTC model, whose overall framework is illustrated in Figure 1. We assume that the input data include the history trajectories of the target vehicle as well as those of neighboring vehicles. These data are usually captured by on-board sensors such as LiDAR.

The proposed Attn-LTC model exploits the emerging Liquid Time-Constant (LTC) networks [16] and attention mechanisms [22,23]. LTC has superior expressivity and can produce improved performance for time-series prediction tasks. Thus, LTC is capable of modeling both short-term and long-term dynamics effectively while considering the spatial relationships between vehicles. This integration allows the model to generate precise trajectory predictions even in dynamic and multi-vehicle environments. Three core modules make up the entire processing pipeline:

First, the data preprocessing and vectorization module transforms the historical $3 \times 13$ spatial grid information into a structured input format.
Second, the Temporal Attention-enhanced LTC Encoder encodes the temporal dynamics of the target and neighboring vehicles by applying LTC cells enhanced with temporal attention to capture temporal dependencies.
Finally, the Spatial Attention-enhanced LTC Decoder decodes the encoded states, using spatial attention to emphasize the influence of neighboring vehicles and predict the future trajectory of the target vehicle.

This modular design ensures that both temporal and spatial interactions are effectively modeled, resulting in more accurate and reliable trajectory predictions.

3.2. Spatial and Temporal Representation for Target and Neighbor Vehicles

Traditional rasterized representations of maps and vehicles can lead to high-dimensional data, especially in large-scale environments. However, the efficiency of data processing is crucial in trajectory prediction workloads, as it can significantly impact the overall processing latency. To handle the geometric data more efficiently, we use the vectorized representation [12] to process vehicle trajectory information. Because it retains only the relevant features, the vectorized representation reduces computational complexity and data redundancy.

Each vehicle in the observed area has its associated spatial and motion features, including position $(x, y)$ and maneuver information $(u, v)$. These features are normalized and concatenated into the vectorized format. For the i-th vehicle at a specific time t, they are expressed as the following vector:

p_{t}^{i} = [x_{t}^{i}, y_{t}^{i}, u_{t}^{i}, v_{t}^{i}],

(1)

where $x_{t}^{i}$ and $y_{t}^{i}$ are the spatial coordinates, $u_{t}^{i}$ denotes the longitudinal maneuver, such as acceleration or deceleration, and $v_{t}^{i}$ denotes the lateral maneuver, such as lane change, for the i-th vehicle at time t.

Our proposed model also takes advantage of spatial features. For grid information, spatial relationships such as the vehicle position within a discretized grid cell are used as additional features. As shown in Figure 1, the area surrounding the target vehicle is divided into a $3 \times 13$ grid. The rows of the grid correspond to the left, current, and right lanes relative to the target vehicle, while the columns represent grid cells, each with a width of 4.6 m. Neighboring vehicles, excluding the target vehicle, are located within this $3 \times 13$ grid and assigned unique grid cells based on their positions.

It should be noted that not all grid cells contain neighboring vehicles, and some cells may hold irrelevant or missing information. We use a mask matrix to indicate whether there is a vehicle in the corresponding cell of the $3 \times 13$ grid. The mask matrix at the time step t is expressed as follows:

A_{t}[i][j] = \begin{cases} 1 & \text{if a vehicle exists in cell } (i, j), \\ 0 & \text{otherwise.} \end{cases}

(2)

By applying a mask matrix, the model can avoid incorporating noise or non-existent data from empty grid cells, which can negatively affect the predictions. Additionally, the mask matrix enables the selective weighting of contributions from different cells, allowing the model to emphasize more critical interactions, such as those involving vehicles in closer proximity or directly influencing the target vehicle’s motion. This selective approach ensures computational efficiency and enhances the prediction accuracy by focusing resources on relevant spatial dependencies.
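As a rough illustration of Equation (2), the following sketch assigns neighboring vehicles to cells of the $3 \times 13$ grid and builds the mask. The function name `build_mask`, the tuple layout of `neighbors`, the left-to-right lane numbering, and the placement of the target vehicle in the centre column are assumptions made for this example; they are not specified in the text.

```python
import numpy as np

GRID_ROWS, GRID_COLS = 3, 13   # lanes x longitudinal cells, as stated in the text
CELL_LENGTH_M = 4.6            # cell size stated in the text

def build_mask(neighbors, target_long, target_lane):
    """Build the occupancy mask of Equation (2) for one time step.

    neighbors: iterable of (vehicle_id, lane_index, longitudinal_position_m).
    Assumptions: lane indices increase from left to right, and the target
    vehicle sits in the centre column of the middle row.
    """
    mask = np.zeros((GRID_ROWS, GRID_COLS), dtype=np.int8)
    cell_of = {}
    centre_col = GRID_COLS // 2
    for vid, lane, long_pos in neighbors:
        row = lane - target_lane + 1   # 0 = left lane, 1 = current lane, 2 = right lane
        # Round the longitudinal offset to the nearest cell index.
        offset_cells = int(np.floor((long_pos - target_long) / CELL_LENGTH_M + 0.5))
        col = centre_col + offset_cells
        if 0 <= row < GRID_ROWS and 0 <= col < GRID_COLS:
            mask[row, col] = 1
            cell_of[vid] = (row, col)
    return mask, cell_of

# Made-up neighbours: same lane ahead, left lane alongside, right lane far away.
mask, cell_of = build_mask(
    [("a", 2, 109.2), ("b", 1, 100.0), ("c", 3, 500.0)],
    target_long=100.0,
    target_lane=2,
)
```

Vehicles that fall outside the grid, such as the distant one in this example, are simply dropped, so only occupied cells contribute to later stages.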

3.3. Temporal Attention-Enhanced LTC Encoder for Trajectory Fusion

In trajectory forecasting tasks, understanding the temporal evolution of the target vehicle’s motion and its interactions with surrounding vehicles is critical. Existing methods often fail to efficiently capture long-term dependencies and prioritize relevant historical information, which limits their predictive accuracy. To overcome this, our proposed Temporal Attention-enhanced LTC Encoder leverages LTC cells [16] to model continuous-time dynamics. Meanwhile, it integrates a temporal attention mechanism to selectively focus on the important time steps.

Figure 2 illustrates the flow of the Temporal Attention-enhanced LTC Encoder module in the proposed Attn-LTC model. The module accepts the vectorized history trajectories of vehicles in the spatial grid for the past T time steps from the previous data preprocessing and vectorization module. Once the vehicle trajectories are converted into the vectorized representation, a multi-layer perceptron (MLP) layer is used to map the raw input data into a higher-dimensional latent space with meaningful features. This latent space representation helps capture more complex patterns and dependencies. The embedding transformation is performed using an MLP represented by the function $ϕ(\cdot)$ as follows:

e_{t}^{i} = ϕ (x_{t}^{i}, W_{e}) = ReLU (W_{e} x_{t}^{i} + b_{e}),

(3)

where $x_{t}^{i}$ here denotes the input feature vector of the i-th vehicle (the vector $p_{t}^{i}$ of Equation (1)), $W_{e} \in \mathbb{R}^{d \times c}$ is the weight matrix of the FC layer, $b_{e} \in \mathbb{R}^{d}$ is the bias vector, d is the dimensionality of the embedding space, and c is the size of the input feature vector ($c = 4$ for $[x, y, u, v]$). The output $e_{t}^{i}$ is a d-dimensional vector for the i-th vehicle, representing its latent embedding at the time step t. It will be used in downstream modules (e.g., attention mechanisms or prediction models). The MLP embedding allows for nonlinear transformations, enabling the model to learn complex interactions between features. By embedding both vehicle-specific and grid-specific information into a shared latent space, the model can better capture spatial relationships and interactions.
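The embedding of Equation (3) amounts to a single affine map followed by a ReLU. A minimal sketch is given below; the embedding size `d = 32`, the random initialization, and the sample feature values are arbitrary choices for illustration and are not taken from the paper.

```python
import numpy as np

def embed(p, W_e, b_e):
    """Equation (3): e = ReLU(W_e p + b_e) for one feature vector p of size c."""
    return np.maximum(W_e @ p + b_e, 0.0)

rng = np.random.default_rng(0)
c, d = 4, 32                              # c = 4 follows the text; d = 32 is an assumption
W_e = rng.normal(scale=0.1, size=(d, c))  # illustrative random weights
b_e = np.zeros(d)

p = np.array([0.5, -1.2, 1.0, 0.0])       # made-up normalized [x, y, u, v]
e = embed(p, W_e, b_e)                    # d-dimensional latent embedding
```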

As shown in Figure 2, the second step feeds the embeddings generated by the MLP layers into the LTC cells. This step models temporal dependencies and generates hidden states $\{h_{t - T + 1}, h_{t - T + 2}, \dots, h_{t}\}$, with $h_{t} \in \mathbb{R}^{d}$, for the history trajectories of vehicles in the spatial grid. The LTC model [16] is derived as an improved version of the original Continuous-Time Recurrent Neural Network (CT-RNN). The standard CT-RNN formulates the evolution of the hidden state as follows:

\frac{d x (t)}{d t} = - \frac{x (t)}{τ} + f (x (t), I (t), t, θ),

(4)

where $x(t)$ is the hidden state, $I(t)$ denotes the input, t represents the time, and the function f is parametrized by $θ$. Finally, the term $-\frac{x(t)}{τ}$ introduces a stabilizing time constant $τ$ that drives the hidden state toward the equilibrium of the autonomous system.

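The authors of [16] instead start from a hidden state whose flow is described by a system of linear ordinary differential equations (ODEs), $\frac{dx(t)}{dt} = -\frac{x(t)}{τ} + S(t)$, and let the nonlinearity enter through $S(t) = f(x(t), I(t), t, θ)(A - x(t))$, where A is an additional parameter vector. Substituting $S(t)$ gives the following formulation: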

\frac{d x (t)}{d t} = - [\frac{1}{τ} + f (x (t), I (t), t, θ)] x (t) + f (x (t), I (t), t, θ) A,

(5)

where the new formulation enables the network to dynamically adapt its response to the temporal characteristics of the input, making LTCs particularly effective for tasks with varying time dependencies. This adaptive time-constant property allows LTC networks to outperform traditional CT-RNNs and other time-series models in expressivity and stability [16], as they can better capture dynamic temporal patterns in the input data.

For implementation, the authors of [16] propose a discrete-time approximation that can be solved efficiently. With their fused ODE solver, the state update is expressed as follows:

x (t + ∆ t) = \frac{x (t) + ∆ t f (x (t), I (t), t, θ) A}{1 + ∆ t (\frac{1}{τ} + f (x (t), I (t), t, θ))} .

(6)
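Equation (6) can be read as a semi-implicit Euler step of Equation (5): f is evaluated at the current state, while the linear term in x is taken at time t + ∆t and solved for in closed form. The sketch below implements that update in NumPy. The specific form of f (a sigmoid of an affine map of state and input), the parameter names, and all sizes are assumptions for illustration; the actual LTC cell in [16] uses its own parametrization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ltc_fused_step(x, inp, dt, tau, A, W_rec, W_in, b):
    """One update of Equation (6); the division is elementwise per neuron."""
    # Assumed form of f: a sigmoid of an affine map of state and input.
    f = sigmoid(W_rec @ x + W_in @ inp + b)
    return (x + dt * f * A) / (1.0 + dt * (1.0 / tau + f))

rng = np.random.default_rng(0)
n, m = 8, 4                                  # hidden units and input size (illustrative)
tau = np.full(n, 1.0)                        # positive time constants
A = rng.uniform(-1.0, 1.0, size=n)
W_rec = rng.normal(scale=0.5, size=(n, n))
W_in = rng.normal(scale=0.5, size=(n, m))
b = np.zeros(n)

x = np.zeros(n)
for _ in range(200):                         # unroll over 200 random inputs
    x = ltc_fused_step(x, rng.normal(size=m), dt=0.1, tau=tau, A=A,
                       W_rec=W_rec, W_in=W_in, b=b)
```

With this choice of f, which lies in (0, 1), and positive τ, the denominator always exceeds 1, so a state that starts at zero never leaves the range set by |A|. This is one way to see why the update stays numerically well behaved over long sequences.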

The internal architecture of each LTC cell is depicted in Figure 3. The input time-series signals are first processed by the input neurons, which feed into the liquid layer. Within the liquid layer, neurons are interconnected with dynamic pathways (shown as arrows). The LTC network is built around neurons with adaptable time constants, which govern the rate at which each neuron responds to changes in input. The output neurons aggregate the processed information.

Unlike traditional neural networks that primarily adjust weights during learning, LTCs optimize both weights and time constants. This distinctive design enables the network to dynamically modulate its response, tailoring its behavior to the temporal patterns present in the input data. Such adaptability makes LTCs particularly effective for tasks involving complex temporal dependencies.

After the LTC cells generate the hidden states $\{h_{t - T + 1}, h_{t - T + 2}, \dots, h_{t}\}$ for the history trajectories, these hidden states are further refined by the temporal attention module, consisting of fully connected (FC) layers, a Tanh activation, and a Softmax function. The temporal attention module assigns attention weights to prioritize influential time steps. The attention mechanism computes the influence of historical trajectories on the future motion of the target vehicle or its neighbors. Given the hidden states $H_{t}^{v} = \{h_{t - T + 1}^{v}, \dots, h_{t}^{v}\}$ of a vehicle v, the temporal attention weights $A_{t}^{v}$ are calculated using the following equation:

A_{t}^{v} = Softmax (W_{t2} \tanh (W_{t1} H_{t}^{v})),

(7)

where $W_{t1}$ and $W_{t2}$ are learnable weights that determine the importance of each historical time step for predicting the future trajectory. The $Softmax(\cdot)$ function is defined as follows:

Softmax (z_{i}) = \frac{e^{z_{i}}}{\sum_{j = 1}^{N} e^{z_{j}}},

(8)

where $z_{i}$ is the i-th input to the Softmax function and N is the total number of inputs.

Finally, the following encoded temporal context enriched with spatial information is output for further processing:

G_{t}^{v} = A_{t}^{v} H_{t}^{v} .

(9)
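Equations (7) to (9) can be sketched in a few lines of NumPy. The text does not state the tensor shapes, so this example assumes that the hidden states are stacked row-wise into a (T, d) array, that $W_{t1}$ has shape (k, d), and that $W_{t2}$ is a length-k vector producing one score per time step; the sizes T, d, and k are arbitrary.

```python
import numpy as np

def softmax(z):
    """Equation (8), with the usual max-shift for numerical stability."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def temporal_attention(H, W_t1, W_t2):
    """H: (T, d) hidden states, one row per time step.

    Returns the attention weights (T,) and the context vector (d,).
    """
    scores = np.tanh(H @ W_t1.T) @ W_t2   # (T, k) @ (k,) -> one score per time step
    weights = softmax(scores)             # Equation (7)
    context = weights @ H                 # Equation (9): weighted sum over time steps
    return weights, context

rng = np.random.default_rng(0)
T, d, k = 16, 32, 16                      # illustrative sizes
H = rng.normal(size=(T, d))
W_t1 = rng.normal(scale=0.1, size=(k, d))
W_t2 = rng.normal(scale=0.1, size=k)
weights, context = temporal_attention(H, W_t1, W_t2)
```

When all scores are equal, the weights become uniform and the context reduces to the plain average of the hidden states, which is a convenient sanity check.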

This module generates rich, context-aware feature embeddings that capture both short-term and long-term temporal dynamics, supporting precise and reliable trajectory predictions.

3.4. Spatial Attention-Enhanced LTC Decoder for Trajectory Prediction

Accurate prediction of future trajectories relies not only on temporal dependencies but also on understanding the spatial relationships between vehicles. Traditional methods often struggle to effectively weigh the relative influence of neighboring vehicles in dynamic environments. The other key part of Attn-LTC is the Spatial Attention-enhanced LTC Decoder, which decodes the encoded temporal context with spatial information from the Temporal Attention-enhanced LTC Encoder and generates the predicted trajectories. Previous work, namely, ADAPT [24], presents a customized head-adapt model that shows significant improvement in trajectory prediction accuracy over other popular methods [25,26,27,28] while maintaining low computational complexity. However, the hand-crafted prediction model suffers from poor scalability to large-scale problems.

To address these challenges, the proposed decoder module incorporates spatial attention mechanisms to enhance the predictions by prioritizing relevant spatial information and combining it with encoded temporal context. Figure 4 presents the algorithmic flow of the Spatial Attention-enhanced LTC Decoder module in the proposed Attn-LTC model. It consists of two basic modules: the spatial attention module and the LTC-based decoder. The Spatial Attention-enhanced LTC Decoder receives the encoded temporal context enriched with spatial information obtained from the Temporal Attention-enhanced Encoder module. In the first step, the spatial attention mechanism computes attention weights for neighboring vehicles to identify their influence on the target vehicle’s motion. The spatial attention module is similar to the temporal attention module. The spatial attention weights $A_{s}^{v}$ are calculated using the following equation:

A_{s}^{v} = Softmax (W_{s2} \tanh (W_{s1} G_{t}^{v})),

(10)

where $W_{s1}$ and $W_{s2}$ are learnable weights that determine the importance of each vehicle for predicting the future trajectory.

The combined spatial–temporal representation $J_{t}^{v}$ is computed as follows:

J_{t}^{v} = A_{s}^{v} G_{t}^{v} .

(11)

The resulting encoded spatial–temporal context $J_{t}^{v}$ is then passed into a series of LTC cells, which predict the future trajectory points $(x_{t + 1}, y_{t + 1}), (x_{t + 2}, y_{t + 2}), \dots$. This step-by-step decoding process ensures that both spatial and temporal dependencies are seamlessly integrated, allowing the model to generate precise and reliable trajectory predictions in multi-vehicle environments.
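The spatial attention of Equations (10) and (11) might be sketched as below. Equation (10) does not say how the occupancy mask of Equation (2) enters the computation, so restricting the Softmax to occupied cells is an assumption of this example, as are the array shapes (one temporal context per grid cell, stacked into an (n_cells, d) array) and all sizes.

```python
import numpy as np

def masked_softmax(scores, mask):
    """Softmax over entries where mask == 1; all other entries get weight 0."""
    scores = np.where(mask > 0, scores, -np.inf)
    scores = scores - np.max(scores)
    e = np.exp(scores)
    return e / e.sum()

def spatial_attention(G, mask, W_s1, W_s2):
    """G: (n_cells, d) temporal contexts; mask: (n_cells,) flattened occupancy."""
    if not np.any(mask):                      # no neighbours: nothing to attend to
        return np.zeros(len(mask)), np.zeros(G.shape[1])
    scores = np.tanh(G @ W_s1.T) @ W_s2       # one score per grid cell
    weights = masked_softmax(scores, mask)    # Equation (10), occupied cells only
    return weights, weights @ G               # Equation (11)

rng = np.random.default_rng(0)
n_cells, d, k = 39, 32, 16                    # 3 x 13 grid; d and k are illustrative
G = rng.normal(size=(n_cells, d))
mask = np.zeros(n_cells, dtype=np.int8)
mask[[5, 19, 30]] = 1                         # three occupied cells, chosen arbitrarily
W_s1 = rng.normal(scale=0.1, size=(k, d))
W_s2 = rng.normal(scale=0.1, size=k)
weights, J = spatial_attention(G, mask, W_s1, W_s2)
```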

3.5. Training

The training process of our proposed Attn-LTC model follows the conventional approach used for other recurrent neural networks. Specifically, the model is trained end-to-end using gradient-based methods such as stochastic gradient descent (SGD). The output of the Attn-LTC network is compared to the ground truth using the mean squared error (MSE) loss. MSE is suitable for trajectory prediction because it penalizes larger errors more heavily by squaring the differences between predicted and actual values. This property aligns with the need for accurate trajectory outputs and helps keep long-range predictions precise even in challenging scenarios. For a batch of size N, the loss function is computed as the average loss over all vehicle samples in the batch as follows:

L = \frac{1}{T} \frac{1}{N} \sum_{j = 1}^{N} \sum_{i = 1}^{T} \left[ {({\hat{x}}_{i}^{j} - x_{i}^{j})}^{2} + {({\hat{y}}_{i}^{j} - y_{i}^{j})}^{2} \right],

(12)

where T is the number of prediction time steps, $(x_{i}^{j}, y_{i}^{j})$ represents the ground truth position of the j-th input at time step i, and $({\hat{x}}_{i}^{j}, {\hat{y}}_{i}^{j})$ represents the predicted position of the j-th input at time step i.
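For concreteness, Equation (12) can be computed as in the following sketch, which assumes that predictions and ground truth are stored as arrays of shape (N, T, 2) holding (x, y) positions. The function name `trajectory_mse` and the toy values are illustrative only.

```python
import numpy as np

def trajectory_mse(pred, truth):
    """Equation (12) for arrays of shape (N, T, 2) holding (x, y) positions."""
    sq_err = np.sum((pred - truth) ** 2, axis=-1)   # (N, T): squared distance per step
    return sq_err.mean()                            # average over N samples and T steps

# Toy batch: every predicted point is off by 1 in both x and y.
pred = np.zeros((2, 3, 2))
truth = np.ones((2, 3, 2))
loss = trajectory_mse(pred, truth)
```

In the toy batch each time step contributes 1² + 1² = 2, so the averaged loss is 2.0.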

We also incorporate the optimization schemes of [16] that account for the continuous-time nature of LTCs. Gradients must be computed through time, which is achieved using Back-propagation Through Time (BPTT) [16]: the computational graph is unfolded over discrete time steps, and the gradients of each parameter, including weights, biases, and time constants, are calculated using the chain rule. To ensure stable training, the proposed Attn-LTC model leverages the following schemes:

Bounded Dynamics: Normalized weights inside the LTC cells are used to guarantee numerical stability during training and inference. This mechanism keeps the hidden states within finite ranges even for long sequences.
Gradient Clipping: The recursive computation in BPTT can produce exploding gradients, which destabilize training. Capping gradients at a specified threshold keeps optimization stable and efficient, especially for long sequences.
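
The gradient clipping scheme can be illustrated with a short sketch. The text does not say whether gradients are clipped by norm or by value, nor which threshold is used, so the global-norm variant and the threshold of 1.0 below are assumptions chosen for illustration. In PyTorch, a comparable operation is provided by `torch.nn.utils.clip_grad_norm_`.

```python
import math
from typing import List


def clip_by_global_norm(grads: List[float], max_norm: float) -> List[float]:
    """Rescale the gradient vector so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)          # small gradients pass through unchanged
    scale = max_norm / norm         # shrink all components by the same factor
    return [g * scale for g in grads]


# A gradient of norm 5 is rescaled to norm 1; its direction is preserved.
clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
clipped_norm = math.sqrt(sum(g * g for g in clipped))
```

Scaling every component by the same factor limits the step size without changing the update direction, which is why norm-based clipping is a common choice for recurrent models trained with BPTT.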

4. Experiments

In this section, the proposed Attn-LTC model is evaluated experimentally. First, the dataset used in this research is described in depth. The subsequent subsections present the evaluation metrics, the training configuration, the baselines, and the experimental results.

4.1. Dataset Specifications

In this study, our evaluation is performed on the NGSIM (Next Generation Simulation) dataset [17], a comprehensive dataset for evaluating trajectory prediction models in autonomous driving and traffic analysis. Collected by the Federal Highway Administration, it provides high-resolution vehicle trajectory data from real-world traffic scenarios. The dataset contains detailed trajectory information for multiple vehicles, recorded at a sampling rate of 10 Hz, across various driving behaviors such as lane changes, merges, and car-following scenarios. Data collection sites include segments of the I-80 and US-101 highways, representing diverse road geometries and traffic densities. The NGSIM dataset comprises 25 variables with over 11.8 million data points, offering granular kinematic and positional data. This level of detail makes it particularly well suited for studying interaction-aware trajectory prediction and validating models under complex driving conditions.

Following the approach outlined in prior work [11], the trajectory data were downsampled from 10 Hz to 5 Hz. The I-80 and US-101 data were merged into a single dataset, which was then randomly shuffled and split into training, validation, and test sets in a 7:1:2 ratio. All subsequent experimental evaluations were performed on the test set. The code for data preprocessing and dataset partitioning is available on GitHub at https://github.com/nachiket92/conv-social-pooling (accessed on 15 November 2024).
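
The two preprocessing steps described above can be sketched as follows. This is a minimal sketch and not the code in the referenced repository: the helper names `downsample` and `split_7_1_2`, the fixed seed, and the use of integers as stand-ins for samples are assumptions, and the text does not specify what counts as a single sample.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Item = TypeVar("Item")


def downsample(track: Sequence[Item], factor: int = 2) -> List[Item]:
    """Keep every `factor`-th frame; factor=2 turns 10 Hz data into 5 Hz."""
    return list(track[::factor])


def split_7_1_2(
    samples: Sequence[Item], seed: int = 0
) -> Tuple[List[Item], List[Item], List[Item]]:
    """Shuffle, then split into training/validation/test sets in a 7:1:2 ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_train = len(shuffled) * 7 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # the remainder, about 20%
    return train, val, test


frames_5hz = downsample(list(range(10)))    # ten 10 Hz frames -> five 5 Hz frames
samples = list(range(100))                  # placeholder sample identifiers
train, val, test = split_7_1_2(samples)
```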

4.2. Evaluation Metrics

To evaluate the proposed trajectory prediction model, we assess the outputs of the trained models as follows. Similar to previous works [15,29], we use the Root Mean Squared Error (RMSE) to evaluate the accuracy of trajectory prediction models. It measures the average magnitude of the error between the predicted trajectory and the ground truth trajectory over a defined prediction horizon. For a prediction horizon of 5 s, RMSE is calculated as follows:

RMSE = \sqrt{\frac{1}{T} \sum_{i = 1}^{T} \left[ {({\hat{x}}_{i} - x_{i})}^{2} + {({\hat{y}}_{i} - y_{i})}^{2} \right]},

(13)

where T is the number of prediction time steps within the 5 s horizon, (x_{i}, y_{i}) is the ground truth position at time step i, and ({\hat{x}}_{i}, {\hat{y}}_{i}) is the predicted position at time step i. RMSE provides an interpretable metric that reflects the overall prediction accuracy of the model, enabling a direct comparison across different approaches. The metric emphasizes larger errors due to its squaring operation, making it sensitive to significant deviations that could impact safety.
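
Equation (13) can be sketched for a single trajectory as follows. The function names and the toy trajectory are assumptions made for illustration. The helper `steps_in_horizon` relies only on the 5 Hz sampling rate stated in Section 4.1, under which a 5 s horizon corresponds to 25 prediction steps.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rmse(pred: List[Point], truth: List[Point]) -> float:
    """Eq. (13): square root of the mean squared displacement over T steps."""
    t = len(truth)
    squared = sum(
        (x_hat - x) ** 2 + (y_hat - y) ** 2
        for (x_hat, y_hat), (x, y) in zip(pred, truth)
    )
    return math.sqrt(squared / t)


def steps_in_horizon(seconds: float, rate_hz: float = 5.0) -> int:
    """Number of prediction steps inside a horizon at the given sampling rate."""
    return int(round(seconds * rate_hz))


# Every predicted point is offset by (3 m, 4 m), i.e. 5 m from the truth.
truth = [(0.0, 0.0)] * 4
pred = [(3.0, 4.0)] * 4
error = rmse(pred, truth)
horizon_steps = steps_in_horizon(5.0)
```

Because each predicted point is exactly 5 m from its target, the RMSE of this toy trajectory is 5 m.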

4.3. Experimental Environment and Training Configurations

Training was conducted on a workstation equipped with an NVIDIA RTX 3090 GPU with 24 GB of VRAM, an Intel Core i9-12900K CPU with 16 cores, and 64 GB of RAM. The system ran Ubuntu 20.04 with Python 3.10 and PyTorch 2.0. The proposed Attn-LTC algorithm was implemented using PyTorch and the code provided in [16].

In our study, we use d = 64 as the embedding dimension of the MLP. The models are trained for 10 epochs with a batch size of 1024. For optimization, we employed the Adam optimizer, starting with a learning rate of 1 \times 10^{-3}.

4.4. Baselines

We thoroughly assess the performance of the proposed Attn-LTC model by comparing it against a diverse range of state-of-the-art models that employ different methodologies for trajectory prediction. These include techniques utilizing spatial–temporal attention mechanisms, convolutional pooling layers, and advanced architectures like Transformer-based networks. By comparing Attn-LTC with these methods, we aim to highlight its advantages, such as improved accuracy and computational efficiency. Specifically, we compare the proposed model with the following baselines:

1. Constant velocity (CV): The model uses a vehicle’s constant speed for trajectory prediction.
2. LSTM with fully connected social pooling (S-LSTM) [11]: The model incorporates a social pooling layer to capture interactions among individuals and predict future trajectories in crowded spaces.
3. LSTM with convolutional social pooling (CS-LSTM) [12]: An LSTM encoder–decoder model with a convolutional social pooling layer to improve interaction modeling between vehicles, combined with maneuver-based trajectory prediction for robust and multi-modal future predictions.
4. Multi-head attention LSTM (MHA-LSTM) [22]: The model leverages multi-head attention to capture higher-order interactions among vehicles and predict multi-modal trajectories, enabling long-range dependency modeling and accurate motion forecasting.
5. Dynamic and static context-aware attention network (DSCAN) [14]: The algorithm models inter-vehicle interactions using attention mechanisms and incorporates static environmental constraints for improved trajectory prediction accuracy.
6. Spatial interaction-aware Transformer (SIT) [15]: The model integrates temporal dependencies and spatial interactions through multi-head self-attention modules for precise long-term trajectory predictions.
7. Dual learning model (DLM) [29]: The model uses Occupancy Maps and Risk Maps in an encoder–decoder structure to capture inter-vehicle interactions and risk-based spatial relationships for accurate and efficient trajectory predictions.
8. Spatial–temporal attentive LSTM (STAM-LSTM) [13]: The model employs spatial and temporal attention mechanisms to extract critical features from historical trajectories for enhanced vehicle trajectory prediction.

4.5. Ablation Study

In this section, we conduct a comprehensive ablation study to evaluate the performance of the proposed Attn-LTC model under various configurations and scenarios. The goal of this study is to analyze the impact of key design factors, such as the number of LTC neurons, lane variability, and dataset-specific characteristics, on the model’s trajectory prediction accuracy. By systematically varying these parameters, we aim to identify the optimal configuration and highlight the robustness of the Attn-LTC model across different settings. Additionally, this study provides valuable insights into the model’s ability to generalize to diverse traffic conditions and its sensitivity to specific design choices, offering a deeper understanding of its performance in practical applications.

4.5.1. Ablation Experiments on Model Size

Table 1 presents the ablation study to analyze the impact of the number of LTC neurons on the RMSE performance of the proposed Attn-LTC model. The prediction horizons range from 1 s to 5 s. The results show that increasing the number of LTC neurons improves accuracy significantly, particularly for longer prediction horizons. For example, reducing the number of neurons to eight (Attn-LTC-8) results in the highest RMSE values across all time steps, such as 0.64 m at 1 s and 3.81 m at 5 s. In comparison, Attn-LTC models with 16, 24, and 32 neurons exhibit much lower RMSE values, with Attn-LTC-24 achieving the lowest errors at the critical longer horizons of 3 s to 5 s.

Interestingly, the performance gain diminishes as the number of neurons increases beyond 24. For example, Attn-LTC-32 achieves nearly identical RMSE values to Attn-LTC-24, particularly at the 3 to 5 s prediction horizons (e.g., 1.60 vs. 1.57 m at 3 s). This indicates that the addition of neurons beyond 24 does not yield significant improvements. The results show that Attn-LTC with 24 neurons strikes the best balance between model complexity and prediction accuracy, making Attn-LTC-24 the optimal configuration for the studied task. Therefore, we choose this configuration in the following analysis.
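
The diminishing returns can be checked with simple arithmetic on the values in Table 1. The sketch below averages the RMSE over the five horizons for each configuration. This average is a summary introduced here for illustration and is not a criterion stated by the authors.

```python
# RMSE in meters at the 1 s to 5 s horizons, copied from Table 1.
table1 = {
    8: [0.64, 1.20, 1.86, 2.73, 3.81],
    16: [0.49, 0.99, 1.60, 2.38, 3.32],
    24: [0.49, 0.98, 1.57, 2.33, 3.24],
    32: [0.48, 0.99, 1.60, 2.35, 3.24],
}

# Mean RMSE over the five horizons for each neuron count.
mean_rmse = {n: sum(values) / len(values) for n, values in table1.items()}

# Configuration with the lowest mean RMSE.
best = min(mean_rmse, key=mean_rmse.get)

# Improvement obtained by each increase in model size.
gain_8_to_16 = mean_rmse[8] - mean_rmse[16]
gain_16_to_24 = mean_rmse[16] - mean_rmse[24]
```

The mean RMSE drops by about 0.29 m when going from 8 to 16 neurons, but only by about 0.03 m from 16 to 24, and it rises slightly (by about 0.01 m) from 24 to 32, which is consistent with selecting Attn-LTC-24.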

4.5.2. Ablation Experiments on LSTM and LTC

Figure 5 visualizes the training loss curves for Attn-LTC and Attn-LSTM models under different configurations, focusing on the impact of neuron count. In Figure 5a, the Attn-LTC model is evaluated with varying numbers of neurons: 8, 16, 24, 32, and 64. The results show that the training loss decreases significantly as the number of neurons increases, particularly during the initial training iterations. The Attn-LTC-64 model shows the fastest convergence and lowest final training loss. In comparison, Attn-LTC-8 converges more slowly and stabilizes at a higher loss. However, the differences between Attn-LTC-24, Attn-LTC-32, and Attn-LTC-64 diminish after convergence, indicating that increasing the number of neurons beyond 24 provides marginal benefits in reducing the training loss. This suggests that the model reaches an optimal capacity with 24 to 32 neurons for the evaluated task.

Figure 5b compares the training loss of Attn-LTC and Attn-LSTM models with 8, 16, and 24 neurons. Across all configurations, the Attn-LTC model consistently achieves lower training losses than the Attn-LSTM model, highlighting the efficiency and representational power of LTC dynamics. The loss difference is most significant when using 24 neurons, where Attn-LTC shows a sharper decline in loss during the initial iterations and stabilizes at a lower value compared to Attn-LSTM. This is because Attn-LTC leverages its dynamic time-constant mechanism to capture temporal dependencies more effectively than the LSTM architecture. Overall, the figure demonstrates the superiority of the Attn-LTC model in training efficiency and accuracy, particularly when an optimal number of neurons are used.

Figure 6 illustrates the RMSE performance of the proposed Attn-LTC model and LSTM counterparts across varying numbers of neurons (8, 16, 24, and 32) and prediction horizons (1 to 5 s). The results show the superior performance of the Attn-LTC model compared to the LSTM across all horizons and neuron configurations. For shorter prediction horizons (1 and 2 s), the differences between Attn-LTC and LSTM are less significant. Both models achieve competitive RMSE values as the number of neurons increases. However, Attn-LTC consistently outperforms LSTM, particularly as the number of neurons increases to 24 and 32, suggesting its better ability to model short-term dependencies.

For longer prediction horizons (3 to 5 s), the advantages of the Attn-LTC model become more apparent. Our proposed Attn-LTC model demonstrates better robustness when it comes to handling long-term trajectory prediction, providing lower RMSE values across all neuron configurations compared to the LSTM. Attn-LTC models with 24 and 32 neurons yield the best results, with diminishing returns observed beyond 24 neurons. These findings highlight the efficacy of the Attn-LTC model in achieving superior accuracy for trajectory prediction tasks while efficiently utilizing its computational resources.

We evaluate the parameter efficiency of the proposed Attn-LTC model compared to the Attn-LSTM baseline. This experiment aims to determine the computational cost of each model. Figure 7 shows the total number of parameters in the Attn-LTC and Attn-LSTM models for different neuron configurations (8, 16, 24, and 32 neurons). It is evident that while both models experience an increase in parameters as the number of neurons grows, the Attn-LSTM model consistently has a higher parameter count, particularly in the encoder and decoder components. This highlights the parameter efficiency of the Attn-LTC model, which achieves better performance with significantly fewer parameters. Such efficiency is crucial for deploying models in resource-constrained environments, as it reduces computational and memory requirements without sacrificing accuracy.
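
To give a sense of how the LSTM side of this comparison scales, the sketch below counts the parameters of a single textbook LSTM layer, which has four gates that each hold an input weight matrix, a recurrent weight matrix, and a bias vector. It is an illustration under stated assumptions, not a reproduction of Figure 7: it covers one generic layer rather than the full Attn-LSTM baseline, it assumes the 64-dimensional embedding as the layer input, and it says nothing about the LTC parameter count, which depends on the wiring. Framework implementations may differ slightly (PyTorch's `nn.LSTM`, for example, stores two bias vectors per gate).

```python
def lstm_layer_params(input_size: int, hidden_size: int) -> int:
    """Parameters of one textbook LSTM layer (four gates, one bias per gate)."""
    per_gate = hidden_size * (input_size + hidden_size) + hidden_size
    return 4 * per_gate


# Input size 64 matches the embedding dimension d; hidden sizes follow Figure 7.
counts = {h: lstm_layer_params(64, h) for h in (8, 16, 24, 32)}
```

The recurrent weight term grows quadratically with the hidden size, so the count rises from 2336 parameters at 8 units to 12,416 at 32 units for this single layer.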

Figure 8 illustrates the inference latency of the proposed Attn-LTC models with varying numbers of neurons (8, 16, 24, and 32). The latency is measured on a mobile Intel CPU. The results show that the inference latency increases slightly with the number of neurons, ranging from approximately 15 ms for Attn-LTC-8 to around 20 ms for Attn-LTC-32. This low latency demonstrates the computational efficiency of the Attn-LTC model, making it highly suitable for real-time applications like autonomous driving. In such scenarios, rapid and accurate trajectory predictions are crucial for enabling timely decision-making, such as collision avoidance, lane changes, and navigation in dynamic traffic environments. The balance between latency and accuracy achieved by the Attn-LTC model underscores its practicality for deployment in latency-sensitive systems.
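
The measurement protocol behind Figure 8 is not described, so the following is only a sketch of one common way to time inference on a CPU: run a few warm-up calls, time repeated calls with a monotonic clock, and report the median. The function `median_latency_ms`, the warm-up and repetition counts, and the stand-in workload `dummy_model` are all assumptions.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], warmup: int = 10, runs: int = 100) -> float:
    """Median wall-clock time of fn() in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        fn()                                  # warm caches before timing
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def dummy_model() -> int:
    """Stand-in workload; a real test would call the trained model here."""
    return sum(i * i for i in range(1000))


latency = median_latency_ms(dummy_model)
```

The median is used rather than the mean so that occasional scheduling delays do not dominate the reported value.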

4.6. Comparison Results

Table 2 compares the RMSE values for various models over different prediction horizons. The prediction horizons range from 1 s to 5 s, and the RMSE values are expressed in meters. The proposed Attn-LTC model improves substantially on most of the state-of-the-art methods, with lower RMSE values across all prediction horizons. The exceptions are DLM and STAM-LSTM, which outperform Attn-LTC at the 1 s and 2 s marks. However, the Attn-LTC model achieves better results at longer horizons (3 s to 5 s), indicating its superior ability to model long-term dependencies.

Notably, traditional models like CV show the highest RMSE values due to the limitations of simpler approaches in handling complex trajectory prediction tasks. Among deep learning-based methods, Attn-LTC and DLM exhibit superior performance, with Attn-LTC excelling in the critical 3 s to 5 s range, achieving the lowest RMSE values of 1.57, 2.33, and 3.24 m, respectively. This highlights the effectiveness of the attention mechanism and LTC networks in improving prediction accuracy over longer time horizons, which are particularly important for safety-critical applications like autonomous driving.

Figure 9 presents the RMSE values for the proposed Attn-LTC model across different lanes and prediction horizons (1 to 5 s). The results reveal that the RMSE increases consistently with the prediction horizon in every lane, which reflects the challenge of maintaining high accuracy for longer-term predictions. Lanes 1 and 6 exhibit the highest RMSE values among all lanes, suggesting that trajectory predictions in these outer lanes are more challenging. This could be attributed to the greater variability in vehicle dynamics and interactions near lane boundaries.

In contrast, the RMSE values for lanes 3, 4, and 5 are comparatively lower, indicating that the Attn-LTC model performs more effectively in central lanes. This could be attributed to the relative stability and predictability of vehicle trajectories in these lanes, where interactions with neighboring vehicles and lane-changing behaviors are less complex. These findings demonstrate the effectiveness of the proposed Attn-LTC model in central lanes while suggesting potential areas for improvement in handling edge cases such as boundary lanes, where higher variability poses greater challenges for trajectory prediction.

Figure 10 illustrates the RMSE values of the proposed Attn-LTC model on the NGSIM dataset across different road segments (US101 and I80) and times of day. The prediction horizons range from 1 to 5 s. The RMSE for a 1 s prediction remains below 1 m across all scenarios, while the 5 s prediction exhibits the highest RMSE, exceeding 3 m in most cases. Comparing the road segments, the performance on the I80 segment at 16:00–16:15 shows the highest RMSE. This indicates the challenges in predicting vehicle trajectories during this period because of more complex traffic dynamics. The Attn-LTC model achieves slightly better performance on the US101 dataset across shorter prediction horizons (1 to 3 s). This could be due to the differences in traffic flow characteristics, with US101 exhibiting more structured vehicle behavior. However, the I80 dataset shows larger errors for longer horizons (4 to 5 s), which might be due to denser or more variable traffic conditions. Overall, the Attn-LTC model demonstrates robust performance across diverse datasets and conditions, but the variation in RMSE highlights the influence of specific traffic scenarios and time periods on prediction accuracy, suggesting potential areas for further optimization or dataset-specific tuning.

Figure 11 illustrates the RMSE values across different prediction horizons (1 to 5 s) while varying the number of neighboring vehicles considered in the prediction. Here, the proposed Attn-LTC model uses 24 neurons because it achieves the best trade-off between performance and complexity. As the number of neighboring vehicles increases, the RMSE values steadily decrease. For short prediction horizons, such as 1 and 2 s, the proposed model achieves low RMSE values even with a smaller number of neighbors. This suggests that the immediate context provided by neighboring vehicles is sufficient for short-term trajectory predictions. However, as the prediction horizon extends to 3, 4, and 5 s, incorporating a larger number of neighboring vehicles becomes increasingly important to maintain accuracy. For instance, at a 5 s horizon, the RMSE decreases significantly when the number of neighbors increases from 1 to 15. This emphasizes the role of distant vehicles in influencing long-term trajectory predictions. Beyond 15 neighbors, the improvement becomes marginal. This highlights our Attn-LTC model’s ability to balance computational efficiency with predictive performance by prioritizing the most relevant interactions, making it suitable for real-time deployment in dynamic, multi-agent environments.
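
Varying the number of neighbors requires a rule for choosing which vehicles to keep. The text does not state that rule, so the sketch below assumes the simplest option, keeping the k vehicles closest to the target vehicle by Euclidean distance. The function name `nearest_neighbors` and the toy positions are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def nearest_neighbors(ego: Point, others: List[Point], k: int) -> List[Point]:
    """Return the k surrounding vehicles closest to the ego vehicle."""
    return sorted(others, key=lambda p: math.dist(ego, p))[:k]


# Toy positions in meters (not NGSIM data).
ego = (0.0, 0.0)
others = [(5.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
selected = nearest_neighbors(ego, others, k=2)
```

With k = 2, the vehicles at 1 m and 3 m are kept and the vehicle at 5 m is dropped.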

4.7. Visualization and Qualitative Analysis

Figure 12 displays the temporal attention weights over 16 time steps for different lanes, reflecting the significance of historical trajectory information in predicting future behavior. Initially, the temporal attention weights remain close to zero across all lanes, showing that earlier time steps contribute minimally to the predictions. This pattern persists until around the 10th time step, where the weights begin to gradually increase. The temporal attention mechanism assigns more importance to recent trajectory information as it approaches the present moment, which is critical for making accurate short-term predictions. The most notable increase in temporal attention weights occurs after the 14th time step. This sharp increase highlights the importance of the final few time steps in determining the trajectory prediction. While the attention weights vary slightly among the lanes, they generally follow the same trend. The proposed temporal attention mechanism is robust and consistent across different traffic lanes. The similarity across lanes may also indicate that temporal dependencies dominate over spatial variations for this specific task, emphasizing the model’s ability to focus on the most relevant recent information for accurate trajectory predictions.
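
The qualitative shape of these weights can be reproduced with a generic softmax attention sketch. This is not the encoder's actual scoring function: the scores below are illustrative values that simply grow toward the most recent step, and the one-dimensional hidden states are placeholders.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Normalize scores into attention weights that sum to 1."""
    m = max(scores)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attend(states: List[List[float]], scores: List[float]) -> List[float]:
    """Attention-weighted sum of the hidden states over time."""
    weights = softmax(scores)
    dim = len(states[0])
    return [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]


# 16 history steps; scores increase toward the present (illustrative values).
scores = [0.5 * t for t in range(16)]
weights = softmax(scores)
states = [[float(t)] for t in range(16)]         # placeholder hidden states
context = attend(states, scores)
```

Because softmax is exponential in the scores, a steady increase in score toward the present yields weights that stay near zero for early steps and rise sharply over the last few, which matches the pattern described for Figure 12.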

Figure 13 visualizes the wiring structure of the Attn-LTC model with varying configurations of motor, sensory, and inter neurons (8, 16, and 24 motor neurons). As the number of motor neurons increases, the connectivity becomes significantly denser, reflecting a more complex and interconnected network capable of capturing intricate patterns in the input data. In Figure 13a, with only eight motor neurons, the wiring is sparse, limiting the capacity of the model to process complex dependencies. Figure 13b,c, with 16 and 24 motor neurons, respectively, exhibit progressively richer interconnections, particularly between sensory neurons (green) and inter neurons (blue). This denser wiring in higher configurations facilitates greater representational power, allowing the model to effectively encode and process detailed temporal and spatial dependencies. The balance between motor, sensory, and inter neuron connections also suggests a deliberate design to optimize information flow while avoiding excessive complexity.

5. Discussion and Conclusions

Trajectory prediction is a cornerstone of autonomous driving [1] and intelligent transportation systems [2]. By forecasting the future trajectories of vehicles, it enables collision avoidance, safe lane changes, and efficient route planning. However, achieving accurate trajectory prediction in dynamic environments is challenging due to the complexity of spatial and temporal dependencies among vehicles. Existing models often face limitations such as high computational cost and large parameter sizes, which hinder real-time deployment in resource-constrained systems. Addressing these challenges requires innovative algorithms that balance predictive accuracy with computational efficiency.

In this work, we proposed an Attention-enhanced Liquid Time-Constant (Attn-LTC) model to address the aforementioned challenges. Our model integrates LTC networks [16] with temporal and spatial attention mechanisms to capture both long-term temporal dependencies and dynamic spatial interactions. Extensive experiments on the NGSIM dataset [17] demonstrate that the proposed Attn-LTC model achieves superior accuracy with significantly fewer parameters compared to traditional LSTM-based methods [10,15]. Notably, the model’s lightweight design makes it particularly suitable for real-time applications in embedded systems and autonomous vehicles.

There are several potential extensions and avenues for future research. First, the adaptability of the Attn-LTC model can be further enhanced by incorporating uncertainty quantification to better handle unpredictable traffic scenarios. Second, integrating multi-modal data such as visual and LiDAR inputs could improve prediction robustness in complex environments. Finally, exploring domain adaptation techniques would enable the model to generalize effectively across diverse traffic conditions and datasets, enhancing its applicability in global autonomous driving systems.

In summary, this work contributes a novel parameter-efficient trajectory prediction model that balances computational efficiency and predictive performance. By leveraging LTC networks and attention mechanisms, the Attn-LTC model demonstrates its potential as a scalable solution for real-time trajectory prediction. Future efforts should focus on extending its capabilities to handle more diverse and challenging traffic scenarios, paving the way for its broader adoption in intelligent transportation systems.

Author Contributions

Conceptualization, Ruochen Wang and Yue Chen; methodology, Yue Chen; software, Yue Chen; validation, Ruochen Wang and Renkai Ding; formal analysis, Yue Chen; investigation, Ruochen Wang; resources, Renkai Ding; data curation, Yue Chen; writing—original draft preparation, Yue Chen; writing—review and editing, Renkai Ding; visualization, Yue Chen; supervision, Renkai Ding; project administration, Qing Ye. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Publicly available datasets were used in this study. The NGSIM dataset can be found at https://ops.fhwa.dot.gov/trafficanalysistools/ngsim.htm (accessed on 15 November 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Nayakanti, N.; Al-Rfou, R.; Zhou, A.; Goel, K.; Refaat, K.S.; Sapp, B. Wayformer: Motion forecasting via simple & efficient attention networks. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; IEEE: New York, NY, USA, 2023; pp. 2980–2987.
2. Li, H.; Xing, W.; Jiao, H.; Yuen, K.F.; Gao, R.; Li, Y.; Matthews, C.; Yang, Z. Bi-directional information fusion-driven deep network for ship trajectory prediction in intelligent transportation systems. Transp. Res. Part E Logist. Transp. Rev. 2024, 192, 103770.
3. Song, H.; Ding, W.; Chen, Y.; Shen, S.; Wang, M.Y.; Chen, Q. Pip: Planning-informed trajectory prediction for autonomous driving. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXI 16. Springer: New York, NY, USA, 2020; pp. 598–614.
4. Xie, X.; Zhang, C.; Zhu, Y.; Wu, Y.N.; Zhu, S.C. Congestion-aware multi-agent trajectory prediction for collision avoidance. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; IEEE: New York, NY, USA, 2021; pp. 13693–13700.
5. Biktairov, Y.; Stebelev, M.; Rudenko, I.; Shliazhko, O.; Yangel, B. PRANK: Motion Prediction based on RANKing. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020.
6. Casas, S.; Luo, W.; Urtasun, R. IntentNet: Learning to Predict Intention from Raw Sensor Data. In Proceedings of the 2nd Conference on Robot Learning, Zürich, Switzerland, 29–31 October 2018.
7. Cui, H.; Radosavljevic, V.; Chou, F.C.; Lin, T.H.; Nguyen, T.; Huang, T.K.; Schneider, J.; Djuric, N. Multimodal trajectory predictions for autonomous driving using deep convolutional networks. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; IEEE: New York, NY, USA, 2019; pp. 2090–2096.
8. Djuric, N.; Radosavljevic, V.; Cui, H.; Nguyen, T.; Chou, F.C.; Lin, T.H.; Singh, N.; Schneider, J. Uncertainty-aware Short-term Motion Prediction of Traffic Actors for Autonomous Driving. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Snowmass, CO, USA, 1–5 March 2020.
9. Zeng, W.; Liang, M.; Liao, R.; Urtasun, R. Lanercnn: Distributed representations for graph-centric motion forecasting. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; IEEE: New York, NY, USA, 2021; pp. 532–539.
10. Altché, F.; de La Fortelle, A. An LSTM network for highway trajectory prediction. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017; IEEE: New York, NY, USA, 2017; pp. 353–359.
11. Alahi, A.; Goel, K.; Ramanathan, V.; Robicquet, A.; Fei-Fei, L.; Savarese, S. Social lstm: Human trajectory prediction in crowded spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 961–971.
12. Deo, N.; Trivedi, M.M. Convolutional social pooling for vehicle trajectory prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1468–1476.
13. Jiang, R.; Xu, H.; Gong, G.; Kuang, Y.; Liu, Z. Spatial-temporal attentive LSTM for vehicle-trajectory prediction. ISPRS Int. J. Geo-Inf. 2022, 11, 354.
14. Yu, J.; Zhou, M.; Wang, X.; Pu, G.; Cheng, C.; Chen, B. A dynamic and static context-aware attention network for trajectory prediction. ISPRS Int. J. Geo-Inf. 2021, 10, 336.
15. Li, X.; Xia, J.; Chen, X.; Tan, Y.; Chen, J. SIT: A spatial interaction-aware transformer-based model for freeway trajectory prediction. ISPRS Int. J. Geo-Inf. 2022, 11, 79.
16. Hasani, R.; Lechner, M.; Amini, A.; Rus, D.; Grosu, R. Liquid time-constant networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–9 February 2021; Volume 35, pp. 7657–7666.
17. Department of Transportation Federal Highway Administration. Next Generation Simulation (NGSIM) Vehicle Trajectories and Supporting Data. Available online: https://ops.fhwa.dot.gov/trafficanalysistools/ngsim.htm (accessed on 15 November 2024).
18. Dong, H.; Zhuang, W.; Chen, B.; Yin, G.; Wang, Y. Enhanced eco-approach control of connected electric vehicles at signalized intersection with queue discharge prediction. IEEE Trans. Veh. Technol. 2021, 70, 5457–5469.
19. Dong, H.; Hu, Q.; Li, D.; Li, Z.; Song, Z. Predictive Battery Thermal and Energy Management for Connected and Automated Electric Vehicles. IEEE Trans. Intell. Transp. Syst. 2024, 1–13.
20. Dong, H.; Wang, Q.; Zhuang, W.; Yin, G.; Gao, K.; Li, Z.; Song, Z. Flexible eco-cruising strategy for connected and automated vehicles with efficient driving lane planning and speed optimization. IEEE Trans. Transp. Electrif. 2024, 10, 1530–1540.
21. Lin, L.; Li, W.; Bi, H.; Qin, L. Vehicle trajectory prediction using LSTMs with spatial–temporal attention mechanisms. IEEE Intell. Transp. Syst. Mag. 2021, 14, 197–208.
22. Messaoud, K.; Yahiaoui, I.; Verroust-Blondet, A.; Nashashibi, F. Attention based vehicle trajectory prediction. IEEE Trans. Intell. Veh. 2020, 6, 175–185.
23. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010.
24. Aydemir, G.; Akan, A.K.; Güney, F. Adapt: Efficient multi-agent trajectory prediction with adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 8295–8305.
25. Wang, M.; Zhu, X.; Yu, C.; Li, W.; Ma, Y.; Jin, R.; Ren, X.; Ren, D.; Wang, M.; Yang, W. Ganet: Goal area network for motion forecasting. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; IEEE: New York, NY, USA, 2023; pp. 1609–1615.
26. Ngiam, J.; Vasudevan, V.; Caine, B.; Zhang, Z.; Chiang, H.T.L.; Ling, J.; Roelofs, R.; Bewley, A.; Liu, C.; Venugopal, A.; et al. Scene transformer: A unified architecture for predicting future trajectories of multiple agents. In Proceedings of the International Conference on Learning Representations, Virtual, 3–7 May 2021.
27. Zhou, Z.; Ye, L.; Wang, J.; Wu, K.; Lu, K. Hivt: Hierarchical vector transformer for multi-agent motion prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8823–8833.
28. Da, F.; Zhang, Y. Path-aware graph attention for hd maps in motion prediction. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; IEEE: New York, NY, USA, 2022; pp. 6430–6436.
29. Khakzar, M.; Rakotonirainy, A.; Bond, A.; Dehkordi, S.G. A dual learning model for vehicle trajectory prediction. IEEE Access 2020, 8, 21897–21908.

Figure 1. Overall framework of the proposed Attn-LTC model based on Liquid Time-Constant (LTC) networks for trajectory prediction. The proposed algorithm is composed of data preprocessing and vectorization, Temporal Attention-enhanced LTC Encoder, and Spatial Attention-enhanced LTC Decoder modules.

Figure 2. The algorithmic flow of the Temporal Attention-enhanced LTC Encoder module in the proposed Attn-LTC model.

Figure 3. Illustration of the Liquid Time-Constant (LTC) network structure.

Figure 4. The algorithmic flow of the Spatial Attention-enhanced LTC Decoder module in the proposed Attn-LTC model.

Figure 5. Visualization of training loss using various numbers of neurons from 8 to 64. (a) Training loss of proposed Attn-LTC models with 8, 16, 24, 32, or 64 neurons. (b) Training loss of Attn-LTC and Attn-LSTM models with 8, 16, or 24 neurons.

Figure 6. Comparison of RMSE values for the proposed Attn-LTC model and its LSTM counterparts. The number of neurons varies from 8 to 32, and the prediction horizon ranges from 1 s to 5 s.

Figure 7. Comparison of number of parameters in the proposed Attn-LTC model and Attn-LSTM baselines using various numbers of neurons from 8 to 32.

Figure 8. Inference latency of the proposed Attn-LTC models using various numbers of neurons from 8 to 32.

Figure 9. Comparison of RMSE values of the proposed Attn-LTC model across different lanes.

Figure 10. Comparison of RMSE values of the proposed Attn-LTC model across different roadmaps and time periods.

Figure 11. Comparison of RMSE values using the proposed Attn-LTC model for different numbers of neighboring vehicles. The number of neurons is 24.

Figure 12. Visualization of the average temporal attention weights across different traffic lanes.

Figure 13. Visualization of wiring patterns for LTC neurons. (a) Attn-LTC with 8 motor neurons and sensory neurons. (b) Attn-LTC with 16 motor neurons and sensory neurons. (c) Attn-LTC with 24 motor neurons and sensory neurons.

Table 1. Comparison of RMSE values for the proposed Attn-LTC model as the number of LTC neurons varies from 8 to 32. RMSE values are converted into meters.

# of LTC Neurons	Prediction Horizon
	1 s	2 s	3 s	4 s	5 s
Attn-LTC-8	0.64	1.20	1.86	2.73	3.81
Attn-LTC-16	0.49	0.99	1.60	2.38	3.32
Attn-LTC-24	0.49	0.98	1.57	2.33	3.24
Attn-LTC-32	0.48	0.99	1.60	2.35	3.24
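
The evaluation code behind these numbers is not reproduced in this section. As a rough illustration of how a per-horizon RMSE in meters could be computed, the sketch below assumes that predicted and ground-truth positions are available as lists of (x, y) points in feet and that one value is reported at the end of each whole second. The function name `rmse_per_horizon`, the feet-based input, and the `steps_per_second` parameter are illustrative assumptions, not details taken from the paper.

```python
import math

FEET_TO_METERS = 0.3048  # exact conversion factor; feet-based input is an assumption


def rmse_per_horizon(predicted, actual, steps_per_second, scale=FEET_TO_METERS):
    """Return {second: RMSE of the displacement error at that horizon}.

    predicted, actual: lists of trajectories with equal lengths; each
    trajectory is a list of (x, y) positions, one per future time step.
    scale: factor applied to the result (feet to meters by default).
    """
    n_steps = len(predicted[0])
    rmse = {}
    for second in range(1, n_steps // steps_per_second + 1):
        t = second * steps_per_second - 1  # index of the last step in that second
        sq_errors = [
            (p[t][0] - a[t][0]) ** 2 + (p[t][1] - a[t][1]) ** 2
            for p, a in zip(predicted, actual)
        ]
        rmse[second] = scale * math.sqrt(sum(sq_errors) / len(sq_errors))
    return rmse
```

With this definition, a single large miss weighs more heavily than several small ones because errors are squared before averaging, which is one reason the values in the table grow faster than linearly with the horizon.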

Table 2. Comparison of RMSE values for state-of-the-art models for trajectory prediction. The proposed Attn-LTC model uses 24 neurons. RMSE values are converted into meters. Bold font indicates the best RMSE performance.

Model	Prediction Horizon
	1 s	2 s	3 s	4 s	5 s
CV	0.73	1.78	3.13	4.78	6.68
S-LSTM [11]	0.65	1.31	2.16	3.25	4.55
CS-LSTM [12]	0.61	1.27	2.09	3.10	4.37
MHA-LSTM [22]	0.56	1.22	2.01	3.00	4.25
DSCAN [14]	0.58	1.26	2.03	2.98	4.13
SIT [15]	0.58	1.23	1.99	2.96	4.05
DLM [29]	0.41	0.95	1.72	2.64	3.87
STAM-LSTM [13]	0.43	0.96	1.60	2.37	3.24
Attn-LTC (This Work)	0.49	0.98	1.57	2.33	3.24
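
To make the comparison in Table 2 easier to check, the following sketch transcribes the table and computes the lowest RMSE at each horizon (keeping ties) together with the percentage reduction of one model relative to another. The helper names `best_per_horizon` and `relative_reduction` are illustrative and do not come from the paper; only the numbers are taken from the table.

```python
# RMSE values in meters, transcribed from Table 2 (columns: 1 s to 5 s).
HORIZONS = ["1 s", "2 s", "3 s", "4 s", "5 s"]
TABLE2 = {
    "CV": [0.73, 1.78, 3.13, 4.78, 6.68],
    "S-LSTM": [0.65, 1.31, 2.16, 3.25, 4.55],
    "CS-LSTM": [0.61, 1.27, 2.09, 3.10, 4.37],
    "MHA-LSTM": [0.56, 1.22, 2.01, 3.00, 4.25],
    "DSCAN": [0.58, 1.26, 2.03, 2.98, 4.13],
    "SIT": [0.58, 1.23, 1.99, 2.96, 4.05],
    "DLM": [0.41, 0.95, 1.72, 2.64, 3.87],
    "STAM-LSTM": [0.43, 0.96, 1.60, 2.37, 3.24],
    "Attn-LTC": [0.49, 0.98, 1.57, 2.33, 3.24],
}


def best_per_horizon(table):
    """Return {horizon: (lowest RMSE, [models achieving it])}, keeping ties."""
    result = {}
    for i, horizon in enumerate(HORIZONS):
        lowest = min(row[i] for row in table.values())
        winners = [name for name, row in table.items() if row[i] == lowest]
        result[horizon] = (lowest, winners)
    return result


def relative_reduction(table, model, baseline):
    """Percent RMSE reduction of `model` versus `baseline` at each horizon."""
    return [100.0 * (b - m) / b for m, b in zip(table[model], table[baseline])]


if __name__ == "__main__":
    for horizon, (value, names) in best_per_horizon(TABLE2).items():
        print(horizon, value, ", ".join(names))
    print([round(x, 1) for x in relative_reduction(TABLE2, "Attn-LTC", "CS-LSTM")])
```

Run on the transcribed values, this shows DLM with the lowest RMSE at 1 s and 2 s, Attn-LTC at 3 s and 4 s, and a tie between STAM-LSTM and Attn-LTC at 5 s.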

© 2024 by the authors. Published by MDPI on behalf of the World Electric Vehicle Association. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, R.; Chen, Y.; Ding, R.; Ye, Q. Parameter-Efficient Vehicle Trajectory Prediction Based on Attention-Enhanced Liquid Structural Neural Model. World Electr. Veh. J. 2025, 16, 19. https://doi.org/10.3390/wevj16010019
