1. Introduction
BCIs (Brain-Computer Interfaces) are a promising tool that facilitates direct communication between the human brain and external devices [1], offering increasingly innovative solutions in the realm of assistive technologies [
2]. EEG-based (Electroencephalography-based) BCIs in particular have gained traction because they capture neural activity non-invasively. The captured signals can be translated into control commands for various devices, including mobile robots. Such technology has the potential to significantly improve the quality of life of individuals with mobility impairments by allowing them to control robotic systems using only their thoughts. Recent advancements in DL (Deep Learning) techniques and the flexibility of ROS (Robot Operating System) have opened many opportunities to enhance the efficiency and accuracy of EEG-based control systems. This paper explores the integration of state-of-the-art DL models with ROS to develop a robust BCI for controlling mobile robots, and discusses the feasibility of these technologies in real-world assistive applications.
The paper refers only to non-invasive EEG technology, which captures the electrical activity of the brain. The approach used to translate neural signals into actionable commands relies on an endogenous BCI system. Unlike exogenous BCIs, which depend on external cues or stimuli to elicit brain responses, in the endogenous approach the control signals are generated by the user’s voluntary mental activity. As discussed in [
3], endogenous systems can rely on thoughts, MI (motor imagery), or cognitive tasks, offering greater autonomy to users by eliminating the need for external triggers. The system presented in this paper uses an MI-based control strategy, requiring users to imagine specific motor movements—such as moving a hand or foot—to generate control signals. When an individual imagines moving a limb, the same brain areas activate as when they are physically moving that limb. This design aligns with the core principles of non-invasive, user-driven interaction, making the system more intuitive and versatile for real-world applications. However, endogenous systems also present some challenges in terms of noise and variability [
4], making robust DL models essential for accurate interpretation of the brain’s spontaneous activity.
The advent of deep learning has proven to be a reliable solution to deal with these challenges leading to improved BCIs with complex algorithms capable of extracting relevant features from EEG signals [
5]. Schirrmeister et al. [
6] explore how CNNs can decode EEG signals, including MI data, and provide insights into how the visualization of learned features can help to understand the brain activity related to MI tasks. Their findings indicate that recent advancements in machine learning have significantly enhanced the performance of deep ConvNets (convolutional networks) in decoding MI EEG data. These improvements have led to decoding accuracies that match or surpass the widely adopted FBCSP (filter bank common spatial patterns) algorithm, with mean accuracies of 84.0% for deep ConvNets compared to 82.1% for FBCSP. Several CNN architectures have been specifically designed for EEG data classification, each with distinct features. EEGNet is a compact and efficient model that uses separable convolutions to capture spatial and temporal EEG patterns [
7]. Shallow ConvNet focuses on low-level feature extraction, while Deep ConvNet utilizes multiple layers to capture complex signal patterns [
8]. The Multi-Branch 3D CNN processes EEG data in a 3D format, learning from both spatial and temporal dimensions [
9]. The Temporal-Spatial Deep ConvNet emphasizes capturing both temporal changes and spatial relationships across EEG channels [
10]. EEG-Inception, inspired by the Inception model, applies parallel convolutions with different kernel sizes for multi-scale feature extraction [
11]. However, the efficiency of training these deep models heavily relies on the optimization algorithms used. The Adam optimizer [
12] is commonly employed due to its adaptive learning rate mechanism, which benefits from combining the strengths of AdaGrad and RMSProp. Despite its popularity, Adam can exhibit high variance in learning rate adaptation. To overcome this, the RAdam (Rectified Adam) optimizer introduces a warm-up mechanism that stabilizes training, leading to faster and more consistent convergence. This makes RAdam particularly effective for CNN-like models [
13,
14].
ASTGCNs (Adaptive Spatiotemporal Graph Convolutional Networks) have been used in BCI applications to effectively capture the spatial relationships between different EEG channels. Sun et al. [
15] propose an ASTGCN framework for MI-EEG classification, aiming to address the challenges faced by conventional GNN (graph neural network) models. While traditional GNN approaches have shown promise in various fields, their application to MI-EEG classification remains limited and often results in suboptimal performance [
16,
17]. The ASTGCN framework developed in [
15] overcomes these limitations by capturing the spatiotemporal structure of multichannel EEG signals more effectively. Unlike traditional GNNs, which treat neighbour nodes with equal importance, ASTGCN adaptively determines the relevance of each neighbour node while simultaneously extracting temporal features. This approach enhances classification accuracy and robustness in real BCI applications. The proposed ASTGCN model demonstrated improved performance, achieving an average accuracy of 90.6% in the classification of two MI classes across 25 subjects. This result outperforms baseline methods, such as CNN-SAE [
18] and EEGNet, which achieved average accuracies of 74.9% and 84.9%, respectively, highlighting ASTGCN’s potential for advancing MI-EEG classification tasks.
CNNs (Convolutional Neural Networks) have shown promise in spatial feature extraction, while LSTM (Long Short-Term Memory) networks excel at capturing temporal dependencies in time-series data [
19,
20,
21]. By combining these two architectures, hybrid CNN-LSTM models can leverage both spatial and temporal patterns, offering superior performance in EEG-based tasks. CNN-LSTM models proved to be more efficient compared to simple LSTM architectures [
22,
23]. Liu et al. [
24] present a CNN-LSTM network with weight-sharing techniques for recognizing MI-EEG. The CNN extracts spatial features from EEG signals, while the LSTM captures time-domain dependencies, making it suitable for decoding MI tasks with high accuracy (highest accuracy rate of 82.3%), especially in multi-class scenarios.
In parallel, ROS provides a flexible middleware that facilitates the control and programming of robots. Integrating EEG-based BCIs with ROS allows for efficient translation of neural commands into robotic movements, offering an accessible and scalable solution for thought-controlled mobile robots. Previous studies have explored such integrations, though challenges related to system robustness and real-time control remained [
25]. ROS-Neuro was introduced as an extension of ROS to address the lack of standardized platforms for integrating neural interfaces with robotics [
26]. Despite the benefits of using ROS-Neuro, real-time control remains a complex challenge. The inherent delay in EEG signal processing and the potential noise from brain signals can affect the accuracy and reliability of robotic movements. Beraldo et al. [27] proposed a shared-control navigation technique, where the BCI provides high-level commands (e.g., turning directions), and ROS handles low-level control such as obstacle avoidance and path planning.
Despite these advancements, many EEG-based BCI systems still face limitations in terms of accuracy, latency, and scalability, particularly when deployed in real-world scenarios. This research seeks to address these challenges by developing a novel hybrid CNN-LSTM model that enhances both spatial and temporal EEG signal interpretation, integrated with ROS for real-time control of a mobile robot.
This paper presents a comprehensive evaluation of several deep learning models, including ASTGCN, EEGNetv4, and a CNN-LSTM hybrid architecture, for controlling a two-wheeled mobile robot using EEG data. The primary contributions of this work are: (1) the development of a high-performance CNN-LSTM model, (2) seamless integration with ROS for real-time control, and (3) a comparative analysis of model performance in terms of accuracy, latency, and robustness. The findings can further be used in assistive technology applications, particularly for individuals with mobility impairments. According to [28], the need for “eHealth” in geriatric rehabilitation has been growing rapidly, reflecting the growth of the aging population and supporting the need for advanced technology dedicated to rehabilitation. Blanco-Diaz et al. [
29] use EEGs to estimate lower-limb movements during pedalling tasks. They use an LSTM model to decode kinematic parameters for ankle and knee joint angles. They discuss the idea of using these systems for personalized rehabilitation therapies, enhancing motor recovery for individuals with neurological conditions like stroke. Camargo-Vargas et al. [
30] review the use of EEG-based BCIs in rehabilitation of lower and upper limbs. The review includes details about preprocessing techniques like filtering, feature extraction (e.g., Common Spatial Patterns and wavelet transforms), and classification methods such as Support Vector Machines and Linear Discriminant Analysis to interpret motor intentions. The review also presents how these processed signals enable applications such as motor imagery training, neurofeedback, and controlling assistive devices like robotic limbs and exoskeletons.
The remainder of this paper is organized as follows:
Section 2 details the methodology, including data pre-processing, model architecture, and ROS integration.
Section 3 provides a comprehensive discussion on the experimental results, challenges encountered, and the implications of the proposed models for real-world mobile robot applications.
Section 4 discusses the findings and their implications, while
Section 5 presents the conclusions and potential future directions.
2. Materials and Methods
This section outlines the methodologies employed in developing a robust BCI system for controlling a mobile robot using MI data. It describes the process of acquiring EEG signals, the deep learning models developed for decoding these signals, and the integration of these models with ROS to enable real-time control. Additionally, it details the setup of the system, including hardware specifications, data pre-processing techniques, model training strategies, and the evaluation criteria used to assess the system’s performance.
2.1. System Setup
The overall system configuration for this EEG-based BCI comprises a combination of hardware and software components that work in harmony to achieve real-time robotic control. As shown in
Figure 1, the core hardware and software components of the system include:
EEG Acquisition Device: an OpenBCI EEG cap with 19 electrodes;
Computer System: equipped with an NVIDIA GEFORCE RTX 3060 Laptop GPU;
Mobile Robot: a two-wheeled mobile robot;
ROS (Robot Operating System) Noetic;
Deep Learning Framework;
Operating System: Ubuntu 20.04.
For the inference phase of the study, an OpenBCI EEG cap with 19 electrodes was employed [
31], of which 16 electrodes (Fp1, Fp2, F4, Fz, F3, T3, C3, Cz, C4, T4, P4, Pz, P3, O1, O2, REF) were actively used to capture brain signals. The electrodes were placed following the international 10–20 system to ensure optimal coverage of the motor cortex. As described in [
32], the system was connected to a computer using a Bluetooth dongle and a Cyton Daisy module, enabling wireless transmission of EEG data in real time. Additionally, the OpenBCI GUI (Graphical User Interface) was used for visualizing and recording the EEG signals [
33]. OpenBCI GUI provides real-time feedback, including data visualization tools and customizable settings, which allows monitoring signal quality and electrode impedance during the setup. The OpenBCI hardware and GUI were selected due to their compatibility with open-source environments and their ability to provide accurate, non-invasive brainwave measurements necessary for real-time control tasks.
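For illustration, the sketch below shows one way such a stream could be read programmatically for a Cyton + Daisy configuration using the BrainFlow library; the library choice, serial port, and buffering strategy are assumptions made for this example rather than the exact tooling used in the study.

```python
import time
from brainflow.board_shim import BoardShim, BoardIds, BrainFlowInputParams

# Hypothetical acquisition sketch: read one 6-s EEG window from an
# OpenBCI Cyton + Daisy board through the BrainFlow API.
params = BrainFlowInputParams()
params.serial_port = "/dev/ttyUSB0"          # assumed port of the USB/Bluetooth dongle
board_id = BoardIds.CYTON_DAISY_BOARD.value  # 16-channel OpenBCI configuration

board = BoardShim(board_id, params)
board.prepare_session()
board.start_stream()
time.sleep(6)                                # one motor imagery trial window
data = board.get_board_data()                # rows = board channels, cols = samples
board.stop_stream()
board.release_session()

eeg_rows = BoardShim.get_eeg_channels(board_id)
eeg_window = data[eeg_rows, :]               # shape: (n_eeg_channels, n_samples)
```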
The mobile robot used in this project, derived from a customized version of a TurtleBot3-like platform [
34], is equipped with a Raspberry Pi 2B and various sensors. The robot’s primary function is to navigate autonomously in an environment using a differential drive mechanism, where each wheel is controlled independently to enable smooth turns and movement in all directions. The Raspberry Pi serves as the central processing unit, running ROS, which manages the data exchange between different components and coordinates the robot’s actions.
Connection Setup and Commands Transmission
The integration between the Raspberry Pi and the other robotic components, such as the motor driver and sensors, is managed through ROS nodes that handle data flow. ROS nodes are created for different functionalities like motor control and command reception from the EEG classification system. To set up the communication, the Raspberry Pi is connected via an Arduino device to the motor driver L298N [
35] through the GPIO pins, enabling direct control over the motors’ speed and direction.
The Raspberry Pi is networked with a workstation ((2) from
Figure 1) via a local Wi-Fi network, enabling remote access and debugging through SSH (Secure Shell) and ROS topics. This configuration allows for the real-time transfer of commands, ensuring that the robot can be monitored and controlled from a distance without the need for constant physical adjustments.
Once the EEG classification identifies the intended motor imagery movement as part of a class, an associated command such as moving forward or turning left or right is transmitted to the motors via Raspberry Pi. The classification system, which runs on the dedicated machine, publishes these commands to a predefined ROS topic, /cmd_vel. This topic is subscribed to by a ROS node on the Raspberry Pi, which then translates these high-level commands into motor control signals for the driver L298N. Details can be found in
Section 2.4.
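A minimal sketch of the subscriber side of this exchange is given below; the /cmd_vel topic follows the description above, while the node name, the assumed wheel separation, and the send_to_driver helper are illustrative placeholders for the actual motor-driver interface.

```python
#!/usr/bin/env python3
import rospy
from geometry_msgs.msg import Twist

# Hypothetical Robot Control node running on the Raspberry Pi: it subscribes
# to /cmd_vel and converts each Twist message into left/right wheel commands
# that are forwarded to the L298N driver (send_to_driver is a placeholder).

def send_to_driver(left, right):
    # Placeholder for the call that reaches the L298N motor driver.
    rospy.loginfo("wheel commands: left=%.2f right=%.2f", left, right)

def cmd_vel_callback(msg):
    # Simple differential-drive mixing of linear and angular velocity.
    wheel_base = 0.16  # assumed wheel separation in metres
    left = msg.linear.x - msg.angular.z * wheel_base / 2.0
    right = msg.linear.x + msg.angular.z * wheel_base / 2.0
    send_to_driver(left, right)

if __name__ == "__main__":
    rospy.init_node("robot_control_node")
    rospy.Subscriber("/cmd_vel", Twist, cmd_vel_callback)
    rospy.spin()
```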
2.2. EEG Data Collection
2.2.1. Public Dataset for Model Training
The deep learning models used in this study were trained on a publicly available EEG dataset from a BCI competition [
36]. This dataset consists of EEG recordings from 9 subjects, each performing MI tasks involving 4 distinct movements: the imagination of movement of the left hand (class 1), right hand (class 2), both feet (class 3), and tongue (class 4). The data were collected in two sessions per subject, with each session comprising 6 runs. Each run contained 48 trials, evenly distributed across the four motor imagery tasks, resulting in 288 trials per session per subject. During each trial (6 s), subjects were instructed to imagine the requested movement according to the following sequence: at the beginning, a short acoustic warning signal was played while a cross was displayed on the screen (the cross remained visible for the entire trial); after 2 s, an arrow was displayed for 1.25 s, “pointing either to the left, right, down or up (corresponding to one of the four classes left hand, right hand, foot or tongue)” [36], prompting the subjects to focus on the corresponding body part; at the end of the trial (after 6 s), a short break was signalled by a black screen; this sequence was repeated for every trial.
The EEG signals were recorded using 22 Ag/AgCl electrodes, with electrode placement following a standard montage. The data were sampled at 250 Hz, and a band-pass filter (0.5 Hz to 100 Hz) was applied to remove unwanted frequency components. A 50 Hz notch filter was also employed to mitigate line noise. The amplifier was set to a sensitivity of 100 µV to ensure high-quality signal capture. These pre-processing steps and signal acquisition settings were consistent across all subjects, ensuring high-quality recordings suitable for training the deep learning models. The dataset was subsequently labelled according to the motor imagery tasks. The labelling process came after a rigorous inspection carried out by an expert who marked the trials containing artifacts, providing a reliable foundation for the model training phase.
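The recordings described here correspond to the four-class motor imagery data of BCI Competition IV (dataset 2a). As a hedged illustration, such data can be loaded for experimentation through the MOABB package, where this dataset is exposed as BNCI2014001; the package and the exact call below are assumptions for demonstration, not the authors’ pipeline.

```python
from moabb.datasets import BNCI2014001
from moabb.paradigms import MotorImagery

# Illustrative loading of the four-class motor imagery dataset via MOABB.
dataset = BNCI2014001()
paradigm = MotorImagery(n_classes=4)

# X: (n_trials, n_channels, n_samples), y: class labels, meta: session/run info
X, y, meta = paradigm.get_data(dataset=dataset, subjects=[1])
print(X.shape, set(y))
```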
2.2.2. Data Acquisition and Pre-Processing
For real-time inference, the EEGs were recorded in a way that ensures compatibility with the DL models trained on the public dataset described in Section 2.2.1. Thus, the subject was instructed to imagine one of the specified movements for at least 6 s, so that the model received data comparable to the training trials.
EEG signals were acquired at a sampling rate of 250 Hz using the OpenBCI system, which features 19 electrodes, with 16 actively used during the experiment. Furthermore, to improve the quality of the signals, pre-processing steps were applied. The raw EEG data was first filtered using a band pass filter between 0.5 Hz and 50 Hz to eliminate low-frequency drifts and high-frequency noise. A notch filter at 50 Hz was applied to mitigate power line interference. The cleaned data was segmented into 6-s segments, aligning with the motor imagery tasks performed in
Section 2.2.1.
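A minimal sketch of this pre-processing chain is shown below, assuming SciPy filters and a (channels × samples) NumPy array; the filter order and the notch quality factor are assumptions made for illustration.

```python
import numpy as np
from scipy.signal import butter, iirnotch, filtfilt

FS = 250          # sampling rate in Hz
WINDOW_S = 6      # length of one motor imagery segment in seconds

def preprocess(raw, fs=FS):
    """Band-pass 0.5-50 Hz, 50 Hz notch, then split into 6-s segments.

    raw: array of shape (n_channels, n_samples).
    Returns an array of shape (n_segments, n_channels, fs * WINDOW_S).
    """
    # 4th-order Butterworth band-pass between 0.5 and 50 Hz
    b, a = butter(4, [0.5, 50.0], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, raw, axis=1)

    # Notch filter at 50 Hz to suppress power line interference
    bn, an = iirnotch(w0=50.0, Q=30.0, fs=fs)
    filtered = filtfilt(bn, an, filtered, axis=1)

    # Non-overlapping 6-s windows aligned with the motor imagery trials
    seg_len = fs * WINDOW_S
    n_segments = filtered.shape[1] // seg_len
    segments = filtered[:, : n_segments * seg_len]
    return segments.reshape(filtered.shape[0], n_segments, seg_len).transpose(1, 0, 2)
```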
Lastly, the pre-processed signals were subsequently passed to the trained DL models for real-time prediction of motor imagery tasks, which involved the imagination of movements such as the left hand, right hand, both feet, or tongue. This setup ensured consistency between the training data from the public dataset [
36] and the real-time inference system.
2.3. Deep Learning Model
2.3.1. Initial Models: EEGNetv4 and ASTGCN
The initial phase of the model experimentation involved the training and evaluation of two state-of-the-art models: EEGNetv4 and ASTGCN. EEGNetv4, a lightweight deep learning architecture specifically designed for EEG classification tasks, was chosen for its compact design and ability to extract both spatial and temporal features from EEG data. This model utilizes depth-wise and separable convolutions to reduce computational complexity while maintaining high classification accuracy in BCI applications [
7].
Simultaneously, ASTGCN was explored to capture the spatial relationships between different EEG electrodes, as well as the temporal dynamics of EEG signals. ASTGCN leverages graph convolutional networks to model the connectivity between EEG channels and temporal convolutions to capture the progression of brain activity over time [
15]. These approaches provided promising results during initial experimentation but faced challenges in generalizing to the MI tasks under study, particularly in multi-class scenarios.
2.3.2. Transition to CNN-LSTM Architecture
After evaluating the performance of EEGNetv4 and ASTGCN, a hybrid CNN-LSTM architecture was developed and tested to better capture the complex spatial-temporal patterns present in EEG signals during motor imagery. The CNN-LSTM model was designed to leverage the strengths of both convolutional layers for spatial feature extraction and LSTM layers for temporal sequence modelling. The same parameters were applied across all three architectures, with details about EEGNetv4 and ASTGCN provided in the Results section. The following paragraphs and
Section 2.3.3 present the methodology used for the CNN-LSTM architecture, shown in
Figure 2.
The CNN section included a single convolutional layer with 40 filters, a kernel size of 20, and a stride of 4, effectively reducing the dimensionality of the input data and extracting spatial features. The ReLU activation function was applied to introduce non-linearity, enabling the model to capture complex patterns. The ReLU function is defined as:

$$f(x) = \max(0, x)$$

where $x$ represents the input to the activation function. This function outputs the input directly if it is positive; otherwise, it outputs zero, which helps the model learn non-linear relationships. To prevent overfitting, a dropout layer with a rate of 0.5 was added, followed by batch normalization to stabilize and accelerate training. A max-pooling layer with a pool size of 4 and stride of 4 further reduced the spatial dimensions.
Following the CNN layer, the LSTM section comprised 2 LSTM layers with 30 units each, configured to return sequences. This configuration allowed the model to capture the temporal dynamics and long-term dependencies of the EEG signals. Dropout layers with a rate of 0.5 were again applied between LSTM layers, along with batch normalization, to enhance model robustness and generalization. A final dense layer with a softmax activation function was used to classify the EEG signals into their respective motor imagery classes.
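A compact PyTorch sketch of this architecture is shown below. The layer hyper-parameters follow the description above, while the number of input channels, the reshaping of the convolutional output for the LSTM, and the use of the last time step for classification are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Sketch of the CNN-LSTM described above (hyper-parameters from the text;
    input shape and reshaping details are assumptions)."""

    def __init__(self, n_channels=22, n_classes=4):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_channels, 40, kernel_size=20, stride=4),  # spatial feature extraction
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.BatchNorm1d(40),
            nn.MaxPool1d(kernel_size=4, stride=4),
        )
        # Two stacked LSTM layers with 30 units each; dropout between the layers.
        self.lstm = nn.LSTM(input_size=40, hidden_size=30,
                            num_layers=2, batch_first=True, dropout=0.5)
        self.norm = nn.BatchNorm1d(30)
        self.classifier = nn.Linear(30, n_classes)

    def forward(self, x):
        # x: (batch, n_channels, n_samples), e.g. one 6-s EEG window
        feats = self.cnn(x)                  # (batch, 40, reduced_time)
        feats = feats.permute(0, 2, 1)       # (batch, reduced_time, 40) for the LSTM
        seq, _ = self.lstm(feats)            # full output sequence
        last = self.norm(seq[:, -1, :])      # last time step summarises the trial
        return self.classifier(last)         # logits; softmax applied in the loss / at inference
```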
2.3.3. Training—Validation—Testing
The CNN-LSTM model was trained on the public EEG dataset [
36], which includes motor imagery tasks involving four different movements (left hand, right hand, both feet, tongue). For training and validation, data from 8 subjects was used, with an 80/20 split within each subject’s data for training and validation sets, respectively. The remaining 9th subject’s data was reserved for testing, allowing for an evaluation of the model’s ability to generalize to unseen participants. Two training strategies were employed: a within-subject approach and a cross-subject approach. In the within-subject strategy, the dataset is divided into per-subject subsets and a separate model is trained and validated on each subject’s data. In the cross-subject approach, the models are trained on data from multiple subjects, combining all subjects’ data into one large training/validation set.
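The two splitting strategies can be summarised with the following sketch, which assumes the trials X, labels y, and a per-trial subject index are already available as NumPy arrays; function names and the held-out subject are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# X: (n_trials, n_channels, n_samples), y: labels, subjects: subject id per trial.
def within_subject_split(X, y, subjects, subject_id):
    """Train/validate on a single subject's trials (80/20 split)."""
    mask = subjects == subject_id
    return train_test_split(X[mask], y[mask], test_size=0.2, stratify=y[mask])

def cross_subject_split(X, y, subjects, test_subject=9):
    """Pool 8 subjects for training/validation; hold one subject out for testing."""
    train_mask = subjects != test_subject
    X_tr, X_val, y_tr, y_val = train_test_split(
        X[train_mask], y[train_mask], test_size=0.2, stratify=y[train_mask])
    return (X_tr, y_tr), (X_val, y_val), (X[~train_mask], y[~train_mask])
```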
The models were trained using two different optimizers. Initially, the Adam optimizer was applied, followed by the RAdam optimizer, both paired with the categorical cross-entropy loss function to measure performance.
The Adam optimizer updates the model parameters by computing adaptive learning rates for each parameter using estimates of the first moment (mean) and the second moment (uncentered variance) of the gradients. The parameter update rule is given by:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\,\hat{m}_t$$

where $\hat{m}_t$ and $\hat{v}_t$ are bias-corrected estimates of the first and second moments, respectively, $\eta$ is the learning rate, and $\epsilon$ is a small constant for numerical stability.
RAdam extends Adam by introducing a variance rectification term, which helps to stabilize the training process, especially in the early stages. The rectified update is defined as:

$$\theta_{t+1} = \theta_t - \eta\, r_t\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

where $r_t$ is a rectification term that adjusts the variance based on the current training step, making the optimization process more stable.
For the loss function, the categorical cross-entropy was used to measure the difference between the predicted class probabilities and the true labels. It is defined as:

$$L = -\sum_{c=1}^{C} y_c \log(\hat{y}_c)$$

where $y_c$ represents the true label for class $c$, $\hat{y}_c$ is the predicted probability of class $c$, and $C$ is the total number of classes. This loss function helps the model adjust its predictions to minimize the difference between predicted probabilities and the actual class distribution, guiding the optimization process effectively.
To further refine the training process, the LRScheduler provided by the Skorch library [
37] was used to dynamically adjust the learning rate during training. Specifically, the CosineAnnealingLR policy was applied, which gradually reduces the learning rate according to a cosine annealing schedule. In this approach, $\eta_{\max}$ is set to the initial learning rate, and $T_{cur}$ represents the number of epochs since the last restart, as part of the SGDR (Stochastic Gradient Descent with Warm Restarts) mechanism [38]:

$$\eta_t = \eta_{\min} + \frac{1}{2}\left(\eta_{\max} - \eta_{\min}\right)\left(1 + \cos\left(\frac{T_{cur}}{T_{\max}}\,\pi\right)\right)$$
The NeuralNetClassifier [
39] from the Skorch library was used in this setup to streamline the process of training the deep learning models for classification. With NeuralNetClassifier, the training process is simplified: it automatically manages tasks like splitting the data into training and validation sets, adjusting learning rates during training, and saving the best-performing version of the model. Training runs of 300 epochs were conducted multiple times. The model was trained on an NVIDIA GEFORCE RTX 3060 Laptop GPU using CUDA 11.6 and the PyTorch 2.0.1 deep learning framework [
40].
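An illustrative Skorch configuration reflecting this setup is sketched below; the batch size, checkpoint criterion, and scheduler arguments are assumptions, and CNNLSTM refers to the model sketch in Section 2.3.2.

```python
import torch
from skorch import NeuralNetClassifier
from skorch.callbacks import LRScheduler, Checkpoint
from torch.optim.lr_scheduler import CosineAnnealingLR

# Illustrative Skorch wrapper around the CNN-LSTM sketch; hyper-parameter values
# follow the text, the remaining arguments are assumptions.
net = NeuralNetClassifier(
    CNNLSTM,                                   # model class defined earlier
    module__n_channels=22,
    module__n_classes=4,
    criterion=torch.nn.CrossEntropyLoss,       # categorical cross-entropy on logits
    optimizer=torch.optim.RAdam,               # rectified Adam
    lr=5e-4,                                   # learning rate found effective for CNN-LSTM
    max_epochs=300,
    batch_size=64,                             # assumed batch size
    device="cuda" if torch.cuda.is_available() else "cpu",
    callbacks=[
        LRScheduler(policy=CosineAnnealingLR, T_max=300),  # cosine annealing schedule
        Checkpoint(monitor="valid_acc_best"),              # keep the best-performing model
    ],
)

# Example usage (X_train as float32 trials, y_train as int64 labels):
# net.fit(X_train.astype("float32"), y_train.astype("int64"))
```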
During the training phase, the model’s performance was evaluated on the validation set to optimize hyperparameters and monitor overfitting. Metrics such as accuracy/loss and F1-score were calculated for each motor imagery class. After the validation phase, the model that achieved the highest performance was tested on an independent test set, which included unseen data from the 9th subject to evaluate the model’s ability to generalize.
The CNN-LSTM model ultimately outperformed both EEGNetv4 and ASTGCN in terms of classification accuracy and robustness, achieving a testing accuracy of 88.5% for the cross-subject scenario. This model’s ability to effectively decode motor imagery tasks with high accuracy made it the best candidate for real-time EEG-based robot control.
2.4. ROS Integration
The DL model was integrated with ROS to enable real-time control of a two-wheeled mobile robot based on EEG signals. ROS was chosen due to its modular and flexible architecture, which allows seamless communication between the different components of the BCI system. The integration involved developing custom ROS nodes responsible for receiving, processing, and translating the neural commands predicted by the CNN-LSTM model into corresponding robot movement commands.
2.4.1. Architecture of ROS Integration
The overall ROS architecture consisted of three main modules:
EEG Signal Acquisition node handled the real-time data acquisition from the EEG device during the inference phase. The EEG signals were pre-processed, following the steps detailed in
Section 2.2.2, and transmitted to the DL model for motor imagery classification. The EEG data was streamed in real-time to the ROS system using Bluetooth communication.
The processed EEG data was then fed into the pre-trained CNN-LSTM model, which was embedded within the Neural Command Processing node using a custom wrapper. This node utilized ROS messages to transmit predictions from the model to other system components. The predictions corresponded to motor imagery classes (e.g., left hand, right hand, both feet, tongue), which were mapped to specific control commands for the robot.
The output from the CNN-LSTM model was transformed into low-level robot control commands (e.g., turn left, turn right, forward, stop). These commands were sent to the robot’s motor controllers through ROS’s communication system. The Robot Control node responsible for robot control used a predefined action library to execute the commands based on the motor imagery classification results.
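A condensed sketch of this command-mapping step is given below; the /cmd_vel topic and the class-to-movement mapping follow Sections 2.1 and 2.4.2, while the node name, velocity values, and the classify_latest_window placeholder are illustrative assumptions.

```python
#!/usr/bin/env python3
import rospy
from geometry_msgs.msg import Twist

# Hypothetical Neural Command Processing node: each predicted motor imagery
# class is mapped to a velocity command and published on /cmd_vel.
CLASS_TO_TWIST = {
    0: (0.0, 0.5),    # left hand  -> turn left   (linear m/s, angular rad/s)
    1: (0.0, -0.5),   # right hand -> turn right
    2: (0.2, 0.0),    # both feet  -> move forward
    3: (0.0, 0.0),    # tongue     -> stop
}

def classify_latest_window():
    # Placeholder for pre-processing the latest 6-s EEG window and running
    # it through the trained CNN-LSTM model; returns a class index 0-3.
    return 3

def publish_command(pub, predicted_class):
    linear, angular = CLASS_TO_TWIST[predicted_class]
    msg = Twist()
    msg.linear.x = linear
    msg.angular.z = angular
    pub.publish(msg)

if __name__ == "__main__":
    rospy.init_node("neural_command_node")
    pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
    rate = rospy.Rate(1)  # one command per classification cycle (illustrative)
    while not rospy.is_shutdown():
        publish_command(pub, classify_latest_window())
        rate.sleep()
```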
2.4.2. Real-Time Control and Feedback
To maintain real-time control of the robot, the system employed ROS Topics to publish and subscribe to different data streams efficiently. The neural commands predicted by the CNN-LSTM model were published to a specific ROS topic, which the robot’s control node subscribed to. This modular architecture ensured low-latency communication between components, allowing the robot to respond to EEG-based commands in near real-time.
The tasks performed by the robot were designed to evaluate the system’s ability to accurately interpret motor imagery commands and translate them into robot movements. The control tasks included directional movement. The robot was instructed to turn left, turn right, move forward by 0.5 m or stop based on the user’s motor imagery of moving left hand, right hand, both feet or tongue, respectively. These tasks were designed to simulate real-world scenarios in which a user with mobility impairments could navigate a robotic assistant or wheelchair.
2.5. Performance Metrics and Statistical Analysis
The performance of the system was evaluated based on several key metrics. The primary metric for model performance was classification accuracy, which measured the ability of the deep learning models (CNN-LSTM, EEGNetv4, ASTGCN) to correctly predict motor imagery tasks. The CNN-LSTM model achieved the highest accuracy at 88.5% during testing, outperforming the other models.
Response time was a critical metric in evaluating the system’s real-time performance. This metric measured the time delay between when the EEG signals were acquired and when the corresponding robotic action was executed. The average response time achieved in real-time control was approximately 200 ms.
The robustness of the system was evaluated based on its ability to maintain accurate control in varying conditions, such as noisy environments or variability in user EEG signals. The dropout layers in the CNN-LSTM model helped enhance the system’s robustness. Tests conducted across multiple sessions demonstrated consistent performance for the same subject, but the process proved to be challenging for untrained users.
The cross-subject training strategy aimed to evaluate the model’s ability to generalize across different users. The CNN-LSTM model showed improved generalization compared to EEGNetv4 and ASTGCN, highlighting its adaptability in multi-subject scenarios.
To further validate the differences in performance between the models, McNemar’s test was employed as a statistical tool to assess the significance of their classification discrepancies. McNemar’s test is particularly useful in scenarios where models are compared on the same set of test data, focusing on the instances where their predictions differ. It evaluates whether the difference in the misclassification rates between two models is statistically significant, providing a deeper understanding of their comparative effectiveness.
In this study, McNemar’s test was applied to evaluate the prediction discrepancies between the CNN-LSTM model and EEGNetv4, the two models with the best accuracy rates. The outcomes of each model on the same test instances were categorized into four groups: (1) both models correctly predicted the outcome, (2) CNN-LSTM was correct while EEGNetv4 was incorrect, (3) EEGNetv4 was correct while CNN-LSTM was incorrect, and (4) both models made incorrect predictions. McNemar’s test then focused on the second and third groups—instances where one model succeeded while the other failed—to determine whether these differences were statistically meaningful.
The test statistic for McNemar’s test is derived from the formula:

$$\chi^2 = \frac{(b - c)^2}{b + c}$$

where $b$ represents the number of cases where the CNN-LSTM model was correct but the compared model (e.g., EEGNetv4) was incorrect, and $c$ represents cases where the opposite was true. The test follows a $\chi^2$ distribution with one degree of freedom. A significant result (e.g., p < 0.05) indicates that the observed differences in prediction accuracy between the models are not due to random chance but reflect a genuine performance advantage of one model over the other.
By incorporating McNemar’s test, this study ensures a rigorous statistical validation of the performance differences between the models. This approach proves the claim that the CNN-LSTM model not only achieves higher accuracy but also offers a statistically significant improvement over its counterparts in EEG-based motor imagery classification presented in this paper.
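A short sketch of this computation, assuming the per-trial correctness of each model is available as boolean arrays (b and c are the discordant counts defined above), could look as follows:

```python
import numpy as np
from scipy.stats import chi2

def mcnemar_test(correct_a, correct_b):
    """McNemar's test from the per-trial correctness of two models.

    correct_a / correct_b: boolean arrays, True where model A / model B
    predicted the true label. Returns the chi-squared statistic and p-value
    (1 degree of freedom).
    """
    correct_a = np.asarray(correct_a, dtype=bool)
    correct_b = np.asarray(correct_b, dtype=bool)
    b = np.sum(correct_a & ~correct_b)   # A correct, B incorrect
    c = np.sum(~correct_a & correct_b)   # B correct, A incorrect
    stat = (b - c) ** 2 / (b + c)
    p_value = chi2.sf(stat, df=1)
    return stat, p_value

# Example usage (hypothetical arrays of per-trial correctness):
# stat, p = mcnemar_test(cnn_lstm_correct, eegnet_correct)
```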
3. Results
This section presents the performance of the evaluated deep learning models—EEGNetv4, ASTGCN, and CNN-LSTM—in classifying motor imagery tasks and the integration of the CNN-LSTM model with ROS for real-time robotic control. The models were evaluated based on classification accuracy, F1-score, response time, robustness, and generalization ability.
The CNN-LSTM model outperformed both EEGNetv4 and ASTGCN, achieving the highest classification accuracy of 88.5% and an F1-score of 0.87 for cross-subject strategy. This suggests that the CNN-LSTM’s ability to combine spatial feature extraction through convolutional layers with temporal sequence modelling via LSTM layers made it particularly effective for motor imagery classification tasks. The higher F1-score indicates a balanced precision and recall, making the CNN-LSTM model more reliable in distinguishing between different motor imagery classes.
In contrast, EEGNetv4 achieved a classification accuracy of 83.9% with an F1-score of 0.81, reflecting its effectiveness as a compact model, tailored for EEG signal classification in the case of the cross-subject strategy. While it did not achieve the same level of accuracy as the CNN-LSTM model, its relatively high performance demonstrates its utility in scenarios where a lightweight architecture is advantageous.
The ASTGCN model, despite its advanced graph-based approach for modelling the spatial relationships between EEG channels, achieved the lowest accuracy at 78.6% and an F1-score of 0.75 for cross-subject scenario. This lower performance may be attributed to the challenges of adapting graph-based models to the highly variable nature of EEG data, which may require more tuning or additional data for optimal results.
Overall, the results indicate that while all three models are capable of decoding motor imagery signals, the CNN-LSTM model’s accuracy and robustness make it the most suitable candidate for real-time applications, particularly in the context of controlling mobile robots through the ROS framework. The results on within-subject strategy are also discussed in the next sections.
3.1. Classification Performance
This section provides an in-depth comparison of the three deep learning models—EEGNetv4, ASTGCN, and CNN-LSTM—evaluated for their ability to classify MI tasks using EEG data.
Table 1 summarizes the best overall results for each model in both within-subject and cross-subject scenarios. In the case of the within-subject strategy, the best model was chosen as follows: a separate model was trained on each subject’s data, resulting in a set of 8 accuracy values, and the best model was taken as the one with the highest accuracy in this set.
3.1.1. Within-Subject Evaluation
In the within-subject evaluation, each model was trained and tested using data from the same subject, allowing the models to learn the unique neural patterns associated with each individual. As shown in
Table 1, the CNN-LSTM model achieved the highest accuracy of 98.7% and an F1-score of 0.89, outperforming EEGNetv4 and ASTGCN. This suggests that the hybrid architecture of CNN-LSTM, which combines convolutional layers for spatial feature extraction and LSTM layers for temporal pattern recognition, is particularly effective at capturing the intricacies of motor imagery signals. All three models obtained their best results for the same subject, highlighting how strongly data quality influences performance.
The EEGNetv4 model performed well in this scenario, achieving an accuracy of 93.5% and an F1-score of 0.85. Despite its simpler architecture, EEGNetv4’s ability to focus on the most relevant spatial and temporal features contributed to its strong performance. However, its lightweight nature resulted in a slightly lower performance compared to CNN-LSTM, indicating that more complex patterns may require deeper modelling.
The ASTGCN model, while leveraging graph-based representations to capture spatial relationships between EEG channels, achieved an accuracy of 84.2% and an F1-score of 0.80. Although it performed adequately, the model faced challenges in fully utilizing the spatial relationships. This suggests that the graph-based approach might require further tuning to better adapt to the motor imagery domain.
3.1.2. Cross-Subject Evaluation
The cross-subject evaluation provides a more challenging test of a model’s generalization capability, as it involves training on data from multiple subjects and testing on an unseen participant. As shown in
Table 1, the CNN-LSTM model again achieved the highest accuracy of 88.5% and an F1-score of 0.87, demonstrating its ability to generalize well to new users.
In comparison, EEGNetv4 achieved a cross-subject accuracy of 83.9% with an F1-score of 0.81, showing a moderate decrease in performance compared to the within-subject scenario. This decrease is expected due to the inherent variability between subjects’ neural patterns. However, EEGNetv4’s relatively consistent performance suggests that it can be a viable option when a lightweight model with reasonable generalization capabilities is needed.
The ASTGCN model experienced a notable drop in performance in the cross-subject evaluation, achieving an accuracy of 78.6% and an F1-score of 0.75. This decline highlights the model’s struggle with the variability in EEG patterns between different subjects, which is a common challenge in BCI applications. The reliance of ASTGCN on predefined graph structures to capture spatial relationships between EEG channels may have limited its ability to generalize across subjects with different neural patterns.
Figure 3 illustrates the confusion matrix for the ASTGCN model in this scenario, revealing that the model faced some difficulties in distinguishing between certain motor imagery classes (0—left hand, 1—right hand, 2—feet, 3—tongue). The model shows a good performance in predicting “tongue”. It also performs well for “feet”, having some misclassifications with “left hand” and “tongue”. The performance for “left hand” raises some concerns, considering the notable confusions with “feet” and “tongue”. The most difficult class for the model to distinguish seems to be “right hand”, a particularly high number of misclassifications occurring between it and “feet” or “tongue” classes. This performance suggests that the model could not effectively differentiate finer motor tasks when faced with cross-subject variability. This points to a need for further optimization or more robust training strategies to improve the model’s ability to generalize across users.
3.1.3. Accuracy Trends During Training
Figure 4a shows the training accuracy trends of the three models—ASTGCN, EEGNetv4, and CNN-LSTM—over 100 epochs during cross-subject training.
Figure 4b extends this analysis over 300 epochs but focuses on CNN-LSTM and EEGNetv4, as ASTGCN reached a performance plateau after the initial 100 epochs.
Figure 4a shows that during the first 30 epochs, all models exhibit rapid learning, with ASTGCN showing a slight advantage in early-stage accuracy. This suggests that the ASTGCN model was initially more effective at capturing spatial-temporal relationships within the EEG data. However, after the 80th epoch, its learning curve began to plateau, indicating a limitation in the model’s ability to further adapt to the nuances of cross-subject variability.
In contrast, the CNN-LSTM model demonstrated a more consistent improvement over the entire 100-epoch period. It reached ASTGCN’s performance by the end of this phase. This smoother progression implies that the CNN-LSTM model could better capture both spatial and temporal features of the motor imagery data, maintaining a stable learning trajectory. This robustness is further observable in
Figure 4b, where CNN-LSTM continued to increase in accuracy up to around 300 epochs. This consistency indicates that the model effectively generalized across different subjects without overfitting to specific patterns in the data.
EEGNetv4, while initially trailing behind ASTGCN in early epochs, exhibited a steady upward trend throughout the 100 epochs. The model managed to achieve a competitive accuracy, albeit with more pronounced fluctuations compared to CNN-LSTM. This variability could be attributed to EEGNetv4’s simpler architecture, which may require more adjustments to fully capture the complex patterns in EEG data. In the longer training phase, EEGNetv4 continued to improve gradually, though its accuracy remained lower than CNN-LSTM. Regardless, it still managed to achieve a reasonable level of convergence.
The significant fluctuations in all models’ accuracy suggest sensitivity to parameter adjustments and potential overfitting, reflecting the diverse nature of cross-subject EEG data.
3.2. Impact of Hyperparameter Tuning on Model Performance
In this study, several critical hyper-parameters were fine-tuned to optimize the performance of the deep learning models—CNN-LSTM, EEGNetv4, and ASTGCN—on the motor imagery EEG data. Among these, the learning rate and choice of optimizer played pivotal roles in determining the models’ convergence speed, stability, and final accuracy. The learning rate controls how much the model weights are adjusted with each update, while the choice of optimizer influences how these adjustments are made, impacting the efficiency of gradient descent.
3.2.1. Learning Rate Selection
Various learning rates were tested during the training phase, ranging from 0.0001 to 0.01. Through empirical experimentation, a learning rate of 0.001 was initially found to provide a balance between speed and stability for all three models CNN-LSTM, EEGNetv4 and ASTGCN. However, further analysis revealed that a lower learning rate of 0.0005 was particularly effective for the CNN-LSTM model. While this learning rate resulted in slower convergence during the initial training epochs, it ultimately led to a more stable and robust learning process. The slower convergence allowed the model to better adapt to the intricacies of the motor imagery EEG data, reducing overfitting and improving its generalization across subjects. This made the compromise of slower convergence worthwhile, as it contributed to higher final accuracy and a smoother training curve. For all models, a cosine annealing learning rate schedule was applied, gradually reducing the learning rate during training to fine-tune the models as they approached convergence, ensuring that the learning process did not stagnate.
3.2.2. Optimizer Comparison: Adam vs. RAdam
The choice of optimizer significantly impacted the training dynamics of the models. Initially, the Adam optimizer was used due to its adaptive learning rate capabilities and ease of implementation. Adam demonstrated effective convergence during the initial training stages, especially for CNN-LSTM, leading to faster accuracy gains in the first 50 epochs. However, as training progressed, the model’s performance plateaued, indicating that a more stable optimization method might be beneficial.
Subsequently, the RAdam optimizer was tested. RAdam introduces a variance rectification term, which helps to stabilize the early stages of training by adjusting the learning rate more dynamically. This led to smoother convergence in all three models, reducing fluctuations in accuracy and improving overall generalization. While Adam performed well for initial training, RAdam’s stability resulted in slightly higher accuracy and lower variance during cross-subject evaluation, making it the preferred choice for the final model training.
3.2.3. Final Model Selection Based on Parameter Tuning
Based on the results of the hyper-parameter tuning process, the CNN-LSTM model with a learning rate of 0.0005 and the RAdam optimizer emerged as the best-performing configuration. This combination provided a balance between rapid initial learning and stable long-term convergence, ultimately achieving the highest accuracy and F1-score in cross-subject evaluations. The use of RAdam ensured smoother training curves and better generalization across diverse EEG patterns compared to Adam. These characteristics made the CNN-LSTM model particularly suitable for the complex nature of motor imagery classification, justifying its selection as the final model for real-time BCI applications.
Figure 5 presents a comparison of the training accuracy across 300 epochs for the CNN-LSTM model using the Adam and RAdam optimizers. As illustrated, the RAdam optimizer provided smoother convergence, with fewer fluctuations in accuracy beyond the 50th epoch.
3.3. Latency and Real-Time Response Analysis
The latency and real-time response of the system are critical factors in evaluating its suitability for practical applications, especially in scenarios where a user needs to control a mobile robot using MI. In this study, latency refers to the total time taken from the moment the subject begins imagining a movement to the point where the system classifies the movement and sends a corresponding command to the robot. A key aspect of this process is the 6-s window during which the subject imagines a specific movement, such as moving one hand, the feet or the tongue.
After this 6-s imagination period, the system processes the EEG data, classifies the imagined movement using the trained CNN-LSTM model, and translates the prediction into a control command for the robot. The time taken for this classification and command transmission is critical for ensuring a smooth and responsive interaction between the user and the robot. The experiments showed that the average time for processing and classification after the 6-s imagination window was approximately 200 milliseconds for the CNN-LSTM model. This included the time required for data pre-processing, feature extraction, and model inference.
The total response time, therefore, consisted of the 6-s period of motor imagery, followed by this 200 ms processing time. While the initial 6-s duration is necessary for the subject to focus and generate clear motor imagery signals, the rapid processing time of the CNN-LSTM model ensured that the system could provide timely feedback to the user.
3.4. Statistical Significance Analysis Using McNemar’s Test
To evaluate whether the differences in classification performance between the CNN-LSTM model and EEGNetv4 are statistically significant, a McNemar’s test was conducted. The test compares the models’ performance on the same test dataset, focusing on instances where the models disagree on their predictions.
For the comparison, the test categorized outcomes into four groups:
instances where both models were correct,
instances where CNN-LSTM was correct but EEGNetv4 was incorrect,
instances where EEGNetv4 was correct but CNN-LSTM was incorrect, and
instances where both models were incorrect.
McNemar’s test yielded a χ² value of 6.35 with a p-value of 0.01, indicating that the difference in performance between the two models is statistically significant at the 0.05 level. This suggests that the superior accuracy observed for the CNN-LSTM model over EEGNetv4 is not due to random chance but reflects a genuine improvement in model performance.
4. Discussion
The results of this study highlight the potential of using deep learning models, specifically the CNN-LSTM architecture, for EEG-based mobile robot control in a real-time BCI system. The CNN-LSTM model achieved the highest classification accuracy of 88.5% in the cross-subject evaluation, outperforming the other models, EEGNetv4 and ASTGCN. This finding is particularly important for practical applications, as the cross-subject setting represents a more realistic use case where the system must adapt to varying neural patterns from different users. The superior performance of CNN-LSTM can be attributed to its ability to effectively capture both the spatial features of EEG signals through convolutional layers and the temporal dependencies using LSTM layers. This makes it better suited for the dynamic and complex nature of motor imagery tasks compared to the more lightweight EEGNetv4 and the graph-based approach of ASTGCN.
Despite its success, the CNN-LSTM model also faced challenges, particularly related to training time and computational resources. This is due to the large number of parameters used to integrate a CNN for spatial features and LSTMs for temporal sequences. The significant computational resources and memory needed are evident during training, especially when using a learning rate of 0.0005. Though beneficial for stability, this lower learning rate results in slower convergence and requires more epochs to achieve optimal accuracy, extending the training time. This trade-off between convergence speed and model stability is crucial when designing real-time BCI systems, as the goal is to achieve both high accuracy and timely response. The use of the RAdam optimizer played a significant role in maintaining smooth training curves, ensuring that the model did not overfit to specific training data patterns while still adapting well to new subjects.
In response to these issues, several strategies can be adopted. Optimizing the model architecture through techniques such as pruning, quantization, or transfer learning can be considered; pruning removes redundant weights and quantization lowers numerical precision, yielding a smaller, more efficient model.
The latency analysis further supports the viability of the CNN-LSTM model for real-time control applications. The system’s average response time of 200 milliseconds after a 6-s motor imagery period is competitive, providing a balance between the time required for signal acquisition and the need for quick robotic responses. This rapid processing time is crucial in real-world scenarios, such as controlling assistive devices for individuals with mobility impairments, where any delays can impact user experience and the system’s usability. When scaling the application to more complex tasks, the data resolution typically increases as well, which challenges the model. Inefficiencies introduced during data acquisition can lead to an inconsistent pre-processing phase and create bottlenecks in the physical implementation. A possible solution to such challenges is to change the length of the EEG data window used for classification; shorter motor imagery windows (e.g., 4 s) can be used. The downside of such a solution is that the entire training process would have to be repeated from the beginning.
Moreover, the application of McNemar’s test provided statistical validation for the differences in performance between the models. The test revealed that the observed improvement in the CNN-LSTM model’s accuracy over EEGNetv4 was statistically significant (p < 0.05). This result underscores the robustness of the CNN-LSTM model in decoding motor imagery tasks compared to the other architectures, providing evidence that its performance is not merely a result of random variation in the dataset but a genuine improvement in prediction accuracy.
The study also highlighted the limitations of the ASTGCN model. Although ASTGCN initially exhibited better accuracy in early training epochs, it reached a performance plateau after 100 epochs, failing to match the long-term learning capability of CNN-LSTM. This plateau may be due to the model’s reliance on predefined graph structures, which could limit its ability to adapt to diverse EEG patterns across different subjects. As a result, while ASTGCN’s graph-based approach holds promise for capturing spatial relationships, it may require further tuning or more sophisticated strategies to fully leverage its potential in cross-subject BCI applications.
5. Conclusions
Although the application presented in this paper has shown promising results, there are some disadvantages that need to be addressed. EEG data is usually noisy and strongly subject-dependent, which makes it challenging to work with. Even though the CNN-LSTM model performed well in cross-subject evaluations, extensive fine-tuning may be required if it is to be applied to a broader population, in which case the computational resource requirements will increase.
Scaling the application introduces another downside, namely the number of electrodes necessary for data acquisition. A large number of electrodes means time-consuming subject preparation, which can lead to a less user-friendly setup.
From a computational point of view, the CNN-LSTM model needs a lot of processing power, especially for training and running in real time. It depends on advanced GPUs, which might not be available to all users or organizations with limited resources. In addition, the 6-s time frame needed to collect motor imagery data can cause delays in situations where quick decisions are important. Even though the 200-millisecond processing time is quite fast, the total response time might still feel slow for tasks that need immediate feedback.
In terms of system integration, the study successfully demonstrated the feasibility of using ROS as a middleware to translate EEG-based commands into robotic actions. The custom ROS nodes for data acquisition, neural command processing, and robot control allowed for seamless communication between the CNN-LSTM model and the mobile robot, enabling precise control of the robot’s movements. This setup proves to be a scalable solution that can be adapted to various robotic platforms, highlighting the versatility of combining ROS with deep learning models for BCI systems. An obvious advantage of this approach is the hands-free control it provides, which, in a future version of the application, could allow individuals with mobility impairments to interact with their environment intuitively.
Overall, this study offers a comprehensive evaluation of different deep learning architectures for EEG-based motor imagery classification and their integration into real-time control of a mobile robot. The results suggest that hybrid models like CNN-LSTM hold great promise for practical BCI applications, particularly in assistive technologies where adaptability and reliability are paramount. The low latency of CNN-LSTM architecture represents another advantage, enabling timely responses in real-time implementation.
Author Contributions
Conceptualization, B.G. and V.V.; methodology, L.V. and A.-M.T.; software, B.G. and A.P.; validation, A.Z., L.V. and V.V.; formal analysis, B.G. and A.-M.T.; investigation, B.G.; resources, L.V.; data curation, B.G., A.Z. and A.P.; writing—original draft preparation, B.G.; writing—review and editing, A.-M.T. and Y.F.; visualization, B.G.; supervision, V.V.; project administration, L.V. and Y.F.; funding acquisition, L.V. and Y.F. All authors have read and agreed to the published version of the manuscript.
Funding
The paper was funded by the European Commission Marie Skłodowska-Curie iMARS project, Intelligent Multi Agent Robotic Systems, HORIZON-MSCA-2023-SE-01-01-MSCA Staff Exchanges 2023, Grant agreement ID: 101182996; JESH 2023 (Joint Excellence in Science and Humanities) project Pose Estimation for Rehabilitation Robots (V. Vlădăreanu); Research on key technologies of intelligent multi-posture lower limb rehabilitation robot project, Ningbo Municipal International Science and Technology Cooperation; and Ningbo International Cooperation Project under Grant 2023H014.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Informed consent was obtained from the author who was the only subject involved in the study.
Data Availability Statement
Restrictions apply to the availability of these data. Data were obtained from the Institute for Knowledge Discovery (Laboratory of Brain-Computer Interfaces) at Graz University of Technology and are available at
https://www.bbci.de/competition/iv/ (accessed on 13 December 2024) with the permission of Clemens Brunner, Robert Leeb, Gernot Müller-Putz, Alois Schlögl, and Gert Pfurtscheller. The dataset includes EEG recordings of 4-class motor imagery (left hand, right hand, feet, tongue) with 22 EEG channels (0.5–100 Hz, notch filtered), 3 EOG channels, sampled at 250 Hz, covering 4 motor imagery classes across 9 subjects.
Acknowledgments
The authors gratefully acknowledge the support of the Robotics and Mechatronics Department of the Institute of Solid Mechanics of the Romanian Academy, and Zhejiang Youren Intelligent Robotics Co., Ltd., Ningbo University, and Ningbo Polytechnic by the “Research on Key Technologies of Multi-Posture Lower Limb Rehabilitation Robot” Project, Grant 2023H014.
Conflicts of Interest
The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
References
- Värbu, K.; Muhammad, N.; Muhammad, Y. Past, present, and future of EEG-based BCI applications. Sensors 2022, 22, 3331. [Google Scholar] [CrossRef]
- Lazcano-Herrera, A.G.; Fuentes-Aguilar, R.Q.; Chairez, I.; Alonso-Valerdi, L.M.; Gonzalez-Mendoza, M.; Alfaro-Ponce, M. Review on BCI virtual rehabilitation and remote technology based on EEG for assistive devices. Appl. Sci. 2022, 12, 12253. [Google Scholar] [CrossRef]
- Padfield, N.; Camilleri, K.; Camilleri, T.; Fabri, S.; Bugeja, M. A comprehensive review of endogenous EEG-based BCIs for dynamic device control. Sensors 2022, 22, 5802. [Google Scholar] [CrossRef] [PubMed]
- Indurani, P.; Firdaus, B.B. A detailed analysis of EEG signal processing in E-healthcare applications and challenges. Int. J. Innov. Res. Sci. Eng. Technol. 2021, 10, 635–642. [Google Scholar]
- Altaheri, H.; Muhammad, G.; Alsulaiman, M.; Amin, S.U.; Altuwaijri, G.A.; Abdul, W.; Bencherif, M.A.; Faisal, M. Deep learning techniques for classification of electroencephalogram (EEG) motor imagery (MI) signals: A review. Neural Comput. Appl. 2023, 35, 14681–14722. [Google Scholar] [CrossRef]
- Schirrmeister, R.T.; Springenberg, J.T.; Fiederer, L.D.J.; Glasstetter, M.; Eggensperger, K.; Tangermann, M.; Hutter, F.; Burgard, W.; Ball, T. Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp. 2017, 38, 5391–5420. [Google Scholar] [CrossRef] [PubMed]
- Lawhern, V.J.; Solon, A.J.; Waytowich, N.R.; Gordon, S.M.; Hung, C.P.; Lance, B.J. EEGNet: A compact convolutional neural network for EEG-based brain–computer interfaces. J. Neural Eng. 2018, 15, 056013. [Google Scholar] [CrossRef]
- Han, C.H.; Choi, G.Y.; Hwang, H.J. Deep convolutional neural network based eye states classification using ear-EEG. Expert Syst. Appl. 2022, 192, 116443. [Google Scholar] [CrossRef]
- Zhao, X.; Zhang, H.; Zhu, G.; You, F.; Kuang, S.; Sun, L. A multi-branch 3D convolutional neural network for EEG-based motor imagery classification. IEEE Trans. Neural Syst. Rehabil. Eng. 2019, 27, 2164–2177. [Google Scholar] [CrossRef] [PubMed]
- Mirzabagherian, H.; Menhaj, M.B.; Suratgar, A.A.; Talebi, N.; Abbasi Sardari, M.R.; Sajedin, A. Temporal-spatial convolutional residual network for decoding attempted movement related EEG signals of subjects with spinal cord injury. Comput. Biol. Med. 2023, 164, 107159. [Google Scholar] [CrossRef]
- Santamaria-Vazquez, E.; Martinez-Cagigal, V.; Vaquerizo-Villar, F.; Hornero, R. EEG-inception: A novel deep convolutional neural network for assistive ERP-based brain-computer interfaces. IEEE Trans. Neural Syst. Rehabil. Eng. 2020, 28, 2773–2782. [Google Scholar] [CrossRef]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Melinte, D.O.; Vlădăreanu, L. Facial expressions recognition for human–robot interaction using deep convolutional neural networks with rectified adam optimizer. Sensors 2020, 20, 2393. [Google Scholar] [CrossRef] [PubMed]
- Liu, L.; Jiang, H.; He, P.; Chen, W.; Liu, X.; Gao, J.; Han, J. On the Variance of the Adaptive Learning Rate and Beyond. arXiv 2019, arXiv:1908.03265. [Google Scholar]
- Sun, B.; Zhang, H.; Wu, Z.; Zhang, Y.; Li, T. Adaptive spatiotemporal graph convolutional networks for motor imagery classification. IEEE Signal Process. Lett. 2021, 28, 219–223. [Google Scholar] [CrossRef]
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
- Diao, Z.; Wang, X.; Zhang, D.; Liu, Y.; Xie, K.; He, S. Dynamic spatial-temporal graph convolutional neural networks for traffic forecasting. Proc. AAAI Conf. Artif. Intell. 2019, 33, 890–897. [Google Scholar] [CrossRef]
- Tabar, Y.R.; Halici, U. A novel deep learning approach for classification of EEG motor imagery signals. J. Neural Eng. 2016, 14, 016003. [Google Scholar] [CrossRef] [PubMed]
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Graves, A. Generating sequences with recurrent neural networks. arXiv 2013, arXiv:1308.0850. [Google Scholar]
- Gers, F.A.; Schmidhuber, J.; Cummins, F. Learning to forget: Continual prediction with LSTM. Neural Comput. 2000, 12, 2451–2471. [Google Scholar] [CrossRef]
- Xu, G.; Ren, T.; Chen, Y.; Che, W. A one-dimensional CNN-LSTM model for epileptic seizure recognition using EEG signal analysis. Front. Neurosci. 2020, 14, 578126. [Google Scholar] [CrossRef]
- Wang, X.; Wang, Y.; Liu, D.; Wang, Y.; Wang, Z. Automated recognition of epilepsy from EEG signals using a combining space–time algorithm of CNN-LSTM. Sci. Rep. 2023, 13, 14876. [Google Scholar] [CrossRef] [PubMed]
- Liu, Y.; Zhao, B.; Zhang, S.; Xiao, W. Motor Imagery EEG Recognition Based on Weight-Sharing CNN-LSTM Network. In Proceedings of the 2022 34th Chinese Control and Decision Conference (CCDC), Hefei, China, 15–17 August 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1382–1386. [Google Scholar]
- Li, H.; Li, X.; Millán, J.D.R. Noninvasive EEG-Based Intelligent Mobile Robots: A Systematic Review. IEEE Trans. Autom. Sci. Eng. 2024. [Google Scholar]
- Tonin, L.; Beraldo, G.; Tortora, S.; Menegatti, E. ROS-Neuro: An open-source platform for neurorobotics. Front. Neurorobot. 2022, 16, 886050. [Google Scholar] [CrossRef]
- Beraldo, G.; Antonello, M.; Cimolato, A.; Menegatti, E.; Tonin, L. Brain-Computer Interface meets ROS: A robotic approach to mentally drive telepresence robots. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 4459–4464. [Google Scholar]
- Kraaijkamp, J.J.; Persoon, A.; Aurelian, S.; Bachmann, S.; Cameron, I.D.; Choukou, M.A.; Dockery, F.; Eruslanova, K.; Gordon, A.L.; Grund, S.; et al. eHealth in geriatric rehabilitation: An international survey of the experiences and needs of healthcare professionals. J. Clin. Med. 2023, 12, 4504. [Google Scholar] [CrossRef]
- Blanco-Diaz, C.F.; Guerrero-Mendez, C.D.; de Andrade, R.M.; Badue, C.; De Souza, A.F.; Delisle-Rodriguez, D.; Bastos-Filho, T. Decoding lower-limb kinematic parameters during pedaling tasks using deep learning approaches and EEG. Med. Biol. Eng. Comput. 2024, 62, 3763–3779. [Google Scholar] [CrossRef]
- Camargo-Vargas, D.; Callejas-Cuervo, M.; Mazzoleni, S. Brain-Computer Interfaces Systems for Upper and Lower Limb Rehabilitation: A Systematic Review. Sensors 2021, 21, 4312. [Google Scholar] [CrossRef]
- EEG Electrode Cap Kit Website. Available online: https://shop.openbci.com/products/openbci-eeg-electrocap?srsltid=AfmBOooVHF6ZbmUqNRAamTVDu0Ij6Pz5WRE7juBK9nh71mJi7CgGi-X7 (accessed on 3 October 2024).
- Gel Electrode Cap Guide Website. Available online: https://docs.openbci.com/AddOns/Headwear/ElectrodeCap/?_gl=1*l1hxj1*_gcl_au*MTg2MjU1MzgyNy4xNzI1OTU5NTIy*_ga*ODczMTAxNzk3LjE2ODY2NTMzODc.*_ga_HVMLC0ZWWS*MTcyOTY3MTQxOC42LjEuMTcyOTY3MTUxMS4zMS4wLjA (accessed on 3 October 2024).
- The OpenBCI GUI Website. Available online: https://docs.openbci.com/Software/OpenBCISoftware/GUIDocs/?_gl=1*dypsg2*_gcl_au*MTg2MjU1MzgyNy4xNzI1OTU5NTIy*_ga*ODczMTxNzk3LjE2ODY2NTMzODc.*_ga_HVMLC0ZWWS*MTcyOTY3MTQxOC42LjEuMTcyOTY3MTQ3Ni4yLjAuMA (accessed on 3 October 2024).
- TurtleBot3 Website. Available online: https://emanual.robotis.com/docs/en/platform/turtlebot3/features/ (accessed on 9 October 2024).
- Interface L298N DC Motor Driver Module with Arduino Website. Available online: https://lastminuteengineers.com/l298n-dc-stepper-driver-arduino-tutorial/ (accessed on 9 October 2024).
- Brunner, C.; Leeb, R.; Müller-Putz, G.; Schlögl, A.; Pfurtscheller, G. BCI Competition 2008–Graz data set A. Inst. Knowl. Discov. (Lab. Brain-Comput. Interfaces) Graz Univ. Technol. 2008, 16, 1–6. [Google Scholar]
- Yan, B.; Root, A.J.; Gale, T.; Broman, D.; Kjolstad, F. Scorch: A Library for Sparse Deep Learning. arXiv 2024, arXiv:2405.16883. [Google Scholar]
- Xu, G.; Cao, H.; Dong, Y.; Yue, C.; Zou, Y. Stochastic gradient descent with step cosine warm restarts for pathological lymph node image classification via PET/CT images. In Proceedings of the 2020 IEEE 5th International Conference on Signal and Image Processing (ICSIP), Nanjing, China, 23–25 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 490–493. [Google Scholar]
- NeuralNet Subclasses for Classification Tasks Website. Available online: https://skorch.readthedocs.io/en/stable/classifier.html (accessed on 9 October 2024).
- Install PyTorch Website. Available online: https://pytorch.org/ (accessed on 9 October 2024).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).