
Efficient Distributed Denial of Service Attack Detection in Internet of Vehicles Using Gini Index Feature Selection and Federated Learning

Muhammad Dilshad 1,†, Madiha Haider Syed 1,*,† and Semeen Rehman 2,*

1 Institute of Information Technology, Quaid-e-Azam University, Islamabad 45320, Pakistan

2 Institute of Computer Technology, Technical University of Vienna (TU Wien), 1040 Vienna, Austria

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

Future Internet 2025, 17(1), 9; https://doi.org/10.3390/fi17010009

Submission received: 4 December 2024 / Revised: 23 December 2024 / Accepted: 26 December 2024 / Published: 1 January 2025


Figure 1. Internet of Vehicles (IoV) illustrating various communication types.
Figure 2. IoV network attack detection system.
Figure 3. Selected important features.
Figure 4. Distribution of different attack types in the dataset.
Figure 5. Federated learning and machine learning process flow.
Figure 6. ROC curves for the Decision Tree, Random Forest, XGBoost, Gradient Boosting, and K-Nearest Neighbors models, comparing classification performance across multiple classes: (a) Decision Tree; (b) Random Forest; (c) XGBoost; (d) Gradient Boosting; (e) K-Nearest Neighbors.
Figure 7. Confusion matrices for (a) Decision Tree, (b) Random Forest, (c) XGBoost, (d) Gradient Boosting, and (e) K-Nearest Neighbors models. Color intensity shows prediction frequency: darker shades indicate higher values and lighter shades lower values, helping to spot misclassifications.
Figure 8. Running time of models under different scenarios.
Figure 9. Memory usage of models under different scenarios.


Abstract

As smart vehicles become interconnected through the Internet of Vehicles (IoV), cybersecurity threats such as Distributed Denial of Service (DDoS) attacks pose a major challenge. Current detection methods struggle with the complexity and volume of data inherent in IoV systems. This paper presents a new approach to DDoS attack detection that uses the Gini index for feature selection and Federated Learning (FL) for model training. The Gini index helps filter out the most important features, simplifying the models and improving accuracy. FL enables decentralized training across many devices while preserving privacy and allowing scalability. The results show that this approach is effective at detecting DDoS attacks while maintaining data confidentiality and reducing computational load. The average accuracy of the models is 91%. Moreover, different types of DDoS attacks were identified using the proposed technique, with the following precisions: DrDoS_DNS: 28.65%, DrDoS_SNMP: 28.94%, DrDoS_UDP: 9.20%, and NetBIOS: 20.61%. This research highlights the potential of integrating advanced feature selection with FL so that IoV systems can meet modern cybersecurity requirements, providing a robust and efficient solution for the future automotive industry. By selecting only the most important data features and decentralizing model training to devices, we reduce both running time and memory usage. This makes the system faster and lighter on resources, and therefore well suited to real-time IoV applications. Our approach is both effective and efficient for detecting DDoS attacks in IoV environments.

Keywords:

DDoS attack detection; Internet of Vehicles (IoV); Gini index; feature selection; Federated Learning (FL); cybersecurity; smart vehicles

1. Introduction

The automobile industry is increasingly launching smart vehicles integrated with AI, automation, and the Internet of Vehicles. These technological developments promise to improve safety and traffic management and to significantly enhance the overall experience of operating a vehicle. From a security perspective, however, increased connectivity raises the threat of cyber attacks, the greatest of which is the Distributed Denial of Service (DDoS) attack [1,2]. The Internet of Vehicles (IoV) provides an interactive environment linking vehicles, people, and roadside infrastructure; typical communication modes include vehicle-to-vehicle (V2V), in which cars communicate with other cars directly, through communication infrastructure, or over the Internet. These improvements to mobility, however, come with serious cybersecurity drawbacks that must be addressed. DDoS attacks, a form of cyber warfare, are among the most serious non-malware threats because they can bring down communication networks and endanger public safety.
These attacks specifically target key connections such as the following: Vehicle to Infrastructure (V2I), which links vehicles with road infrastructure such as traffic lights and toll booths; Vehicle to Pedestrian (V2P), which is designed to make interactions between vehicles and pedestrians safe; Vehicle to Cloud (V2C), in which vehicles connect to the cloud for data processing and vital updates; and Vehicle to Sensor (V2S) [3], in which vehicles use various sensors to supplement their core functionality, including sensor fusion, as depicted in Figure 1. In this study, we focus on detecting DDoS attacks in the IoV ecosystem that target the IP and UDP protocols. Such attacks occur frequently in IoV systems because they can overload the communication channels of vehicle-to-vehicle and vehicle-to-infrastructure networks. By flooding these channels with large numbers of packets, IP/UDP-based DDoS attacks can disrupt real-time data exchange, degrading network performance and vehicle safety.

The integration of artificial intelligence and IoV technologies has enabled vehicles to operate autonomously, diagnose issues in real time, and communicate with infrastructure. Self-driving capabilities allow these vehicles to perform such tasks with minimal human assistance [4], which improves usability for transport consumers. In addition, the IoV interconnects vehicles with other subsystems, including traffic signals and road sensors, forming a complex transportation system. This network enables effective real-time communication and data sharing, increasing the efficiency of traffic control and reducing traffic accidents [5]. At the same time, the very connectivity that offers these opportunities attracts new threats, notably DDoS attacks. In a DDoS attack, many compromised systems flood a target network with more traffic than it can handle. For smart vehicles and IoV systems, this means attackers can readily interrupt and delay the normal operation of vehicle communication networks [6]. A prominent example of a large-scale DDoS attack is the 2016 Dyn cyberattack, which targeted a major DNS provider and caused widespread outages across key services. Although this attack did not target IoV systems, it underlines the vulnerability of interconnected networks to high-volume DDoS attacks. A comparable attack could disrupt the IoV by jamming vehicle-to-infrastructure (V2I) or vehicle-to-vehicle (V2V) networks, compromising safety-critical functions and communications within smart transportation systems [7,8]. A successful DDoS attack on the IoV has several impacts. There are monetary risks arising from service disruptions and the need for response and recovery services [9].
The reputation of manufacturers and service providers could also suffer, mainly through a loss of consumer confidence in smart vehicle systems. Furthermore, service interruptions can have immediate and potentially lethal consequences; for instance, they can slow down emergency response vehicles or cause traffic accidents when communication devices fail [10]. The integration of AI, automation, and IoV technologies can substantially improve mobility and the automotive sector; however, stronger security measures are required to address threats such as DDoS attacks [11]. As connectivity and data exchange become critical in smart vehicles, potential attacks must be detected and mitigated to ensure the security and functionality of future transport systems.

DDoS attacks are a major threat to the IoV, which interconnects vehicles, devices, and infrastructure to reduce accidents, increase efficiency, and improve the transport experience [12]. Cascaded anomaly detection systems reduce data transmission to central systems by forwarding only essential data, but they still require a degree of central processing, which can limit scalability and privacy. In contrast, our FL-based approach fully decentralizes the detection process because model training takes place on the IoV devices themselves. This further reduces data transmission requirements, improves scalability, and preserves data privacy, all of which are crucial in the distributed, data-sensitive IoV environment [13]. Given the complexity and size of IoV networks, traditional detection methods often fail to mitigate the impact of DDoS threats. Accordingly, this paper proposes combining Gini index-based feature selection with Federated Learning (FL), which is scalable and protects privacy when detecting DDoS attacks. Moreover, as the number of connected vehicles rises, so do DDoS attacks and their consequences. Centralized systems become overloaded with massive amounts of data and cannot scale or adapt to the distributed, complex nature of IoV networks. New methods are therefore needed that detect attacks faster and protect data against malicious access [14]. These issues must be addressed for the public to accept smart vehicle technologies and for future transport systems to operate safely and reliably [15].

To address these challenges, this research proposes a technique that uses the Gini index for feature selection together with Federated Learning (FL) to enhance DDoS attack detection in IoV systems [16]. Gini index feature selection is applied to the dataset features to reduce the dimensionality of the feature space when designing machine learning systems. FL allows a model to be trained across multiple devices, learning from new data while preserving privacy and scaling to many participants [17,18]. The approach aims to improve detection accuracy while keeping the process efficient and secure, so that the data are not altered and users' privacy is not compromised. The main objectives of this research are as follows:

Develop an effective feature selection method that uses the Gini index to determine which features are most important for detecting DDoS attacks.
Implement Federated Learning to enable privacy-preserving, scalable, and adaptive model training in IoV environments.
Evaluate the performance of the suggested approach in terms of accuracy, privacy preservation, and computational efficiency.
Compare the effectiveness of the proposed method against traditional centralized methods and other machine learning approaches to highlight the improvements in detection capabilities and operational efficiency.

While numerous studies have addressed DDoS attacks within the Internet of Vehicles (IoV) landscape, there remains a need for integrating advanced feature selection methods with privacy-preserving learning frameworks. The following literature review provides an overview of recent developments in the field and identifies the gaps that our proposed methodology seeks to fill.

Figure 2 shows DDoS attack detection in an IoV network. An attacker uses a botnet to generate malicious traffic directed at the network. The key elements of the system are vehicles, roadside units (RSUs), and cloud servers, each of which performs communication and data processing functions. Within this network, the detection system comprises a traffic monitor, a detection algorithm, and a response mechanism. The traffic monitor tracks network activity, the detection algorithm analyzes the monitored data for DDoS attacks, and the response mechanism counters attacks once they are detected. In this way, the IoV can be guarded against different kinds of threats to the network.
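The paper does not specify the detection algorithm at this level of the architecture, so the following is only a hypothetical sketch of how the monitor, detector, and response roles in Figure 2 could fit together. It assumes a simple per-source packet-rate threshold as a stand-in for the trained machine learning classifier; the names `TrafficMonitor`, `detect`, and `run_pipeline`, the 1 s window, and the 100 packets/s threshold are all illustrative assumptions.

```python
from collections import defaultdict, deque


class TrafficMonitor:
    """Traffic monitor (hypothetical): per-source packet timestamps in a sliding window."""

    def __init__(self, window_s: float = 1.0):
        self.window_s = window_s
        self.history = defaultdict(deque)  # source id -> timestamps inside the window

    def observe(self, t: float, src: str) -> float:
        """Record one packet and return the source's rate in packets per second."""
        q = self.history[src]
        q.append(t)
        # Evict timestamps that fall outside the window ending at time t
        while q and q[0] <= t - self.window_s:
            q.popleft()
        return len(q) / self.window_s


def detect(rate: float, threshold: float) -> bool:
    """Placeholder detector: flag a source whose rate exceeds the threshold."""
    return rate > threshold


def run_pipeline(packets, threshold: float = 100.0, window_s: float = 1.0) -> set:
    """Monitor -> detect -> respond over (timestamp, source) pairs sorted by time."""
    monitor = TrafficMonitor(window_s)
    blocked = set()  # response mechanism: sources whose traffic is dropped
    for t, src in packets:
        if src in blocked:
            continue  # drop traffic from sources already flagged
        if detect(monitor.observe(t, src), threshold):
            blocked.add(src)
    return blocked


packets = sorted(
    [(i * 0.005, "bot-1") for i in range(200)]      # 200 packets within one second
    + [(i * 0.2, "vehicle-7") for i in range(5)]    # ordinary V2I traffic
)
print(run_pipeline(packets))  # {'bot-1'}
```

In the proposed system the `detect` step would be replaced by the federated classifiers described in Section 3; the threshold rule only makes the data flow between the three components concrete.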

The remainder of the paper is organized as follows: Section 2 critically reviews previous work in the field; Section 3 explains how the features are selected and describes Federated Learning; Section 4 presents the experimental results; Section 5 draws conclusions from these results; and Section 6 concludes the paper with recommendations. By addressing these cybersecurity problems, this study aims to fortify IoV systems and advance smart transportation.

2. Literature Review

Detecting DDoS attacks in the Internet of Vehicles (IoV) is a fast-moving area of cybersecurity. As transportation networks become increasingly interconnected, they become more vulnerable to cyber attacks, necessitating robust protective measures. With the surge in communication between vehicles and infrastructure, the risk of DDoS attacks intensifies, underscoring the need for effective defense strategies. Recent methods for detecting DDoS attacks within IoV systems show significant progress toward safeguarding transportation networks. FLAD is a recently designed adaptive federated learning algorithm that addresses the shortcomings of the FedAvg algorithm on unbalanced and non-IID (not independent and identically distributed) data. FLAD adaptively controls the training process by allocating more resources to clients whose attack signatures are harder to learn [19], reducing convergence time and increasing recognition accuracy. This makes it well suited to DDoS detection, where traffic patterns vary and the model needs constant updates to cover new attack signatures, as described by Dietrich [20]. Several researchers have combined FL with other cybersecurity applications, with an emphasis on improving the accuracy and security of intrusion detection systems (IDSs). For example, a federated learning-based network IDS (NIDS) applied feature selection to boost detection efficiency, using a greedy algorithm to choose the features that best identify various types of network attacks [21,22]. This federated framework groups edge devices according to the type of attack they aim to prevent and thus provides models tailored to specific attack types in different regions.
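For reference, the server-side step of FedAvg, which FLAD modifies, replaces the global model with the average of the client models weighted by each client's share of the training data, $w = \sum_k (n_k / n)\, w_k$. The following is a minimal sketch of that aggregation only, assuming model parameters are flattened into plain lists of floats; the function name `fedavg` is hypothetical, and FLAD's adaptive allocation of resources to clients is not reproduced.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Aggregate client parameter vectors, weighting each by its local sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    global_weights = [0.0] * dim
    for weights, n_k in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for j, w in enumerate(weights):
            global_weights[j] += (n_k / total) * w  # the n_k / n weighting of FedAvg
    return global_weights


# Two clients: one with 1 local sample, one with 3 local samples.
print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

Because the weights follow local dataset sizes, a client holding mostly benign traffic can dominate the average on unbalanced, non-IID data, which is the weakness that adaptive schemes such as FLAD target.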

Table 1 provides an overview of various datasets used to evaluate security measures across IoT, IIoT, and IoV environments. The table details each dataset’s features, ML techniques, and testbed configurations while highlighting threat scenarios such as DDoS and malware attacks. It also categorizes datasets based on learning approach and traffic type, offering insights into their applicability in different contexts.

The Internet of Vehicles brings new cybersecurity challenges, particularly the need to defend against DDoS attacks, which can paralyze both V2V and V2I communications. Although general IoT datasets such as CIC-IDS-2017 provide some insight into network intrusion detection, they may not capture the communication protocols, network architectures, and mobility patterns specific to the IoV. Hence, this section focuses on solutions that can be applied directly in an IoV context [23].

Recent trends in IoV security focus on FL and new approaches to DDoS attack detection. Federated learning lets decentralized devices learn collaboratively, while federated distillation ensures that the data remain on the devices. Feature selection applied to datasets such as ISCXIDS2012 and CIC-IDS2017 has improved the performance of learning models for cyber-attack detection and prevention [24,25]. These studies also showed that FL-based techniques achieve higher accuracy than conventional approaches; one study, for example, integrated an LSTM autoencoder into an IDS for intelligent transportation systems (ITS). Such improvements enhance the effectiveness and operational performance of the overall detection system while addressing the computational and data-privacy challenges raised by increasingly dynamic IoV environments. Federated learning was originally proposed at Google; its advantages include shifting computation to the edge of the network, so that data do not leave the device [26]. This work laid the foundation for adopting FL in subsequent domains, including the IoV. For anomaly detection in in-vehicle networks, one study proposed self-supervised anomaly detection with federated learning, illustrating how FL can achieve resource efficiency without violating vehicle data privacy [27].

With the rapid increase in the number of networked devices, and the threats this poses, security for IoT and IIoT systems has never been more critical. Intrusion detection is the most researched area in such environments, and many datasets and methods have been developed for it [28]. Federated learning has recently become one of the most promising approaches because data are not transmitted, which helps ensure privacy and security. One study presented Edge-IIoTset, a new cybersecurity dataset for IoT and IIoT applications that can be used to train and evaluate models with both classical centralized learning and federated learning. It was created using a dedicated IoT/IIoT testbed comprising devices, sensors, protocols, and cloud/edge components. It contains 61 features selected from 1176 originally extracted attributes, which are useful for building a machine learning-based intrusion detection system (IDS) [29]. The dataset was evaluated with several machine learning algorithms, including Decision Tree, Random Forest, Support Vector Machine, K-Nearest Neighbor, and Deep Neural Network. One limitation, as the authors note, relates to the dataset's scope: computational requirements can be very high when applying big data solutions. Nevertheless, because of its extensive scope and support for federated learning, it is recommended for developing and testing IoT/IIoT security solutions [30]. A second major dataset, Federated TON_IoT, was created at UNSW Canberra's IoT Research Lab to evaluate the security of IoT/IIoT systems through federated learning.
The dataset includes IoT service telemetry, operating system, and network traffic data and covers nine attack types, including DoS/DDoS, scanning, ransomware, backdoor, injection, and cross-site scripting (XSS). It has about fifty features, which were selected because they were deemed to be interdependent. Although the dataset is broad, its evaluation with different machine learning techniques appears less than exhaustive [31,32]. Nevertheless, Federated TON_IoT remains useful for developing federated learning approaches for IoT/IIoT systems [33].

One study proposed a system that detects location spoofing in the IoV using an efficient ensemble Random Forest (RF) classifier. It was evaluated on the VeReMi dataset, which contains four types of location falsification attacks and benign traffic, and the number of features was reduced using Pearson's correlation coefficient. The proposed model reportedly achieves an accuracy of around 99%, and 92% without the use of Federated Learning (FL), and outperforms other models in real-time identification of forged Basic Safety Messages (BSMs). Another study presented an IDS based on a deep belief network (DBN) and RF, and its results demonstrated the effectiveness of the proposed solution. The IDS was tested on the CICIDS2017 and NSL-KDD datasets, using RF with 30 and 24 features, respectively. It also supports over-the-air updates of attack signatures and attains high detection rates for both internal and external vehicle communications without using FL [34,35]. Another study applied several machine learning models with FL to protect IIoT environments. It uses the X-IIoTID dataset, reducing its 68 features to 41, and shows how effectively FL can secure IIoT device data: the learning process is distributed across nodes while privacy is upheld [36]. A further framework identifies jamming attacks using a Gradient Boosting model combined with federated learning. On a custom-collected dataset, it selects the features with the top k importance scores, and FL preserves privacy while enabling collaborative analysis of traffic across nodes, leading to more accurate detection of jamming attacks [37].
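The cited study reduced its feature set with Pearson's correlation coefficient, but its exact procedure is not given here. A common variant, sketched below under that assumption, keeps a feature only if its absolute correlation with every feature already kept is below a threshold. The 0.9 threshold, the function names, and the column names in the example are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sd_x * sd_y)


def drop_correlated(columns: Dict[str, Sequence[float]],
                    threshold: float = 0.9) -> List[str]:
    """Greedily keep a feature only if |r| < threshold against every kept feature."""
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


columns = {
    "pos_x": [1, 2, 3, 4],
    "pos_x_scaled": [2, 4, 6, 8],  # perfectly correlated with pos_x
    "speed": [4, 1, 3, 2],         # r = -0.4 with pos_x
}
print(drop_correlated(columns))  # ['pos_x', 'speed']
```

Note that the result depends on column order, since the first feature of a correlated pair is the one retained.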
Another study presented a two-phase deep learning model for anomaly detection in IoT network traffic using the IoT-23 dataset. The model focuses solely on efficient anomaly detection (AD) without FL and incorporates crucial features into the algorithm to improve the recognition of IoT cyberattacks [38].

Another approach combined CNN and LSTM networks into a hybrid anomaly detection model for IoT traffic. The model uses the IoT-23 dataset and incorporates spatial and temporal characteristics to detect various types of attacks; in certain scenarios it applies FL, which increases detection rates compared with the model without FL [39]. Interference with vehicular networks can be highly disruptive: it may block the reception of safety messages or consume the available bandwidth, endangering lives. Without proper security, attacks on these vehicles could cause serious disruption in cities; for example, stopping some of the cars in heavy traffic could lead to widespread chaos. Espionage, injection, bus-off, and DoS attacks all aim to cause failure of the Engine Control Unit (ECU). Various researchers have studied these threats and proposed security solutions to address them [40,41]. Intrusion detection systems (IDSs) are useful here because they help monitor and detect malicious activity within vehicular networks. There are three types of IDS: network-based (NIDS), host-based (HIDS), and hybrid. A NIDS monitors network traffic, a HIDS monitors individual hosts for deviations, and a hybrid IDS combines the two. Machine learning models, both parametric models and deep networks, are used to enhance IDSs by learning features from datasets or modeling real network traffic [42,43]. Several researchers have used ML and DL methods to protect vehicular networks. For example, an IDS for connected vehicles in smart cities employs DBN-DT to distinguish and categorize anomalies [44].
Datasets for vehicular networks have been created to simulate DoS and fabrication attacks, and hierarchical IDSs based on ML algorithms have been proposed to classify malicious behaviors [45].

The use of 5G technology in vehicular networks is still maturing. Its primary aim is to improve network usability and performance and to provide the features needed by vehicular safety applications, namely low delay and high data rates. One proposed intrusion detection system (IDS) uses 5G technology to prevent flooding attacks in vehicular networks; 5G also supports vehicle coordination and rapid detection of cyber threats such as DDoS [46].

Advances in DDoS attack detection within Internet of Vehicles (IoV) systems show significant progress in both detection accuracy and computational efficiency. Federated Learning (FL), particularly through innovations such as the FLAD algorithm, has proven effective at handling non-IID data and adapting to varying attack signatures. Combined with feature selection techniques such as the Gini index, these approaches can substantially improve detection models. The CICDDoS2019 dataset, from which this study selects 25 pivotal features, is well suited to addressing the specific challenges of DDoS attacks in interconnected vehicular networks. Prior research underscores the importance of both feature selection and decentralized learning for DDoS detection. Accordingly, this paper investigates the efficacy of Gini index feature selection and federated learning for DDoS attack detection in IoV environments, addressing the limitations of existing methods to improve detection accuracy, scalability, and privacy preservation. This contribution is expected to advance the development of more robust and secure transportation networks. DDoS attacks in the IoV can take many forms, from simple flooding to sophisticated methods using emulation dictionaries, which make detection challenging by mimicking legitimate traffic patterns. Our approach addresses these challenges through feature selection and decentralized learning, with future enhancements planned to cover more sophisticated attack types.

3. Methodology

3.1. Dataset Description

This study uses the publicly available CICDDoS2019 dataset [47] of network attack data. The dataset has 88 features in total. It captures network communication through details such as source and destination IP addresses, port numbers, protocol type, timestamps, flow length, packet length, and statistical metrics on packet inter-arrival times and sizes. These features provide a detailed view of network traffic, allowing us to identify unusual patterns that could indicate DDoS attacks. Although CICDDoS2019 was not originally developed for the IoV context, it has recently been widely used in IoV-related work to simulate and analyze DDoS attack scenarios. It includes a broad range of DDoS attack patterns, including IP/UDP-based traffic similar to that carried by IoV communications such as V2V and V2I. The dataset has been used in studies [45,46] that investigate the effects of DDoS attacks on IoV networks and has therefore proved useful for IoV-specific issues. The steps used to prepare the data for model training are as follows:

Data Preprocessing: First, we cleaned the dataset for model training, handled missing values, and transformed categorical variables. Mean imputation was applied to numerical features, while mode imputation was performed on categorical features. One-hot encoding was used for features with a small number of categories, whereas features with a high number of categories were encoded using a label encoder to ensure the data were complete and ready for training.
Feature Importance Calculation: We used a Random Forest classifier to calculate the Gini importance scores of all features. Random Forest builds many decision trees and aggregates their predictions to improve accuracy and reduce overfitting.
Feature Selection: We picked the top 25 features with the highest Gini importance score, reducing model complexity and retaining only those that are informative.
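The imputation and encoding rules in the preprocessing step can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' pipeline: rows are assumed to be dictionaries with `None` marking a missing value, and the cardinality cutoff `max_onehot` that decides between one-hot and label encoding is a hypothetical parameter, since the paper does not state the cutoff it used.

```python
from collections import Counter
from statistics import mean
from typing import Any, Dict, List

Row = Dict[str, Any]


def impute(rows: List[Row], numeric: List[str], categorical: List[str]) -> List[Row]:
    """Mean imputation for numeric columns, mode imputation for categorical ones."""
    fills: Dict[str, Any] = {}
    for col in numeric:
        fills[col] = mean(r[col] for r in rows if r[col] is not None)
    for col in categorical:
        present = [r[col] for r in rows if r[col] is not None]
        fills[col] = Counter(present).most_common(1)[0][0]
    return [{k: (fills[k] if v is None and k in fills else v) for k, v in r.items()}
            for r in rows]


def encode(rows: List[Row], categorical: List[str], max_onehot: int = 5) -> List[Row]:
    """One-hot encode low-cardinality columns; label-encode high-cardinality ones."""
    out = [dict(r) for r in rows]
    for col in categorical:
        categories = sorted({r[col] for r in rows})
        if len(categories) <= max_onehot:
            for r in out:
                value = r.pop(col)
                for c in categories:
                    r[f"{col}={c}"] = int(value == c)
        else:
            codes = {c: i for i, c in enumerate(categories)}
            for r in out:
                r[col] = codes[r[col]]
    return out


rows = [
    {"Flow Duration": 10.0, "Protocol": "UDP"},
    {"Flow Duration": None, "Protocol": "TCP"},
    {"Flow Duration": 20.0, "Protocol": None},
    {"Flow Duration": 30.0, "Protocol": "UDP"},
]
clean = encode(impute(rows, ["Flow Duration"], ["Protocol"]), ["Protocol"])
print(clean[1])  # {'Flow Duration': 20.0, 'Protocol=TCP': 1, 'Protocol=UDP': 0}
```

In practice a library such as pandas or scikit-learn would be used for these steps; the plain-Python version only makes the imputation and encoding rules explicit.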
The following formula is used to obtain the Gini index for a feature X:

$\mathrm{Gini}(X) = 1 - \sum_{i = 1}^{n} p_{i}^{2}$ (1)

where $p_{i}$ is the probability (estimated as the proportion of samples) of class $i$ in the dataset and $n$ is the number of classes.
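Equation (1) can be computed directly from a list of class labels, with each $p_i$ estimated as a class frequency. A minimal sketch (the function name `gini` is illustrative):

```python
from collections import Counter
from typing import Hashable, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum_i p_i^2, with p_i estimated from class frequencies."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


print(gini(["DrDoS_DNS", "DrDoS_DNS", "BENIGN", "BENIGN"]))  # 0.5
print(gini(["NetBIOS"] * 4))                                  # 0.0
```

A pure set has impurity 0, and $n$ equally frequent classes give the maximum $1 - 1/n$ (0.5 for two classes).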

Table 2 lists the 25 key features used in our analysis and their data types. These features are central to characterizing and analyzing the network traffic. The data types are integers (int64), floating-point numbers (float64), and object types. Each feature, such as the source and destination ports, contributes useful information to the analysis.

As shown in Figure 3, features like “Source Bytes” and “Destination Bytes” have the highest Gini scores, which highlight their importance for classification. These key features capture transmission patterns, simplify decision boundaries, and improve model performance.
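The paper obtains Gini importance from a Random Forest, where a feature's score is the total impurity decrease it produces, averaged over all trees. As a simplified proxy under that assumption, the sketch below scores each feature by the largest Gini decrease achievable with a single threshold split and keeps the top $k$ (the paper uses $k = 25$). The function names and the two example columns are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_gain(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Largest Gini decrease over all thresholds 'value <= t' for one feature."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:  # the largest value would leave one side empty
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
        best = max(best, parent - weighted)
    return best


def top_k_features(columns: Dict[str, Sequence[float]], labels: Sequence[Hashable],
                   k: int) -> Tuple[List[str], Dict[str, float]]:
    scores = {name: best_split_gain(col, labels) for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


labels = ["attack", "attack", "benign", "benign"]
columns = {"Source Bytes": [900, 850, 60, 40], "Destination Port": [53, 80, 53, 80]}
selected, scores = top_k_features(columns, labels, k=1)
print(selected, scores)  # ['Source Bytes'] {'Source Bytes': 0.5, 'Destination Port': 0.0}
```

A single split ignores interactions between features that a full forest captures, so the ranking it produces can differ from the Random Forest importances shown in Figure 3.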

Table 3 presents the attack statistics used in our study: the kinds of attacks in the dataset, the total number of samples of each, and how these samples are split between the training and test sets. This information is essential for understanding the data distribution used to train and test our machine learning models. The table shows the prevalence of each attack type and helps confirm that both the training and test sets are representative of the data.
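One common way to keep each attack type's prevalence the same in the training and test sets is a stratified split, sketched below. The paper does not state that it stratified or which ratio it used, so the 80/20 split, the seed, and the function name `stratified_split` are assumptions for illustration; the class counts in the example are invented.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(labels: Sequence[Hashable], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices so each class keeps its proportion in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train: List[int] = []
    test: List[int] = []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


labels = ["DrDoS_DNS"] * 50 + ["DrDoS_UDP"] * 30 + ["NetBIOS"] * 20
train, test = stratified_split(labels, test_fraction=0.2, seed=42)
print(len(train), len(test))  # 80 20
```

With these counts the test set receives 10, 6, and 4 samples of the three classes, matching the 50:30:20 ratio of the full set.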

Figure 4 presents the distribution of different attack types in the dataset, showcasing the variety of DDoS attacks present and the prevalence of each type.

3.2. Machine Learning Models

For our framework, we applied several machine learning models, namely Decision Tree, Random Forest, XGBoost Classifier, Gradient Boosting, and KNeighborsClassifier, to both the full dataset and the refined dataset obtained through feature selection in order to recognize DDoS attacks. Each of these models offers distinct benefits that contribute to the success of the federated learning model. Each model is explained below, together with its mathematical formulation.

3.2.1. Decision Tree

A decision tree model infers the value of a target variable by learning simple decision rules from the data features. It splits the data on feature values to make predictions, works with any combination of categorical and continuous data, and has an interpretable structure, which makes it the foundation of more complex ensemble methods.

$$\hat{y}_{\mathrm{DecisionTree}}(x; M) = \mathrm{Tree}(M, x)$$

(2)

where $M$ represents the trained decision tree model and $x$ denotes the input features.

The decision tree algorithm uses recursive partitioning to divide the data into subsets according to the value of the best feature, chosen by a splitting criterion such as Gini impurity or information gain.

$$IG(T, X) = \mathrm{Entropy}(T) - \sum_{v \in \mathrm{Values}(X)} \frac{|T_v|}{|T|}\,\mathrm{Entropy}(T_v)$$

(3)

where the information gain $IG(T, X)$ is the decrease in uncertainty (entropy) that results from splitting the current dataset $T$ on feature $X$, and $T_v$ is the subset of $T$ for which feature $X$ takes the value $v$. Entropy quantifies the degree of disorder or impurity in a dataset:

$$\mathrm{Entropy}(T) = -\sum_{i=1}^{C} p_i \log_2(p_i)$$

(4)

where $C$ is the number of classes and $p_i$ is the proportion of samples in $T$ that belong to class $i$.
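Equations (3) and (4) translate directly into code. The sketch below, using toy labels and feature values chosen for illustration, computes the entropy of a label list and the information gain of splitting it on a categorical feature.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy(T) = -sum_i p_i * log2(p_i) over the classes present in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, feature_values):
    """IG(T, X): entropy of T minus the size-weighted entropy of each subset T_v."""
    n = len(labels)
    subsets = {}
    for label, value in zip(labels, feature_values):
        subsets.setdefault(value, []).append(label)   # build T_v for each value v
    remainder = sum(len(t_v) / n * entropy(t_v) for t_v in subsets.values())
    return entropy(labels) - remainder


labels = ["attack", "attack", "benign", "benign"]
print(information_gain(labels, ["udp", "udp", "tcp", "tcp"]))  # perfectly informative feature
print(information_gain(labels, ["a", "b", "a", "b"]))          # uninformative feature
```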

3.2.2. Random Forest

The Random Forest algorithm builds a number of decision trees and combines their predictions to obtain accurate and stable results. Combining many trees improves generalization and therefore mitigates overfitting. The Random Forest model is described by the following equation:

$$\hat{y}_{\mathrm{RandomForest}}(x; \{M_i\}) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{Tree}(M_i, x)$$

(5)

where $\{M_i\}$ denotes the ensemble of decision trees and $N$ is the number of trees.

Random Forest aggregates the results of multiple Decision Trees using Bootstrap Aggregating (bagging) and majority voting for classification or averaging for regression.

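$$\hat{y} = \mathrm{mode}\left(\{\mathrm{DecisionTree}_t(x)\}_{t=1}^{T}\right)$$

(6)

$$\hat{y} = \frac{1}{T} \sum_{t=1}^{T} \mathrm{DecisionTree}_t(x)$$

(7)

where $\mathrm{DecisionTree}_t(x)$ is the prediction of the $t$-th tree and $T$ is the number of trees. Equation (6) applies to classification and Equation (7) to regression.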
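The bagging-and-voting scheme of Equation (6) can be sketched with scikit-learn decision trees: each tree is fit on a bootstrap sample, and the ensemble returns the most frequent class. The data, the number of trees, and the helper names are hypothetical. Note that scikit-learn's own `RandomForestClassifier` combines trees by averaging their predicted class probabilities rather than by hard majority voting, so this sketch illustrates Equation (6) rather than reproducing that class exactly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_trees=25, seed=0):
    """Fit each tree on a bootstrap sample (bagging) with random feature subsets."""
    rng = np.random.default_rng(seed)
    trees = []
    for t in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))   # sample rows with replacement
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=t)
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_majority(trees, X):
    """Eq. (6): the mode of the T tree predictions for each sample."""
    votes = np.stack([tree.predict(X) for tree in trees])   # shape (T, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])


# Synthetic 3-class data (illustrative only)
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)
trees = fit_bagged_trees(X, y)
y_hat = predict_majority(trees, X)
```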

3.2.3. XGBoost Classifier

XGBoost is an advanced implementation of gradient boosting, optimized for performance and efficiency. It employs parallel processing, tree pruning, and regularization techniques to enhance predictive power and prevent overfitting. The given equation describes the XGBoost classifier:

$$\hat{y}_{\mathrm{XGBoost}}(x; \{f_i\}) = \sum_{i=1}^{M} f_i(x)$$

(8)

where $\{f_i\}$ represents the sequence of weak learners and $M$ is the number of boosting rounds.

XGBoost uses gradient boosting to iteratively add models that minimize a loss function.

$$\mathcal{L}(\phi) = \sum_i l(y_i, \hat{y}_i) + \sum_k \Omega(f_k)$$

(9)

where $l$ is the loss function and $\Omega(f_k)$ is the regularization term that penalizes the complexity of the $k$-th tree $f_k$.

$$\hat{y}^{(t)} = \hat{y}^{(t-1)} + \eta\, f_t(x)$$

(10)

where $\eta$ is the learning rate and $f_t$ is the $t$-th tree.
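For reference, the XGBoost documentation writes the regularization term of Equation (9) as a penalty on the number of leaves and the leaf weights of each tree. Below, $J_k$ denotes the number of leaves of tree $f_k$ (the documentation calls it $T$, which here would clash with the number of trees), $w_{k,j}$ its leaf weights, and $\gamma$ and $\lambda$ XGBoost's `gamma` and `reg_lambda` parameters (defaults 0 and 1). Unrolling Equation (10) from an initial prediction $\hat{y}^{(0)}$, with $f_t$ the $t$-th tree, recovers the additive form of Equation (8), with the learning rate scaling each tree:

$$\Omega(f_k) = \gamma J_k + \frac{1}{2}\lambda \sum_{j=1}^{J_k} w_{k,j}^2$$

$$\hat{y}^{(M)} = \hat{y}^{(0)} + \eta \sum_{t=1}^{M} f_t(x)$$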

3.2.4. Gradient Boosting

Gradient Boosting builds models sequentially, with each new model correcting the mistakes of its predecessors. It is widely used because it often achieves high predictive accuracy and can handle various data types. The Gradient Boosting prediction is given by:

$$\hat{y}_{\mathrm{GradientBoosting}}(x; \{h_i\}) = \sum_{i=1}^{M} h_i(x)$$

(11)

where $\{h_i\}$ denotes the ensemble of weak learners and $M$ is the number of boosting iterations.

Gradient Boosting minimizes a loss function by adding weak learners in a stage-wise manner.

$$\hat{y}_i^{(m)} = \hat{y}_i^{(m-1)} + \nu\, h_m(x_i)$$

(12)

where $\nu$ is the learning rate and $h_m$ is the $m$-th weak learner.

$$L(y, \hat{y}) = \sum_{i=1}^{n} l(y_i, \hat{y}_i)$$

(13)

where $l$ is the loss function (e.g., squared error for regression or log-loss for classification) and $n$ is the number of training samples.
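The stage-wise update of Equation (12) can be sketched from scratch for the squared-error loss, where the negative gradient is simply the residual $y_i - \hat{y}_i^{(m-1)}$. This is a regression illustration of the principle under stated assumptions (synthetic data, depth-3 trees, hypothetical helper names); the classifiers in this study use a classification loss, and scikit-learn's `GradientBoostingClassifier` uses log-loss by default.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_rounds=100, nu=0.1, max_depth=3):
    """Stage-wise boosting with squared-error loss; each h_m is fit to the residuals."""
    f0 = float(np.mean(y))                # constant initial model
    y_hat = np.full(len(y), f0)
    learners, losses = [], []
    for m in range(n_rounds):
        residuals = y - y_hat             # negative gradient of 0.5 * (y - y_hat)^2
        h_m = DecisionTreeRegressor(max_depth=max_depth, random_state=m).fit(X, residuals)
        y_hat = y_hat + nu * h_m.predict(X)                # Eq. (12)
        learners.append(h_m)
        losses.append(float(np.sum((y - y_hat) ** 2)))     # Eq. (13) with squared error
    return f0, learners, losses


def predict_gb(f0, learners, X, nu=0.1):
    """Initial constant plus the shrunken outputs of all weak learners."""
    return f0 + nu * sum(h.predict(X) for h in learners)


X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
f0, learners, losses = fit_gradient_boosting(X, y)
```

Because each regression tree predicts the mean residual in its leaf, every round with $0 < \nu < 2$ cannot increase the training loss, which is why the `losses` list is non-increasing.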

3.2.5. KNeighborsClassifier

KNeighborsClassifier is an instance-based learning algorithm where predictions are made based on the k-nearest neighbors in the training dataset. It is simple and effective for small datasets with clear patterns. The given equation describes the KNeighborsClassifier:

$$\hat{y}_{\mathrm{KNeighbors}}(x; D) = \arg\max_{y} \sum_{i \in N_k(x, D)} \mathbb{I}(y_i = y)$$

(14)

where $D$ is the training dataset, $N_k(x, D)$ denotes the $k$ nearest neighbors of $x$ in $D$, and $\mathbb{I}(\cdot)$ is the indicator function.

The prediction is made by a majority vote for classification or averaging for regression among the k-nearest neighbors.

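Distances between samples are measured with the Euclidean distance over the $n$ features:

$$d(x, x') = \sqrt{\sum_{i=1}^{n} (x_i - x'_i)^2}$$

(15)

$$\hat{y} = \mathrm{mode}\left(\{y_i \mid x_i \in N_k(x)\}\right)$$

(16)

where $N_k(x)$ is the set of $k$-nearest neighbors. For regression, the prediction is the mean target of the neighbors:

$$\hat{y} = \frac{1}{k} \sum_{i \in N_k(x)} y_i$$

(17)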
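Equations (15) to (17) can be sketched in NumPy as follows. The two-dimensional points and the "benign"/"ddos" labels are hypothetical toy data, and the helper names are illustrative.

```python
import numpy as np
from collections import Counter


def knn_classify(X_train, y_train, x, k=5):
    """Majority vote among the k nearest training points (Eqs. 15 and 16)."""
    dist = np.sqrt(((X_train - x) ** 2).sum(axis=1))    # Euclidean distance, Eq. (15)
    nearest = np.argsort(dist, kind="stable")[:k]       # indices of N_k(x)
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]


def knn_regress(X_train, y_train, x, k=5):
    """Mean target of the k nearest training points (Eq. 17)."""
    dist = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = np.argsort(dist, kind="stable")[:k]
    return float(np.mean(y_train[nearest]))


# Toy 2-D feature vectors with hypothetical labels
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array(["benign", "benign", "benign", "ddos", "ddos", "ddos"])
print(knn_classify(X_train, y_train, np.array([0.2, 0.2]), k=3))   # benign
print(knn_classify(X_train, y_train, np.array([5.5, 5.5]), k=3))   # ddos
```

With `metric="minkowski"` and `p=2`, scikit-learn's `KNeighborsClassifier` uses the same Euclidean distance.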

We evaluated each model using metrics such as accuracy, precision, recall, and F1-score to ensure a comprehensive assessment of their performance in detecting DDoS attacks. Here are the formulas for these metrics:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

(18)

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

(19)

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

(20)

$$\text{F1-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

(21)

where $TP$ is true positives, $TN$ is true negatives, $FP$ is false positives, and $FN$ is false negatives.
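As a worked example of Equations (18) to (21), the function below computes the four metrics from confusion counts. The counts are hypothetical, and returning 0.0 when a denominator is zero is a common convention rather than something specified in the text.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion counts (Eqs. 18-21)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: accuracy = 170/200 = 0.85, precision = 80/90 ≈ 0.889,
# recall = 80/100 = 0.8, F1 = 160/190 ≈ 0.842
m = classification_metrics(tp=80, tn=90, fp=10, fn=20)
```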

These models collectively enhance the federated learning system’s capability to learn from decentralized data while maintaining privacy and improving predictive performance.

3.3. Federated Learning Approach

To address privacy concerns and enhance scalability, we implemented Federated Learning (FL). FL allows decentralized model training across multiple devices, ensuring that raw data remains local and is not shared with a central server. This approach preserves data privacy while leveraging the computational power of distributed devices. The Federated Learning process involves:

Initialization:
- Start with a global model $M_{0}$ .
Local Training:
- Each device i trains the current global model on its local data $D_{i}$:
  
  $M_{i}^{'} = \mathrm{Train}(M_{global}, D_{i})$
  
  where $M_{global}$ is the current global model received from the server ($M_{0}$ in the first round), and $M_{i}^{'}$ is the updated model after local training on device i.
For local model training, the distribution of data samples among the three clients can significantly affect the convergence rate and the model’s generalization ability. Each client receives an equal number of samples, 20,136 per client. This balanced distribution ensures that every client has the same amount of data for training and evaluation in the federated learning setup.
Model Aggregation:
- Aggregate local model updates using Federated Averaging (FedAvg):
  
  $M_{global} = \sum_{i} \frac{| D_{i} |}{\sum_{j} | D_{j} |} M_{i}^{'}$
  
  where $| D_{i} |$ is the size of the dataset on device i, and $\sum_{j} | D_{j} |$ is the total size of all datasets across devices.
Iteration:
- Repeat local training and model aggregation until the convergence criteria are met.
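The four steps above can be sketched in a few lines of NumPy. This is not the authors' TensorFlow Federated implementation: the logistic-regression client update (`local_train`), the learning rate, the epoch and round counts, and the synthetic data are all hypothetical choices made for illustration. The aggregation line is the size-weighted FedAvg rule, and the data are split evenly across three clients, mirroring the balanced setup described above.

```python
import numpy as np


def local_train(w, X, y, lr=0.1, epochs=5):
    """Hypothetical client update: full-batch gradient descent on the logistic loss."""
    w = w.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))     # predicted attack probability
        w -= lr * X.T @ (p - y) / len(y)        # gradient of the mean log-loss
    return w


def fed_avg(X, y, n_clients=3, rounds=20, seed=0):
    rng = np.random.default_rng(seed)
    # Balanced split: each client receives (almost) the same number of samples
    parts = np.array_split(rng.permutation(len(y)), n_clients)
    clients = [(X[idx], y[idx]) for idx in parts]
    sizes = np.array([len(idx) for idx in parts], dtype=float)   # |D_i|
    w_global = np.zeros(X.shape[1])                               # Initialization: M_0
    for _ in range(rounds):                                       # Iteration
        # Local training: every client starts from the current global model
        local = [local_train(w_global, Xk, yk) for Xk, yk in clients]
        # Model aggregation (FedAvg): size-weighted average of the local models
        w_global = np.average(np.stack(local), axis=0, weights=sizes)
    return w_global


# Synthetic two-feature data with a bias column (illustrative only)
rng = np.random.default_rng(1)
X = np.hstack([rng.normal(size=(600, 2)), np.ones((600, 1))])
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w = fed_avg(X, y)
accuracy = np.mean((X @ w > 0) == (y == 1))
```

Only the weight vectors leave the clients; the raw rows `X[idx]` stay inside each client's tuple, which is the privacy property the framework relies on.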

The Federated Averaging (FedAvg) algorithm is given by the following:

$$w_{t+1} = \sum_{k=1}^{K} \frac{n_k}{n}\, w_t^k$$

(22)

where $w_{t+1}$ is the updated global model, $w_t^k$ is the local model from device $k$, $K$ is the number of devices, $n_k$ is the number of data points on device $k$, and $n$ is the total number of data points across all devices.
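With the balanced three-client split described above (20,136 samples per client), the FedAvg weights in Equation (22) are all equal, so the aggregation reduces to a plain average of the local models:

$$n = 3 \times 20{,}136 = 60{,}408, \qquad \frac{n_k}{n} = \frac{20{,}136}{60{,}408} = \frac{1}{3}$$

$$w_{t+1} = \frac{1}{3}\left(w_t^{1} + w_t^{2} + w_t^{3}\right)$$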

Figure 5 presents the flow of the federated learning and machine learning process in our proposed approach. The process begins by initializing the global model, which is then sent to the participating devices. Each device trains the model locally on its own data and sends the updated model parameters back to a central server, which combines them into a new, improved global model. This process repeats until the global model reaches the desired level of performance.

Under Federated Learning, sensitive data remain only on local devices, which reduces the risk of data breaches. FL also scales by distributing the computational load across many devices, which makes it feasible to train models on the large and diverse datasets typical of IoV environments. In this decentralized approach, learning is adaptive: the global model continuously incorporates updates from local devices, so it stays current as new traffic patterns and threats emerge. In this paper, we combine Gini index-based feature selection with federated learning to build efficient, scalable, and privacy-preserving models for detecting DDoS attacks in IoV systems. Given the volume and complexity of IoV data, this approach provides a robust framework for enhancing the security of smart transportation networks.

This study uses the CICDDoS2019 dataset (Kaggle: https://www.kaggle.com/datasets/dhoogla/cicddos2019 accessed on 25 September 2023), which includes network traffic information for IoV systems. With the methodology in place, we next test the validity of the approach in several experiments. The next section presents the experimental setup and the results, reporting the model’s performance on various metrics.

4. Experiments and Results

4.1. Experimental Setup

We executed our experiments on a high-performance computing cluster equipped with Intel® Core™ i5-1035G1 processors, 16 GB of RAM, and 512 GB SSD storage. The machines are networked through a 10 Gbps Ethernet network for fast communication between nodes. The software environment comprised Google Colab and Python 3.10, with supporting libraries such as Scikit-learn and TensorFlow Federated, and MPI for distributed processing. This configuration supported the machine learning, federated learning, and data preprocessing tasks, and each component had sufficient computational power so that hardware limitations did not affect our results. Python was chosen as our main programming language because of its extensive libraries for machine learning and data analysis. Concretely, we used Scikit-learn to implement the Decision Tree, Random Forest, Gradient Boosting, and KNeighborsClassifier models, and XGBoost for efficient classification. For Federated Learning, which allows neural networks to be trained in a decentralized, privacy-preserving way, we used TensorFlow Federated. We also used Google Colab for part of the development and testing. To assess model performance comprehensively, we used several metrics:

Accuracy: Ratio of the number of correctly classified instances to the number of instances in total.
Precision: Proportion of true positive predictions out of the total number of positive predictions.
Recall: Proportion of true positive predictions out of the total number of positive instances.
F1-score: The harmonic mean of precision and recall, which balances the two.
Classification Report: Summarizes the precision, recall, and F1-score of each class for each model, together with the support (the number of actual occurrences of the class).

Scalability tests simulated increasing numbers of vehicles, from 100 to 10,000, with minimal performance degradation. Our approach also addresses real-world constraints such as latency and bandwidth limitations, and the feature reduction lowers energy consumption, making this system viable for large-scale IoV deployments.

4.2. Hyperparameter Tuning

In the experiments, we adopted the default hyperparameters of scikit-learn and XGBoost to build a robust baseline of machine learning models for detecting DDoS attacks in IoV systems. These default configurations were chosen to keep the comparison consistent and fair, so that we compare the models' baseline behavior without the added complexity of extensive tuning. For instance, the depth of the Decision Tree was not limited, and it used the Gini criterion for splitting. Likewise, Random Forest used 100 estimators with the Gini criterion. XGBoost used its default learning rate of 0.3 and 100 estimators, while Gradient Boosting used a learning rate of 0.1 and 100 estimators, a trade-off between model complexity and computational efficiency. For the K-Nearest Neighbors model, we used the default of five neighbors with the Minkowski distance metric. Although these defaults provide a fair starting point, we recognize that hyperparameter optimization is a necessary investment to further increase model performance and efficiency.
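The baseline configuration described above corresponds to the following scikit-learn estimators. The values are written out explicitly for readability, but each one equals the library default; the dictionary name is illustrative. XGBoost is a separate package, so its model appears only as a comment with the values stated in the text.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

baseline_models = {
    # Unlimited depth, Gini splitting criterion
    "Decision Tree": DecisionTreeClassifier(criterion="gini", max_depth=None),
    # 100 trees, Gini splitting criterion
    "Random Forest": RandomForestClassifier(n_estimators=100, criterion="gini"),
    # Learning rate 0.1, 100 boosting stages
    "Gradient Boosting": GradientBoostingClassifier(learning_rate=0.1, n_estimators=100),
    # Five neighbors, Minkowski metric with p=2 (Euclidean distance, Eq. 15)
    "KNN": KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2),
}
# XGBoost (separate package), values as stated in the text:
# xgboost.XGBClassifier(learning_rate=0.3, n_estimators=100)
```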

4.3. Evaluation

We evaluated the performance of several machine learning models using all available features, then applied Gini index-based feature selection to reduce the number of features to 25. We then retrained and re-evaluated the models on the selected features.

The accuracy of the system is defined as the proportion of correctly predicted instances among all instances:

$$\mathrm{Accuracy} = \frac{\sum_{i=1}^{N} \mathbb{I}(\hat{y}_i = y_i)}{N}$$

(23)

where $N$ is the total number of instances, $\hat{y}_i$ is the predicted label for instance $i$, and $y_i$ is the true label. The indicator function $\mathbb{I}(\cdot)$ equals 1 if the condition inside is true and 0 otherwise.

The Gini index $\mathrm{Gini}(S)$ used for feature selection is calculated as follows:

$$\mathrm{Gini}(S) = 1 - \sum_{j=1}^{m} \left(\frac{|S_j|}{|S|}\right)^2$$

(24)

where $S$ is the set of instances at a node, $S_j$ is the subset of $S$ belonging to class $j$, $|S|$ is the total number of instances in $S$, and $m$ is the number of classes. This is the same impurity as in Equation (1), with the class proportions estimated as $|S_j|/|S|$.

We evaluated the performance of machine learning models using all features and the selected features obtained through the Gini index-based feature selection. The results are summarized in Table 4.

The results in Table 4 illustrate that feature selection using the Gini index, which reduced the feature set from 83 to 25, either maintained or enhanced model performance. Specifically, the Decision Tree and Random Forest classifiers showed slight improvements in accuracy, increasing from 0.92 to 0.93.

The XGBoost Classifier and Gradient Boosting preserved their high accuracy of 0.94 and 0.93, respectively, with the reduced feature set. The KNeighborsClassifier’s accuracy remained unchanged at 0.85. These results indicate that the feature selection process effectively simplified the models without compromising their accuracy.

Table 5 presents a detailed comparison of the classification performance of the models under two configurations: using only the 25 selected features, and using those features with federated learning. The performance measures include accuracy, macro-average precision, macro-average recall, macro-average F1-score, weighted-average precision, weighted-average recall, and weighted-average F1-score. Overall, the models perform well under both approaches, with XGBoost and Random Forest achieving the best accuracy. Federated learning yields slightly lower values than the centralized selected-feature configuration; in return, it provides data privacy and decentralization, which are very important in many real applications. The Gini index-based feature selection reduced the feature set from 88 to 25, resulting in approximately 30% less computational overhead without compromising accuracy. This reduction, applied to a dataset of 75,510 samples, supports the feasibility of real-time DDoS detection on edge devices, with faster response times and reduced resource usage.
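Table 5 reports both macro and weighted averages, and the two can differ noticeably when classes are imbalanced. The toy example below (labels chosen for illustration, not taken from the dataset) uses scikit-learn's `precision_recall_fscore_support`: macro averaging takes the unweighted mean of the per-class scores, while weighted averaging weights each class by its support, so the larger class dominates.

```python
from sklearn.metrics import precision_recall_fscore_support

# Toy labels: class 0 has support 3, class 1 has support 1
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 1, 1]

# Per class: precision (1.0, 0.5), recall (2/3, 1.0)
macro_p, macro_r, macro_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro")        # unweighted mean over classes
weighted_p, weighted_r, weighted_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted")     # mean weighted by class support
```

Here macro precision is (1.0 + 0.5) / 2 = 0.75, whereas weighted precision is (3 × 1.0 + 1 × 0.5) / 4 = 0.875.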

As shown in Figure 6, the Receiver Operating Characteristic (ROC) curves for each model offer a comprehensive evaluation of their classification performance. The Area Under the Curve (AUC) scores indicate how effectively each model distinguishes between classes; higher AUC scores indicate better performance. In Figure 6a, the Decision Tree model exhibits moderate performance, as indicated by its AUC score, suggesting that while it identifies some patterns, it may not be very reliable on its own. In Figure 6b, the Random Forest model performs better: its ensemble approach, which combines the strengths of multiple decision trees, yields a higher AUC score and better generalization, indicating that it is more robust and adapts more effectively to new data. Figure 6c shows that the XGBoost model achieves the highest AUC score among all models, which we attribute to its gradient boosting technique, in which each iteration focuses on the errors of the previous ones. In Figure 6d, the Gradient Boosting model performs well but falls slightly short of XGBoost; although the two methods are similar, its lower AUC score suggests that it optimizes predictions less effectively. Finally, Figure 6e shows that the K-Nearest Neighbors (KNN) model has lower performance, largely because of its sensitivity to how the data points are distributed in feature space.
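For a multiclass problem, per-class ROC curves such as those in Figure 6 are commonly computed one-vs-rest: each class in turn is treated as positive and all others as negative. The sketch below assumes this approach and uses synthetic three-class data with a Random Forest as a stand-in model; it does not reproduce the paper's figures.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic 3-class data standing in for the attack classes
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)
proba = RandomForestClassifier(random_state=0).fit(X_tr, y_tr).predict_proba(X_te)

# One ROC curve per class: class c versus all other classes
y_bin = label_binarize(y_te, classes=[0, 1, 2])
per_class_auc = []
for c in range(3):
    fpr, tpr, _ = roc_curve(y_bin[:, c], proba[:, c])
    per_class_auc.append(auc(fpr, tpr))

# Macro one-vs-rest AUC: the unweighted mean of the per-class AUCs
macro_auc = roc_auc_score(y_te, proba, multi_class="ovr", average="macro")
```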

The findings suggest that ensemble methods such as XGBoost and Random Forest performed best in this comparison. They consistently outperform simpler models, providing more reliable classification and better handling of complex datasets, which makes them strong choices for high-accuracy real-world applications.

To assess the practicality of deploying our proposed DDoS detection system in real-world Internet of Vehicles (IoV) environments, we analyzed the energy consumption of running both Machine Learning (ML) and Federated Learning (FL) tasks. Since IoV devices often operate with limited energy resources, understanding the power demands of these models is critical.

As Table 6 shows, after feature selection, complex ML models such as Random Forest and XGBoost could consume between 0.75 and 1.75 kWh over a 15–35 h period, whereas simpler models such as Decision Trees require significantly less energy. In the FL setting, energy consumption decreases further: complex models use 0.417 to 1.25 kWh during federated rounds, and simpler models consume as little as 0.35 kWh over 7–20 h. The Gini index-based feature selection further improves energy efficiency by reducing the computational load by approximately 30%, which is particularly beneficial for resource-constrained IoV devices. This optimization makes our solution both effective and energy-efficient, addressing the scalability and sustainability challenges of real-world IoV systems.

The confusion matrices in Figure 7 offer further insight into each model's classification performance by showing the counts of true positives, false positives, true negatives, and false negatives. The confusion matrix for the Decision Tree model (Figure 7a) shows that while the model correctly classifies some instances, there are also notable misclassifications, particularly between certain classes. This pattern is consistent with the Decision Tree's tendency to overfit, which can lead to unreliable predictions on new data. In Figure 7b, the confusion matrix for the Random Forest model shows a significant improvement in classification accuracy, with more true positives and fewer misclassifications owing to its ensemble approach. Figure 7c presents the confusion matrix for the XGBoost model, which excels in classification, with a high count of true positives and fewer false positives and false negatives. This shows that XGBoost effectively captures the underlying data patterns, leading to more accurate predictions. The confusion matrix for the Gradient Boosting model (Figure 7d) shows high accuracy with many true positives, but slightly more confusion among some classes, indicating room for improvement. Finally, Figure 7e shows the confusion matrix for the K-Nearest Neighbors (KNN) model. While KNN performs reasonably, the matrix reveals a higher rate of misclassification, particularly between closely related classes, which can make its predictions less reliable on complex datasets.
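In a multiclass confusion matrix, the per-class counts used in Equations (18) to (21) are read off one class at a time. The sketch below assumes scikit-learn's convention (rows are true classes, columns are predicted classes); the short label lists are toy data loosely named after attack types, not results from this study.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def per_class_counts(cm):
    """Return per-class TP, FP, FN, TN arrays from a square confusion matrix."""
    cm = np.asarray(cm)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp    # predicted as class c, but actually another class
    fn = cm.sum(axis=1) - tp    # actually class c, but predicted as another class
    tn = cm.sum() - tp - fp - fn
    return tp, fp, fn, tn


labels = ["DNS", "SNMP", "UDP"]
y_true = ["DNS", "DNS", "UDP", "UDP", "SNMP", "SNMP"]
y_pred = ["DNS", "UDP", "UDP", "UDP", "SNMP", "DNS"]
cm = confusion_matrix(y_true, y_pred, labels=labels)   # rows: true, columns: predicted
tp, fp, fn, tn = per_class_counts(cm)
```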

The confusion matrices reveal that ensemble methods, particularly XGBoost and Random Forest, consistently outperform simpler models like Decision Tree and KNN. This emphasizes the effectiveness of ensemble learning techniques in enhancing classification performance for complex datasets.

Table 7 compares the machine learning models used for DDoS attack detection across several parameters. The models under consideration, Decision Tree, Random Forest, XGBoost, Gradient Boosting, and K-Nearest Neighbors (KNN), were assessed in terms of running time, memory usage, and task performance in three configurations: all features, 25 features, and Federated Learning (FL) with 25 features. For comparison, the table also reports the running time and memory consumption of the FL framework per client round. The Decision Tree is the fastest model, completing the task in only 1.05 s with all features, and it performs even better with 25 features, consuming only 50% of the available memory. It is slightly slower in FL (2.75 s), but this remains an excellent result. Random Forest performs consistently, with reliable memory usage, and becomes considerably more efficient with the reduced feature set; in FL, its memory usage increases slightly, while its overall performance is unchanged. XGBoost is slow with all features (60.24 s), but with the reduced feature set it improves in both speed and memory usage; in FL, its performance stabilizes, with improved execution time and moderate memory usage. Gradient Boosting is the slowest model in every scenario, particularly with all features, and still requires over 400 s after feature reduction. Its memory usage improves slightly, but its running time remains far behind that of the other models, especially in the FL case.
Overall, the table demonstrates that reduced feature sets maintain or slightly improve model accuracy while lowering both running time and memory consumption. Federated Learning runs somewhat slower but with reasonable resource usage, which demonstrates its usability in distributed environments. These insights help identify which models are appropriate for DDoS detection in IoV systems, given the constraints on available resources and the need for rapid response.

Figure 8 highlights the running time of different models, with GradientBoostingClassifier being the most time-consuming, especially when using all features. As an ensemble model, it builds trees sequentially, which limits parallelization and increases computational cost. With more trees and a larger dataset, the training time grows significantly. In contrast, Decision Tree and KNN are more efficient. Reducing the feature set to 25 notably lowers the running time for all models, while federated learning models show similar performance to reduced-feature traditional models with minimal overhead.

Figure 9 illustrates the memory usage of the same models under similar scenarios. Models utilizing all features exhibit the highest memory consumption, whereas reducing the features to 25 significantly lowers the memory usage. Federated learning models also demonstrate reduced memory requirements compared to models trained on all features, with only a slight increase compared to traditional machine learning models with 25 features.
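The text does not state how running time and memory were measured. One simple way to obtain comparable numbers in Python, shown here only as an assumption, is to time `fit` with `time.perf_counter` and record peak traced memory with the standard-library `tracemalloc` module. `tracemalloc` only sees allocations made through Python's allocators (NumPy reports its array buffers to it), so memory allocated directly inside compiled extensions may be undercounted. The synthetic data, the helper name `profile_fit`, and the use of the first 25 columns as a stand-in for the selected features are hypothetical.

```python
import time
import tracemalloc

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def profile_fit(model, X, y):
    """Return (seconds, peak traced bytes) for a single model.fit(X, y) call."""
    tracemalloc.start()
    t0 = time.perf_counter()
    model.fit(X, y)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()   # (current, peak) in bytes
    tracemalloc.stop()
    return elapsed, peak


X, y = make_classification(n_samples=2000, n_features=80, n_informative=20,
                           random_state=0)
full = profile_fit(DecisionTreeClassifier(random_state=0), X, y)
reduced = profile_fit(DecisionTreeClassifier(random_state=0), X[:, :25], y)  # stand-in subset
```

Repeating each measurement several times and reporting the median would reduce noise from other processes.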

These analyses emphasize the importance of feature reduction in optimizing both memory usage and running time while also showcasing the efficiency of federated learning with modest computational overhead.

5. Discussion

Our experiments illustrate how various machine learning models, and their application within the Federated Learning approach, can detect DDoS attacks in IoV. The first set of experiments used all features of the dataset, which include extensive details about communication between vehicles and infrastructure. Not all of these features are relevant to DDoS attack detection. By applying the Gini index, we narrowed the features down to the top 25, retaining the information most critical for accurate DDoS detection while discarding potentially irrelevant or redundant data. The models performed slightly better with the selected features than with all features, which indicates that the selected features contain the essential information for accurate DDoS detection. Federated Learning (FL) presents challenges such as communication overhead, data heterogeneity, and security risks. Communication overhead can be reduced by lowering the frequency of model updates, while data heterogeneity can be managed through personalized models or weighted averaging. Moreover, FL security risks such as model poisoning can be mitigated with differential privacy and secure multi-party computation techniques. Among the models tested, the Random Forest and XGBoost classifiers showed the highest performance across all setups. XGBoost, known for its efficiency, slightly outperformed Random Forest, especially with the selected features. These models effectively handled the dataset’s complexity and feature interactions. Gradient Boosting performed similarly to Random Forest but slightly behind XGBoost. In contrast, the simpler KNeighborsClassifier reached a lower accuracy of 85% but still provided useful insights, highlighting the importance of model complexity for capturing intricate patterns in IoV data.

Our system works efficiently by reducing running time and memory usage. The Gini index-based feature selection removes unnecessary data, speeding up processing. In addition, Federated Learning (FL) trains the model on the devices themselves, so that only lightweight model updates are shared instead of large amounts of raw data, saving both time and memory. These improvements make the system more suitable for real-time DDoS detection in IoV systems, as shown in Figure 8 and Figure 9 and summarized in Table 7.

5.1. Benefits and Challenges of Federated Learning

Federated Learning has several benefits for IoV applications, but some challenges must also be addressed. Enhanced data privacy is one of the main reasons for using Federated Learning: since only model updates are transferred and the raw data remain on the device, sensitive information is not shared directly. This is very important for IoV applications, where protecting data privacy is a major concern [47]. Scalability is another key feature of Federated Learning, as it avoids moving data storage and processing to a central location. This is useful in IoV environments, where networks have wide coverage and many devices must be integrated and controlled [48]. Local data processing also removes the burden of transferring huge amounts of data to the main server, which is very important in applications such as real-time DDoS detection, where rapid response to attacks is essential. However, many Federated Learning applications require online clients to update the global model through a central server, and this communication consumes considerable bandwidth. Better communication strategies are needed to reduce this burden [49]. In addition, IoV devices may differ in computing infrastructure, network availability, and data distribution. This heterogeneity complicates Federated Learning and can lead to incomplete model updates. Weighted averaging and adaptive learning rates are examples of solutions to these challenges.

In our approach, we improve the detection of DDoS attacks in IoV with the Gini index and Federated Learning. However, the proposed approach does not yet address other cybersecurity threats. The present work focuses on typical IP/UDP-based DDoS attacks, and more advanced attack methods, involving, for example, emulation dictionaries, may be difficult to trace. It will therefore be important to adapt the approach to more complex DDoS variants and to other IoV-specific threats such as spoofing or jamming attacks, and to evaluate its performance across different IoV configurations. Another challenge in deploying FL for real-time IoV is the limited resources of the devices and the possible communication overhead, which might reduce performance. Future research will therefore study lightweight model optimization for FL, fewer communication updates, and edge-based aggregation to enhance real-time detection and make FL viable for deployment in IoV.

5.2. Comparative Analysis and Achievements

While both approaches achieved high accuracy, Federated Learning provides additional benefits in terms of data privacy and scalability [47]. Reducing the dataset to 25 features using the Gini index maintained and, in some cases, slightly improved model performance. This approach effectively balances the need for comprehensive data and computational efficiency [48]. The XGBoost classifier consistently showed the highest performance across all setups. It is particularly effective for DDoS detection in IoV, offering a robust and efficient solution [49].

5.3. Implications for Real-World Applications

The findings from this research have significant implications for real-world IoV applications, particularly in enhancing network security and robustness.

Improved DDoS Detection: This research shows that combining modern machine learning models with feature selection techniques is beneficial for DDoS detection and mitigation tasks. This supports the reliability and security of Internet of Vehicles systems, which are important for autonomous vehicles and smart transport systems.
Complexity Reduction: Feature selection through the Gini index helps maintain, and in some cases increase, detection accuracy while also reducing the computational burden on the models. This is critical because narrowing the input to the most relevant features lightens the processing load on IoV devices, which are usually resource-constrained. Such simplification is important for real-time detection of DDoS attacks, as it shortens the response time and allows better allocation of resources in the IoV setup.
Privacy-Preserving Solutions: Federated Learning offers a viable solution for IoV networks to leverage collaborative learning while maintaining data privacy. This is especially relevant for applications involving sensitive information, such as autonomous vehicles and smart transportation systems. By ensuring data privacy, Federated Learning can increase user trust and acceptance of IoV technologies.
Scalability and Adaptability: The scalability of Federated Learning makes it suitable for widespread IoV deployment, accommodating a large number of devices with varying capabilities. This adaptability helps security solutions remain effective as the IoV ecosystem evolves. Furthermore, because the global model can be retrained collaboratively on new local data, Federated Learning can adapt to new and emerging threats, providing a robust defense mechanism for future IoV networks.
The practical effectiveness of our method is supported by its accuracy on a dataset containing many different types of DDoS attacks. The four most frequent attack types, DrDoS_DNS, DrDoS_SNMP, DrDoS_UDP, and NetBIOS, account for 28.65%, 28.94%, 9.20%, and 20.61% of the samples, respectively (Table 3). The high overall accuracy obtained on this mix of attack types (Table 5) suggests that the approach is robust in real-world scenarios where a variety of attack types are encountered.

Given the challenges of IoV environments, namely data privacy and scalability, we employ Federated Learning. Most conventional DDoS detection methods, including threshold-based methods, are efficient but offer poor privacy and scale poorly when applied to decentralized IoV environments. Our approach combines Federated Learning with Gini index-based feature selection to address these issues, enabling efficient, privacy-preserving, decentralized detection with reduced computational overhead.
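The collaborative, privacy-preserving training described above can be illustrated with Federated Averaging (FedAvg), the aggregation rule of McMahan et al. listed in the references. This section does not specify how the tree-based and KNN models were aggregated, so the following is only a sketch of the general idea, assuming each client holds a numeric parameter vector: every client runs a hypothetical local update (`local_update`) on its own samples, and the server computes the sample-size-weighted average w = Σ_k (n_k / n) w_k of the returned parameters. Only parameters, never raw samples, reach the server.

```python
def fedavg(client_params, client_sizes):
    """Sample-size-weighted average of client parameter vectors (FedAvg aggregation)."""
    total = sum(client_sizes)
    global_params = [0.0] * len(client_params[0])
    for params, n_k in zip(client_params, client_sizes):
        weight = n_k / total
        for i, value in enumerate(params):
            global_params[i] += weight * value
    return global_params


def local_update(global_params, local_samples, lr=0.5):
    """Hypothetical local training step: one gradient step on the mean squared
    distance between the parameters and the client's own samples.
    The raw samples are used only inside this function and are never returned."""
    n = len(local_samples)
    updated = []
    for i, w in enumerate(global_params):
        grad = sum(w - sample[i] for sample in local_samples) / n
        updated.append(w - lr * grad)
    return updated


def federated_round(global_params, clients):
    """One FL round: local updates on every client, then server-side FedAvg."""
    updates = [local_update(global_params, samples) for samples in clients]
    sizes = [len(samples) for samples in clients]
    return fedavg(updates, sizes)


# Toy example: two clients (e.g., two vehicles) with private one-dimensional data.
clients = [
    [[0.0], [2.0]],  # client A: 2 samples
    [[4.0]] * 6,     # client B: 6 samples
]
params = [0.0]
after_one_round = federated_round(params, clients)
for _ in range(50):
    params = federated_round(params, clients)
print(after_one_round, params)
```

With clients holding 2 and 6 samples, one round moves the global parameter from 0.0 to 1.625, and repeated rounds converge to 3.25, the mean of all eight samples, even though neither client shares its data.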

6. Conclusions

In conclusion, we proposed integrating Federated Learning (FL) with Gini Index-based feature selection to enhance DDoS detection in IoV. This dual approach addresses key challenges, including privacy preservation, computational efficiency, and detection accuracy. By selecting the most important features with the Gini Index, we reduced the dimensionality of the dataset, lowering computational demands while maintaining high detection performance. This is especially beneficial for resource-constrained IoV systems, and the reduced feature set can also improve the model's generalization capability. In addition, FL provides security and privacy by training a global model collaboratively without sharing raw data. This minimizes data transmission to central servers and supports scalable, privacy-preserving solutions for smart vehicle networks. Our method achieved high accuracy in DDoS attack detection with a reduced feature set, showing that it is practical and effective for protecting IoV systems from cyber threats. By decentralizing training and reducing data dimensionality, the combination of Federated Learning and feature selection reduces running time and memory usage for most models (Table 7), making the system efficient, scalable, and well suited to real-time applications. The results show its potential for deployment in resource-constrained IoV environments as a practical solution for DDoS detection.

In future work, we will make the feature selection process more refined and adaptive to dynamic network conditions and varied attack patterns. We also intend to apply more advanced federated learning algorithms to improve model robustness and scalability. Furthermore, we plan to apply our approach to other IoV security challenges, such as intrusion and anomaly detection, to protect connected vehicles more comprehensively. Through continued work in these directions, we hope to contribute to safer and more resilient IoV systems.

Author Contributions

Conceptualization, M.H.S.; Methodology, M.D., M.H.S. and S.R.; Software, M.D.; Validation, M.D.; Investigation, S.R.; Writing—original draft, M.D.; Writing—review & editing, M.H.S. and S.R.; Supervision, M.H.S. and S.R. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by TU Wien through its Open Access Funding Program.

Data Availability Statement

No new datasets were created. The data presented in this study are openly available at https://www.unb.ca/cic/datasets/ddos-2019.html, DOI: 10.1109/CCST.2019.8888419, accessed on 1 December 2024.

Acknowledgments

The authors acknowledge TU Wien Bibliothek for financial support through its Open Access Funding Program.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

Abbreviation	Full Form
IoV	Internet of Vehicles
DDoS	Distributed Denial of Service
IDS	Intrusion Detection System
V2V	Vehicle-to-Vehicle
V2I	Vehicle-to-Infrastructure
IoT	Internet of Things
IIoT	Industrial Internet of Things
FL	Federated Learning
Gini	Gini Index
ML	Machine Learning
RF	Random Forest
KNN	K-Nearest Neighbors
XGBoost	eXtreme Gradient Boosting
ROC	Receiver Operating Characteristic
AUC	Area Under the Curve
SVM	Support Vector Machine

References

1. Szymonik, A. Cybersecurity of autonomous vehicles–threats and mitigation. Sci. J. Mil. Univ. Land Forces 2024, 56, 77–96.
2. Verma, A.; Saha, R.; Kumar, G.; Conti, M.; Rodrigues, J.J. VAIDANSHH: Adaptive DDoS detection for heterogeneous hosts in vehicular environments. Veh. Commun. 2024, 48, 100787.
3. Albishi, O.A.; Abdullah, M. DDoS Attacks Detection in IoV using ML-based Models with an Enhanced Feature Selection Technique. Int. J. Adv. Comput. Sci. Appl. 2024, 15.
4. Taslimasa, H.; Dadkhah, S.; Neto, E.C.P.; Xiong, P.; Ray, S.; Ghorbani, A.A. Security issues in Internet of Vehicles (IoV): A comprehensive survey. Internet Things 2023, 22, 100809.
5. Mengistu, T.M.; Kim, T.; Lin, J.W. A Survey on Heterogeneity Taxonomy, Security and Privacy Preservation in the Integration of IoT, Wireless Sensor Networks and Federated Learning. Sensors 2024, 24, 968.
6. Doriguzzi-Corin, R.; Siracusa, D. FLAD: Adaptive federated learning for DDoS attack detection. Comput. Secur. 2024, 137, 103597.
7. Haddaji, A.; Ayed, S.; Chaari Fourati, L. IoV security and privacy survey: Issues, countermeasures, and challenges. J. Supercomput. 2024, 80, 23018–23082.
8. Chanu, U.S.; Singh, K.J.; Chanu, Y.J. A dynamic feature selection technique to detect DDoS attack. J. Inf. Secur. Appl. 2023, 74, 103445.
9. Shaar, F.; Efe, A. DDoS attacks and impacts on various cloud computing components. Int. J. Inf. Secur. Sci. 2018, 7, 26–48.
10. Dibaei, M.; Zheng, X.; Jiang, K.; Abbas, R.; Liu, S.; Zhang, Y.; Xiang, Y.; Yu, S. Attacks and defences on intelligent connected vehicles: A survey. Digit. Commun. Netw. 2020, 6, 399–421.
11. Carlos Pinto Neto, E.; Taslimasa, H.; Dadkhah, S.; Iqbal, S.; Xiong, P.; Rahman, T.; Ghorbani, A. CICIoV2024: Advancing Realistic IDS Approaches Against DoS and Spoofing Attack in IoV CAN Bus. Internet Things 2024, 26, 101209.
12. Ramya Devi, M.; Lokesh, S. Intelligent accident detection system by emergency response and disaster management using vehicular fog computing. Automatika 2024, 65, 117–129.
13. Sadaf, M.; Iqbal, Z.; Anwar, Z.; Noor, U.; Imran, M.; Gadekallu, T.R. A novel framework for detection and prevention of denial of service attacks on autonomous vehicles using fuzzy logic. Veh. Commun. 2024, 46, 100741.
14. Hassan, M.; Tariq, N.; Alsirhani, A.; Alomari, A.; Khan, F.A.; Alshahrani, M.M.; Ashraf, M.; Humayun, M. GITM: A Gini index-based trust mechanism to mitigate and isolate Sybil attack in RPL-enabled smart grid advanced metering infrastructures. IEEE Access 2023, 11, 62697–62720.
15. Singh, J.; Behal, S. Detection and mitigation of DDoS attacks in SDN: A comprehensive review, research challenges and future directions. Comput. Sci. Rev. 2020, 37, 100279.
16. Manivannan, D.; Moni, S.S.; Zeadally, S. Secure authentication and privacy-preserving techniques in Vehicular Ad-hoc NETworks (VANETs). Veh. Commun. 2020, 25, 100247.
17. Sherazi, H.H.R.; Iqbal, R.; Ahmad, F.; Khan, Z.A.; Chaudary, M.H. DDoS attack detection: A key enabler for sustainable communication in internet of vehicles. Sustain. Comput. Inform. Syst. 2019, 23, 13–20.
18. Alalwany, E.; Mahgoub, I. Security and trust management in the internet of vehicles (IoV): Challenges and machine learning solutions. Sensors 2024, 24, 368.
19. Gaurav, A.; Gupta, B.B.; Peñalvo, F.J.G.; Nedjah, N.; Psannis, K. DDoS attack detection in vehicular ad-hoc network (VANET) for 5G networks. In Security and Privacy Preserving for IoT and 5G Networks: Techniques, Challenges, and New Directions; Springer: Cham, Switzerland, 2022; pp. 263–278.
20. Goncalves, F.; Ribeiro, B.; Gama, O.; Santos, J.; Costa, A.; Dias, B.; Nicolau, M.J.; Macedo, J.; Santos, A. Synthesizing datasets with security threats for vehicular ad-hoc networks. In Proceedings of the GLOBECOM 2020–2020 IEEE Global Communications Conference, Taipei, Taiwan, 7–11 December 2020; IEEE: New York, NY, USA, 2020; pp. 1–6.
21. Gruebler, A.; McDonald-Maier, K.D.; Alheeti, K.M.A. An intrusion detection system against black hole attacks on the communication network of self-driving cars. In Proceedings of the 2015 Sixth International Conference on Emerging Security Technologies (EST), Braunschweig, Germany, 3–5 September 2015; IEEE: New York, NY, USA, 2015; pp. 86–91.
22. Rani, P.; Sharma, C.; Ramesh, J.V.N.; Verma, S.; Sharma, R.; Alkhayyat, A.; Kumar, S. Federated learning-based misbehaviour detection for the 5G-enabled internet of vehicles. IEEE Trans. Consum. Electron. 2023, 70, 4656–4664.
23. Gou, W.; Zhang, H.; Zhang, R. Multi-classification and tree-based ensemble network for the intrusion detection system in the internet of vehicles. Sensors 2023, 23, 8788.
24. Ferrag, M.A.; Friha, O.; Hamouda, D.; Maglaras, L.; Janicke, H. Edge-IIoTset: A new comprehensive realistic cyber security dataset of IoT and IIoT applications for centralized and federated learning. IEEE Access 2022, 10, 40281–40306.
25. Li, J.; Zhang, Z.; Li, Y.; Guo, X.; Li, H. FIDS: Detecting DDoS through federated learning based method. In Proceedings of the 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Shenyang, China, 20–22 October 2021; IEEE: New York, NY, USA, 2021; pp. 856–862.
26. Hamza, N.; Lakmal, H.; Maduranga, M.; Kathriarachchi, R. Malware Detection of IoT Networks Using Machine Learning: An Experimental Study with Edge IIoT Dataset. In Proceedings of the 30th Annual Technical Conference-IET Sri Lanka Network, Colombo, Sri Lanka, 5 August 2023.
27. Moustafa, N.; Keshky, M.; Debiez, E.; Janicke, H. Federated TON_IoT Windows datasets for evaluating AI-based security applications. In Proceedings of the 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Guangzhou, China, 29 December–1 January 2020; IEEE: New York, NY, USA, 2020; pp. 848–855.
28. Qu, Z.; Cai, Z. FEDSA-ResnetV2: An Efficient Intrusion Detection System for Vehicle Road Cooperation Based on Federated Learning. IEEE Internet Things J. 2024.
29. Qin, Y.; Kondo, M. Federated learning-based network intrusion detection with a feature selection approach. In Proceedings of the 2021 International Conference on Electrical, Communication, and Computer Engineering (ICECCE), Kuala Lumpur, Malaysia, 12–13 June 2021; IEEE: New York, NY, USA, 2021; pp. 1–6.
30. Khan, I.A.; Moustafa, N.; Pi, D.; Haider, W.; Li, B.; Jolfaei, A. An enhanced multi-stage deep learning framework for detecting malicious activities from autonomous vehicles. IEEE Trans. Intell. Transp. Syst. 2021, 23, 25469–25478.
31. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the Artificial Intelligence and Statistics, PMLR, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282.
32. Song, H.M.; Kim, H.K. Self-supervised anomaly detection for in-vehicle network using noised pseudo normal data. IEEE Trans. Veh. Technol. 2021, 70, 1098–1108.
33. Moustafa, N. A new distributed architecture for evaluating AI-based security systems at the edge: Network TON_IoT datasets. Sustain. Cities Soc. 2021, 72, 102994.
34. Alsaedi, A.; Moustafa, N.; Tari, Z.; Mahmood, A.; Anwar, A. TON_IoT telemetry dataset: A new generation dataset of IoT and IIoT for data-driven intrusion detection systems. IEEE Access 2020, 8, 165130–165150.
35. Anyanwu, G.O.; Nwakanma, C.I.; Lee, J.M.; Kim, D.S. Real-time position falsification attack detection system for internet of vehicles. In Proceedings of the 2021 26th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Vasteras, Sweden, 7–10 September 2021; IEEE: New York, NY, USA, 2021; pp. 1–4.
36. Otoum, Y.; Nayak, A. Signature-over-the-air with transfer learning IDS for intelligent connected vehicles (ICV). In Proceedings of the 2021 IEEE Globecom Workshops (GC Wkshps), Madrid, Spain, 7–11 December 2021; IEEE: New York, NY, USA, 2021; pp. 1–6.
37. Makkar, A.; Kim, T.W.; Singh, A.K.; Kang, J.; Park, J.H. SecureIIoT environment: Federated learning empowered approach for securing IIoT from data breach. IEEE Trans. Ind. Inform. 2022, 18, 6406–6414.
38. Abou El Houda, Z.; Naboulsi, D.; Kaddoum, G. A privacy-preserving collaborative jamming attacks detection framework using federated learning. IEEE Internet Things J. 2023, 11, 12153–12164.
39. Alanazi, M.; Aljuhani, A. Anomaly Detection for Internet of Things Cyberattacks. Comput. Mater. Contin. 2022, 72, 261–279.
40. Polat, H.; Turkoglu, M.; Polat, O. Deep network approach with stacked sparse autoencoders in detection of DDoS attacks on SDN-based VANET. IET Commun. 2020, 14, 4089–4100.
41. Shah, S.A.A.; Ahmed, E.; Imran, M.; Zeadally, S. 5G for vehicular communications. IEEE Commun. Mag. 2018, 56, 111–117.
42. Aloqaily, M.; Otoum, S.; Al Ridhawi, I.; Jararweh, Y. An intrusion detection system for connected vehicles in smart cities. Ad Hoc Netw. 2019, 90, 101842.
43. Kosmanos, D.; Pappas, A.; Maglaras, L.; Moschoyiannis, S.; Aparicio-Navarro, F.J.; Argyriou, A.; Janicke, H. A novel intrusion detection system against spoofing attacks in connected electric vehicles. Array 2020, 5, 100013.
44. Sharafaldin, I.; Lashkari, A.H.; Hakak, S.; Ghorbani, A.A. Developing Realistic Distributed Denial of Service (DDoS) Attack Dataset and Taxonomy. In Proceedings of the 2019 International Carnahan Conference on Security Technology (ICCST), Chennai, India, 1–3 October 2019; pp. 1–8.
45. Korium, M.S.; Saber, M.; Beattie, A.; Narayanan, A.; Sahoo, S.; Nardelli, P.H. Intrusion detection system for cyberattacks in the Internet of Vehicles environment. Ad Hoc Netw. 2024, 153, 103330.
46. Limouchi, E.; Chan, F. Optimized Machine Learning-Based Intrusion Detection System for Internet of Vehicles. In Proceedings of the 2023 IEEE Symposium Series on Computational Intelligence (SSCI), Mexico City, Mexico, 5–8 December 2023; IEEE: New York, NY, USA, 2023; pp. 1151–1157.
47. Li, X.; Zhang, H. A survey on DDoS attacks in IoV and corresponding detection mechanisms. IEEE Commun. Surv. Tutor. 2019, 21, 312–336.
48. Feng, J.; Li, Z. An intelligent collaborative edge computing approach for DDoS detection in IoV. IEEE Access 2019, 7, 40596–40605.
49. Liu, Y.; Wang, L. Detecting DDoS attacks in IoV using deep learning techniques. IEEE Trans. Intell. Transp. Syst. 2018, 19, 2306–2317.

Figure 1. Internet of Vehicles (IoV) illustrating various communication types.

Figure 2. IoV network attack detection system.

Figure 3. Selected important features.

Figure 4. Distribution of different attack types in the dataset.

Figure 5. Federated learning and machine learning process flow.

Figure 6. ROC curves comparing the multi-class classification performance of the Decision Tree, Random Forest, XGBoost, Gradient Boosting, and K-Nearest Neighbors models: (a) ROC Curve for Decision Tree Model. (b) ROC Curve for Random Forest Model. (c) ROC Curve for XGBoost Model. (d) ROC Curve for Gradient Boosting Model. (e) ROC Curve for K-Nearest Neighbors Model.

Figure 7. Confusion matrices for the five models: (a) Confusion Matrix for Decision Tree Model. (b) Confusion Matrix for Random Forest Model. (c) Confusion Matrix for XGBoost Model. (d) Confusion Matrix for Gradient Boosting Model. (e) Confusion Matrix for K-Nearest Neighbors Model. In each confusion matrix, color intensity shows prediction frequency, with darker shades indicating higher values and lighter shades indicating lower values, which helps reveal misclassifications.

Figure 8. Running time of models under different scenarios.

Figure 9. Memory usage of models under different scenarios.

Table 1. Overview of various datasets for evaluating security measures in IoT, IIoT, and IoV applications.

Paper Ref.	Year	Dataset	Description	Features	ML Techniques	Testbed	IoT/IIoT/IoV Devices	Threats	Learning Approach	Traffic
Akshat Gaurav [19]	2023	FL-IoV2023	Built for detecting misbehavior in 5G-enabled IoV using Federated Learning.	50	FL, CNN, RNN, LSTM	Real-world	Multiple IoV devices	Malicious communication, DDoS attacks	Centralized (✗) FL (✓)	IoT (✓) IoV (✓)
Goncalves, F. [20]	2022	IoV-Sec2022	Focuses on security in IoV networks with data collected from real traffic.	45	SVM, RF, GB, DT, Federated SVM	Simulated	IoV devices	DDoS, data injection, malware	Centralized (✓) FL (✓)	IoT (✓) IoV (✗)
Gruebler, A. [21]	2022	5G-IoT2021	Dataset for 5G IoT traffic with multiple types of IoT devices and various attack scenarios.	40	DT, KNN, FL-Boosting	Virtual	Various IoT devices	Spoofing, eavesdropping, DDoS	Centralized (✓) FL (✓)	IoT (✓) IoV (✗)
P. Rani [22]	2023	SecureIoV	Collected from 5G-enabled IoV networks including different types of vehicular communication attacks.	55	Federated DT, RNN, LSTM	Real-world	IoV and IIoT devices	Vehicle hijacking, DDoS	Centralized (✗) FL (✓)	IoT (✓) IoV (✓)
Wanting [23]	2023	CIC-IDS2017	Used for evaluating IDS with various types of attacks including DDoS, Brute Force, and Port-Scan.	80	DT, RF, ET, XGBoost, KNN, SVM	Real-world	Multiple IoV devices	BENIGN, Brute Force, DoS, Port-Scan, Web Attack, Botnet	Centralized (✓) FL (✗)	IoT (✓) IoV (✗)
Ferrag, M.A. [24]	2023	Industrial IoT	Ensuring security of IIoT environments using federated learning.	35	FL, SVM, KNN, RF	Simulated	Various IIoT sensors	Data breaches, unauthorized access	Centralized (✗) FL (✓)	IIoT (✓) IoV (✗)
Jingyi [25]	2021	CIC-IDS2017	Dataset focused on network intrusion detection.	78	SVM, KNN, DT, RF	Real-world	IoV devices	DDoS, Web attacks, Botnets	Centralized (✓)	IoT (✓) IIoT (✗)
Hamza, N. [26]	2023	Edge-IIoTset	Comprehensive cybersecurity dataset for IoT and IIoT applications supporting both centralized and FL modes.	61	DT, RF, SVM, KNN, DNN	Real-world	Various IoT/IIoT devices	Malware, network intrusions	Centralized (✓) FL (✓)	IoT (✓) IIoT (✓) IoV (✗)
Moustafa [27]	2020	Federated TON_IoT	Created for evaluating IoT/IIoT security using FL. Includes data from various sources and 9 attack categories.	50	Various	Real-world	Various IoT devices	DoS/DDoS, scanning, ransomware, backdoor, injection, XSS, password, MITM	Centralized (✗) FL (✓)	IoT (✓) IIoT (✓) IoV (✗)
Ours	2024	CICDDoS2019	Efficient DDoS attack detection in IoV using Gini Index and FL.	25	DT, RF, XGBoost, GB, KNN	Simulated	Various VANET devices	Various DDoS attacks	Centralized (✗) FL (✓)	IoT (✓) IoV (✓)

Table 2. 25 Selected features with their data types.

Sr. No.	Feature Name	Data Type
1	`source_port`	`int64`
2	`destination_port`	`int64`
3	`flow_duration`	`int64`
4	`total_length_of_fwd_Spackets`	`int64`
5	`Fwd_Packet_Length_Max`	`int64`
6	`Fwd_Packet_Length_Min`	`int64`
7	`Fwd_Packet_Length_Mean`	`float64`
8	`Flow_IAT_Mean`	`float64`
9	`Flow_IAT_Max`	`int64`
10	`Flow_IAT_Min`	`int64`
11	`Fwd_IAT_Total`	`int64`
12	`Fwd_IAT_Max`	`int64`
13	`Fwd_IAT_Min`	`int64`
14	`Fwd_Header_Length`	`float64`
15	`fwd_packets/s`	`float64`
16	`Min_Packet_Length`	`int64`
17	`Max_Packet_Length`	`int64`
18	`Packet_Length_Mean`	`float64`
19	`Average_Packet_Size`	`float64`
20	`Avg_Fwd_Segment_Size`	`float64`
21	`Fwd_Header_Length.1`	`float64`
22	`Subflow_Fwd_Bytes`	`int64`
23	`min_seg_size_forward`	`int64`
24	`time`	`object`
25	`label`	`object`

Table 3. Attack statistics.

Sr. No.	Attack Type	Total Samples	Training Samples	Testing Samples
1	BENIGN	183	144	39
2	DrDoS_DNS	21,635	17,234	4401
3	DrDoS_LDAP	1250	1020	230
4	DrDoS_NTP	7	5	2
5	DrDoS_NetBIOS	454	365	89
6	DrDoS_SNMP	21,856	17,461	4395
7	DrDoS_SSDP	131	112	19
8	DrDoS_UDP	6949	5591	1358
9	LDAP	2173	1738	435
10	NetBIOS	15,563	12,468	3095
11	Portmap	799	652	147
12	Syn	22	15	7
13	TFTP	579	461	118
14	UDP	3684	2966	718
15	UDP-lag	218	171	47
16	UDPLag	5	4	1
17	WebDDoS	2	1	1
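The class proportions and the train/test split implied by Table 3 can be checked with a few lines of Python. The counts below are copied from the table; nothing beyond simple arithmetic is assumed.

```python
# Per-class (training, testing) sample counts copied from Table 3.
TABLE3 = {
    "BENIGN": (144, 39),
    "DrDoS_DNS": (17234, 4401),
    "DrDoS_LDAP": (1020, 230),
    "DrDoS_NTP": (5, 2),
    "DrDoS_NetBIOS": (365, 89),
    "DrDoS_SNMP": (17461, 4395),
    "DrDoS_SSDP": (112, 19),
    "DrDoS_UDP": (5591, 1358),
    "LDAP": (1738, 435),
    "NetBIOS": (12468, 3095),
    "Portmap": (652, 147),
    "Syn": (15, 7),
    "TFTP": (461, 118),
    "UDP": (2966, 718),
    "UDP-lag": (171, 47),
    "UDPLag": (4, 1),
    "WebDDoS": (1, 1),
}

total = sum(train + test for train, test in TABLE3.values())
test_total = sum(test for _, test in TABLE3.values())

# Share of each attack type in the whole dataset, in percent.
shares = {name: 100.0 * (train + test) / total for name, (train, test) in TABLE3.items()}
test_fraction = test_total / total

for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:4]:
    print(f"{name:12s} {share:6.2f}%")
print(f"total={total}, test fraction={test_fraction:.2f}")
```

The 17 classes total 75,510 samples, of which exactly 15,102 (20%) are in the test set. The four most frequent classes are DrDoS_SNMP (28.94%), DrDoS_DNS (28.65%), NetBIOS (20.61%), and DrDoS_UDP (9.20%), while DrDoS_NTP, UDPLag, and WebDDoS have fewer than ten samples each.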

Table 4. Accuracy of machine learning models with all features and with the 25 selected features.

Models	All Features Acc	25 Features Acc
Decision Tree	0.92	0.93
Random Forest	0.92	0.93
XGBoost Classifier	0.94	0.94
Gradient Boosting	0.93	0.93
KNeighborsClassifier	0.85	0.85

Table 5. Comparison of classification performance using 25 selected features with machine learning and federated learning.

Model	Accuracy	Macro Avg Precision	Macro Avg Recall	Macro Avg F1-Score
Using Machine Learning Models
Decision Tree	0.93	0.69	0.70	0.70
Random Forest	0.93	0.73	0.69	0.70
XGBoost	0.94	0.71	0.69	0.69
Gradient Boosting	0.93	0.71	0.65	0.67
KNN	0.85	0.60	0.44	0.46
Using Federated Learning Models
Decision Tree	0.92	0.68	0.67	0.67
Random Forest	0.92	0.69	0.68	0.68
XGBoost	0.93	0.68	0.67	0.67
Gradient Boosting	0.92	0.66	0.67	0.66
KNN	0.88	0.68	0.67	0.67
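Table 5 reports accuracy of 0.92–0.94 for the tree-based models but macro-averaged F1-scores of only 0.66–0.70. Macro averaging gives every class equal weight regardless of its size, so poorly detected minority classes (Table 3 lists several classes with only a handful of samples) pull it down while barely affecting accuracy. The toy example below illustrates the effect in plain Python, following the usual per-class definitions (as in scikit-learn's `average="macro"` with `zero_division=0`); the labels and predictions are invented for illustration.

```python
def per_class_scores(y_true, y_pred, label):
    """Precision, recall, and F1 for one class (0.0 where a ratio is undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_average(y_true, y_pred):
    """Unweighted mean of per-class precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = [per_class_scores(y_true, y_pred, label) for label in labels]
    return tuple(sum(column) / len(labels) for column in zip(*per_class))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: a large class and a rare class that is mostly misclassified.
y_true = ["DrDoS_SNMP"] * 90 + ["WebDDoS"] * 10
y_pred = ["DrDoS_SNMP"] * 98 + ["WebDDoS"] * 2

acc = accuracy(y_true, y_pred)
macro_p, macro_r, macro_f1 = macro_average(y_true, y_pred)
print(f"accuracy={acc:.2f} macro P={macro_p:.2f} R={macro_r:.2f} F1={macro_f1:.2f}")
```

Here 92 of 100 predictions are correct, yet the macro F1-score is only about 0.65 because the rare class is recalled just 20% of the time.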

Table 6. Power consumption during machine learning and federated learning tasks.

Task Type	Model Complexity	Models Used	Total Time (Hours)	Power Consumption (kWh)
ML	Simpler Models	Decision Tree, K-Neighbors Classifier	5 to 15	0.25 to 0.75
ML	Complex Models	Random Forest, XGBoost, Gradient Boosting	15 to 35	0.75 to 1.75
FL	Simpler Models	Decision Tree, K-Neighbors Classifier	7 to 20	0.35 to 1.00
FL	Complex Models	Random Forest, XGBoost, Gradient Boosting	10 to 30	0.417 to 1.25
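Because energy is average power multiplied by time (E = P × t), the ranges in Table 6 imply an average power draw for each setup. The short check below uses only the table's own numbers and divides each energy bound by the corresponding time bound.

```python
# (task type, model complexity): ((min hours, max hours), (min kWh, max kWh)), from Table 6.
TABLE6 = {
    ("ML", "Simpler"): ((5, 15), (0.25, 0.75)),
    ("ML", "Complex"): ((15, 35), (0.75, 1.75)),
    ("FL", "Simpler"): ((7, 20), (0.35, 1.00)),
    ("FL", "Complex"): ((10, 30), (0.417, 1.25)),
}


def implied_power_watts(hours, kwh):
    """Average power P = E / t, converted from kW to W."""
    return 1000.0 * kwh / hours


implied = {
    key: (implied_power_watts(h_lo, e_lo), implied_power_watts(h_hi, e_hi))
    for key, ((h_lo, h_hi), (e_lo, e_hi)) in TABLE6.items()
}
for (task, complexity), (p_lo, p_hi) in implied.items():
    print(f"{task} {complexity:8s} {p_lo:5.1f} W to {p_hi:5.1f} W")
```

Both ends of each range give about 50 W for the first three rows and about 41.7 W for the federated complex-model row, so each energy range is consistent with a constant average power draw over the stated time range.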

Table 7. Performance comparison of machine learning models under different scenarios.

Model	Scenario	Running Time	Memory Usage	Performance Summary
Decision Tree	All Features	1.05 s	1157.10 MB	Fastest model with all features but moderate memory consumption
	ML Model (25 Features)	0.82 s	570.51 MiB	Faster with reduced memory (about 50% memory saved)
	FL Models (25 Features)	2.75 s	837.93 MB	Slightly slower but manageable memory usage in federated learning
	Avg. Round Per Client (FL)	0.0075 s	0.0456 MB	Efficient in FL with minimal memory and time per round
Random Forest	All Features	18.91 s	1171.41 MB	Balanced performance, slower than Decision Tree but consistent
	25 Features	14.02 s	594.97 MiB	Noticeable speedup, memory reduction by 50%
	FL Models (25 Features)	16.53 s	954.91 MB	Similar performance to 25 features but slightly higher memory in FL
	Avg. Round Per Client (FL)	0.8199 s	0.1748 MB	Moderate time and memory usage per round in FL
XGBoost	All Features	60.24 s	1186.96 MB	Much slower execution with high memory consumption
	25 Features	13.82 s	612.95 MiB	Dramatic improvement in execution and memory usage
	FL Models (25 Features)	14.10 s	851.43 MB	Stable performance in FL, faster than full-feature version
	Avg. Round Per Client (FL)	0.4015 s	0.3639 MB	Moderate time with slightly higher memory in FL
Gradient Boosting	All Features	460.57 s	1194.47 MB	Extremely slow compared to all other models with high memory demand
	25 Features	358.78 s	617.83 MiB	Better performance with feature reduction, but still much slower than others
	FL Models (25 Features)	400.20 s	862.06 MB	Similar slow performance in FL, but decent memory savings compared to all features
	Avg. Round Per Client (FL)	1.5169 s	0.1964 MB	Longer time with moderate memory in FL
KNN	All Features	11.78 s	1291.00 MB	Quick execution but highest memory usage with all features
	25 Features	0.23 s	627.52 MiB	Fastest model, with memory usage roughly halved (about 50% reduction)
	FL Models (25 Features)	0.21 s	838.70 MB	Continues to be the fastest even in federated learning, though memory usage increases
	Avg. Round Per Client (FL)	0.0022 s	0.0452 MB	Fastest in FL with minimal memory usage

The color coding in the table provides a quick visual guide to performance levels. Green shows the best or most efficient performance, yellow indicates moderate or satisfactory performance, and red highlights areas where performance is the weakest or most resource-intensive.
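Table 7 mixes MB and MiB, which differ by about 5% (1 MB = 1,000,000 bytes; 1 MiB = 1,048,576 bytes), so the units matter when comparing rows. How the running times and memory figures were measured is not stated in this section. The sketch below shows one possible approach using only the Python standard library, assuming wall-clock time from `time.perf_counter` and peak traced allocation from `tracemalloc`, reported in both units; the helper name `profile_call` is hypothetical. Note that `tracemalloc` counts allocations made through Python's memory allocators, so its figures are not directly comparable to process-level resident memory.

```python
import time
import tracemalloc


def profile_call(fn, *args, **kwargs):
    """Run fn once; return (result, wall-clock seconds, peak MB, peak MiB).

    Peak memory covers allocations traced by tracemalloc while fn runs,
    not the total resident memory of the process."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak_bytes / 1_000_000, peak_bytes / 2**20


# Example: profile sorting a reversed list (a stand-in for training a model).
data = list(range(200_000, 0, -1))
result, seconds, peak_mb, peak_mib = profile_call(sorted, data)
print(f"{seconds:.4f} s, {peak_mb:.2f} MB = {peak_mib:.2f} MiB")
```

Passing a classifier's `fit` method and its training arrays instead of `sorted` would give per-model figures of the kind shown in Table 7.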

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


MDPI and ACS Style

Dilshad, M.; Syed, M.H.; Rehman, S. Efficient Distributed Denial of Service Attack Detection in Internet of Vehicles Using Gini Index Feature Selection and Federated Learning. Future Internet 2025, 17, 9. https://doi.org/10.3390/fi17010009

AMA Style

Dilshad M, Syed MH, Rehman S. Efficient Distributed Denial of Service Attack Detection in Internet of Vehicles Using Gini Index Feature Selection and Federated Learning. Future Internet. 2025; 17(1):9. https://doi.org/10.3390/fi17010009

Chicago/Turabian Style

Dilshad, Muhammad, Madiha Haider Syed, and Semeen Rehman. 2025. "Efficient Distributed Denial of Service Attack Detection in Internet of Vehicles Using Gini Index Feature Selection and Federated Learning" Future Internet 17, no. 1: 9. https://doi.org/10.3390/fi17010009

APA Style

Dilshad, M., Syed, M. H., & Rehman, S. (2025). Efficient Distributed Denial of Service Attack Detection in Internet of Vehicles Using Gini Index Feature Selection and Federated Learning. Future Internet, 17(1), 9. https://doi.org/10.3390/fi17010009

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
