1. Introduction
Road safety is a major public health concern. According to the World Health Organization (WHO), traffic-related deaths account for around 1.19 million deaths per year, primarily affecting those aged 5 to 29 [
1]. As urban areas become more complex and the number of vehicles increases, investigating the different elements that contribute to traffic accidents and developing predictive models to address these issues is becoming crucial [
2,
3]. Recent research has highlighted the critical role of sophisticated analytical methods, such as Machine Learning (ML) and Deep Learning (DL) [
4], to strengthen road safety initiatives. A thorough examination of ML applications in the analysis of traffic accidents revealed 191 pertinent studies, illustrating the capacity of these technologies to enhance predictive accuracy and guide preventive measures [
1,
5]. ML techniques enable researchers to scrutinize massive datasets, identify trends, and forecast accident risks, thereby supporting the worldwide endeavor to diminish traffic-related fatalities and injuries by 2030, in accordance with the objectives set by the WHO [
1].
The prediction of traffic accident severity (TAS) has recently garnered significant interest, largely due to the growing focus on road safety and public health. Conventional ML techniques, such as decision trees, random forests (RF), and support vector machines (SVM), have been widely used for analyzing accident datasets [
6]. While these approaches have achieved good performance, they often struggle to capture the intricate and non-linear relationships in large and multi-dimensional datasets like those associated with traffic accidents. To address these challenges, this study proposes the use of a ResNet, a model originally designed for image recognition, as a new approach to predict TAS using structured CSV data. Although the success of ResNet is strongly associated with image processing, its architecture—especially the use of residual connections—proves advantageous for modeling structured data with high-dimensional attributes. We choose ResNet for its ability to learn complex models through residual connections, which helps mitigate the vanishing gradient problem [
7]. This ability to learn deeper representations makes ResNet well-suited for tasks that involve complex interactions among multiple features. This characteristic is particularly pertinent to predicting road accident severity, where outcomes are influenced by a multitude of interrelated factors, such as weather conditions, road types, driver behavior, and traffic signals.
However, DL methods, including ResNet, frequently operate as “black boxes”, limiting stakeholders’ comprehension of the factors influencing predictions. eXplainable Artificial Intelligence (XAI) addresses this issue of model interpretability [
8] by using methods that explain how different factors, such as weather, road conditions, and driver conduct, affect accident severity. XAI enables stakeholders to assess model results and provides information for policy-making, thus improving accountability and confidence in decision-making processes. Incorporating XAI into TAS prediction enhances these systems’ effectiveness and makes forecasts consistent with social norms, thereby promoting the development of more intelligent and safer transportation networks.
Several XAI techniques exist, each designed for specific goals and suited for different applications. Among these techniques, SHapley Additive exPlanations (SHAP) has proven to be a powerful interpretability tool. Using SHAP for TAS prediction presents significant benefits, primarily due to its solid theoretical foundations and reliable interpretability. SHAP values, rooted in cooperative game theory, ensure an equitable evaluation of each feature’s contribution, which is essential in critical domains such as traffic safety. SHAP delivers both local (specific predictions) and global (overall feature significance) insights, enabling stakeholders to comprehend not only the rationale behind individual predictions but also the collective influence of various factors on the model’s overall behavior. Furthermore, SHAP accounts for feature interactions, offering a detailed perspective on how different variables (such as weather conditions, time of day, and driver behavior) interact to influence accident severity. This aspect is particularly pertinent in traffic situations where multiple factors frequently converge. Additionally, the visualization capabilities of SHAP enhance the clarity of results for a wide range of stakeholders, promoting informed dialogue and fostering trust in the model’s outputs. In summary, the combination of theoretical robustness, extensive interpretability, and efficient communication makes SHAP a well-suited tool for enhancing transparency and accountability in models predicting TAS.
The main goal of this paper is to develop a predictive model for accident injury severity using a ResNet architecture. Through this model and SHAP, we aim to improve the understanding of the determinants of TAS, which may contribute to reducing both the frequency and severity of accidents in the near future. The dataset used for this study contains data on traffic accidents in the United Kingdom (UK) from 2005 to 2013. The main contributions of this work can be summarized as follows:
ResNet Framework for TAS Prediction: We introduce a predictive model based on ResNet, specifically designed to achieve precise and efficient TAS prediction.
Comparative Evaluation: A comprehensive assessment is performed to compare the proposed ResNet model against various DL architectures, such as CNN, LSTM, Darknet, and Xception.
SHAP for Analyzing Feature Significance: By employing SHAP, we uncover the primary factors that affect the predictions made by the model, offering clear and interpretable visual representations of the contributions of different features.
Generalizability Assessment: The effectiveness of the proposed ResNet model is tested on an alternative dataset, demonstrating its robustness and capability to generalize successfully across diverse datasets.
The remainder of this paper is structured as follows. In
Section 2, the relevant previous research is examined.
Section 3 provides a detailed presentation of the proposed system design.
Section 4 describes the design and results of the experiment.
Section 5 discusses the conclusions and suggestions for future research.
2. Related Work
Over the past decade, ML and DL have emerged as prominent methods for predicting the severity of accidents, primarily due to their ability to identify underlying patterns and deliver results that surpass those obtained through conventional statistical methods. This section reviews the approaches previously employed in TAS prediction. A summary of these approaches and their limitations is provided in
Table 1.
Numerous studies have explored ML approaches to enhance TAS prediction and hotspot identification, often leveraging the RF algorithm for its robust predictive capabilities. Ahmed et al. [
9] address the critical issue of road traffic accidents, applying ensemble ML models such as RF and Decision Jungle (DJ) to data from New Zealand’s Ministry of Transport (2016–2020). Their findings reveal that RF achieved the highest accuracy (81.45%) in predicting accident severity, and Shapley value analysis further underscores the importance of features like road type and vehicle density. This illustrates the effectiveness of ensemble models, particularly RF, in improving predictive accuracy. Similarly, Santos et al. [
10] investigate ML methodologies for analyzing accident severity and identifying potential hotspots using data from Setúbal, Portugal (2016–2019). They employ supervised techniques, including RF and decision trees, and unsupervised methods such as DBSCAN clustering. Their analysis shows that while the C5.0 algorithm was especially effective in identifying key factors influencing accident severity, RF demonstrated strong capabilities in predicting accident hotspots. This consistent performance across both studies highlights RF’s versatility and effectiveness in TAS prediction and hotspot analysis.
Gradient Boosting and Extreme Gradient Boosting (XGBoost) share several characteristics, as both belong to the gradient boosting family of ensemble learning methods. Their application in predicting TAS has shown significant promise across various studies, particularly for their ability to handle class imbalance and improve predictive accuracy. The research conducted by Zhang et al. [
11] introduces a novel methodology for forecasting the severity of injuries resulting from traffic accidents, employing ordinal classification techniques. In contrast to conventional nominal classification methods that fail to account for the ordered nature of injury severity, this investigation establishes a framework that adheres to the principles of rank consistency and monotonicity. The authors evaluate the efficacy of three distinct classifiers: Multi-Layer Perceptron (MLP), XGBoost, and Support Vector Machine (SVM), implementing a severity category-combination strategy to mitigate class imbalance within the crash data. The findings indicate that the proposed ordinal classification approach markedly surpasses traditional nominal classification methods, achieving an accuracy rate of 85% on a practical crash dataset. Additionally, key determinants affecting injury severity are discerned through permutation feature importance analysis. The study concludes that categorizing severity levels into three distinct classes—minor, moderate, and serious injuries—improves predictive accuracy while effectively addressing the issues associated with class imbalance in traffic accident data. In another study, Behboudi et al. [
12] conduct an extensive examination of various ML techniques utilized for predicting the severity of traffic accidents. The research assesses multiple models, such as RF, SVM, and Gradient Boosting, leveraging a dataset encompassing traffic accident records from numerous urban locations over an extended period. The authors note that the Gradient Boosting model achieved the highest performance, with an accuracy rate of 92%, underscoring its capability to manage intricate data and address class imbalances frequently encountered in traffic accident datasets. Additionally, the study underscores the significance of feature selection and data preprocessing in improving model efficacy. Nonetheless, the authors recognize certain limitations concerning the applicability of their findings across different geographical areas and accident types, indicating a need for further investigation to confirm these models in varied contexts.
Among ML models, logistic regression is a statistical model widely used for binary classification tasks. Chong et al. [
13] investigate various ML algorithms, such as Logistic Regression, XGBoost, and RF, to assess the severity of traffic accidents using data from Texas spanning the years 2011 to 2021. Their findings indicate that Logistic Regression achieved the highest performance, with an accuracy rate of 88%, successfully pinpointing key factors that lead to accidents. The authors emphasize the importance of precise predictive models in reducing future incidents and enhancing road safety initiatives. Although their research provides significant insights into the classification of accident severity, they acknowledge that traditional models like Logistic Regression may not adequately capture more intricate relationships compared to DL methodologies. This study enhances the understanding of how various ML techniques can be utilized to improve predictions related to traffic safety.
The research conducted by Aboulola et al. [
14] explores the prediction of traffic accident severity through the implementation of diverse transfer learning techniques, while simultaneously improving model interpretability via Shapley values. The study employs a MobileNet architecture, achieving a notable accuracy of 98.17%. The dataset under examination spans five years (2016–2020) from New Zealand and includes factors such as accident location, weather conditions, and vehicle attributes. By clarifying the influence of various features on the severity of accidents, the research aims to bolster safety protocols and enhance predictive precision. However, the reliance on large labeled datasets and the significant computational resources required for explainability methods may present obstacles to the practical application of the findings in real-time contexts, especially in environments with limited resources.
Recent advancements in DL (Neural Network) approaches, including RNN, MobileNet, and Deep Spatiotemporal Hybrid Network (DSHN), have significantly contributed to predicting TAS. The research conducted by Sameen et al. [
15] examines the application of RNNs in forecasting TAS. The proposed model achieved an accuracy rate of 71.77% using a dataset sourced from the Crash Analysis System in New Zealand, covering the years 2016 to 2020. However, the study acknowledges certain limitations, such as dataset imbalance, which may influence the model’s predictive performance, as well as concerns regarding potential overfitting during the training process. In a further study, Aboulola et al. [
16] explore the use of transfer learning methodologies utilizing MobileNet to assess TAS. By analyzing a dataset from New Zealand collected between 2016 and 2020, the authors achieved a remarkable accuracy rate of 98.17%, demonstrating the model’s proficiency in categorizing accident severity levels. This research emphasizes the importance of XAI approaches, enabling stakeholders to understand critical factors contributing to accidents. The implementation of MobileNet addresses the limitations of traditional models, which often lack interpretability. Nevertheless, the authors caution that the model’s effectiveness may be confined to the specific dataset utilized, raising concerns about its applicability to different geographical areas or datasets. Overall, this study significantly contributes to the field by showcasing how sophisticated ML techniques can enhance predictions related to traffic safety. In a different research project, Kashifi et al. [
17] introduce an innovative methodology that leverages the DSHN to enhance traffic accident prediction. This framework integrates diverse data sources, including extensive traffic datasets from the Paris road network, meteorological conditions, and historical accident records. The DSHN combines CNN, LSTM, and Artificial Neural Networks (ANN), yielding an accuracy rate of 75.7% and an area under the curve (AUC) of approximately 0.800. This research highlights the critical role of road sensor data in improving predictive accuracy while addressing the complexities associated with data integration and preprocessing.
Deep Forest is a notable instance of Ensemble ML, classified as a Tree-Based Ensemble and Non-Parametric method. In their study, Jing Gan et al. [
18] proposed the Deep Forest algorithm as a viable alternative for predicting TAS. This innovative methodology addresses key limitations of traditional DL models, particularly their dependency on large datasets and substantial computational resources. The Deep Forest algorithm, which utilizes an ensemble of decision trees, was rigorously evaluated using the 2016 road safety dataset from the UK. The results revealed that this approach attained an impressive accuracy rate of 85.23%, while exhibiting a markedly reduced demand for computational resources compared to conventional neural networks. However, the authors recognized that the scalability of Deep Forests might be constrained in contexts involving large datasets. This research offers a more accessible yet effective strategy for predicting accident severity, thereby contributing to the growing body of knowledge in traffic safety analytics.
In their 2024 research, Saxena et al. [
19] introduce an advanced version of the YOLOv4 model aimed at enhancing the detection of traffic signs in difficult conditions for autonomous vehicles. The model utilizes CSPDarknet53 as its foundational architecture and incorporates strategies such as nighttime image enhancement and K-Means clustering combined with GIoU for optimizing anchor boxes, which allows for improved detection of smaller traffic signs. Evaluated on datasets including Tsinghua-Tencent 100K (TT-100K) and Mapillary Traffic Sign Dataset (MTSD), the model achieved accuracy rates of 94.80% and 80.71%, respectively. Although this model surpasses previous approaches in performance, its high computational requirements pose a challenge for real-time implementation on devices with limited resources.
3. Materials and Methods
This section describes the framework of our study, including the dataset used, the proposed ResNet, and the models used for comparison. Each step in the process is designed to provide an in-depth understanding of the elements influencing accident severity and to improve the model’s predictive performance. The complete workflow of the study is illustrated in
Figure 1.
3.1. Dataset
The UK road accidents dataset has been extensively used in various research endeavors to develop predictive models for assessing accident severity [
20], pinpointing high-risk areas, and analyzing the efficacy of safety measures [
21]. Despite its wide use, the dataset presents some limitations, such as potential underreporting of incidents and inconsistencies in data collection among different police jurisdictions [
21]. The dataset covers accidents in the UK from 2005 to 2013, providing significant insights into trends and statistics related to road safety during this period. The accidents are classified into three classes according to their levels of severity, as outlined in
Table 2.
The dataset contains various attributes that are pertinent for analyzing traffic accidents, including information on accident locations, dates and vehicles involved.
Table 3 provides a detailed list of these features with their description. This diversity in the content of UK data allows for comprehensive and in-depth analysis of relationships among various factors influencing road safety.
3.2. Data Pre-Processing
Data pre-processing is a critical phase in developing predictive models, particularly for assessing TAS. This step is essential to ensure that the dataset is clean, consistent, and suitably organized for the application of ML algorithms. Fundamental pre-processing tasks include data cleaning, feature selection, data transformation, and data splitting into training and testing sets.
Data Cleaning: Data cleaning addresses issues within a dataset that may negatively affect the performance of predictive models. An initial analysis of the dataset revealed missing values in essential columns, including Number_of_Casualties and Weather_Conditions. To address the varying degrees of missing data, different imputation methods were used: the mean or median was applied to numerical attributes, while rows with a significant number of missing values in categorical fields were removed from the dataset. Variables with a high percentage of missing or deficient data were excluded from our analysis. These include Pedestrian_Crossing (90% missing), Pedestrian_CrossingPhysical_Facilities (80% missing), and Carriageway_Hazards and Special_Conditions_at_Site (90% deficient). We also omitted the variable Did_Police_Officer_Attend_Scene_of_Accident, as police attendance generally occurs after the incident and does not necessarily correlate with accident severity. Accident_Index was also eliminated since it merely serves to count the number of accidents. Lastly, the Date and Time of Accidents were excluded as they do not logically affect the severity of the incidents. Furthermore, outlier detection techniques, such as Z-score analysis, were employed to identify and manage outliers in continuous variables like Speed_limit, thereby preserving the integrity of the data. In addition, normalization and standardization methods were applied to continuous variables such as Longitude and Latitude, ensuring that these features contribute equally to the model’s predictive capabilities.
Feature Selection: Feature selection plays a crucial role in improving model accuracy and mitigating overfitting by identifying the most relevant variables. To assess the relationships between various features and
Accident_severity, a correlation matrix was generated. This matrix offers valuable insights into the magnitude and orientation of linear relationships between the numerical variables. Features that exhibit significant correlations were retained for modeling. The correlation matrix for UK dataset features is visualized in
Figure 2. The correlation analysis indicates that
Light_Conditions and
Weather_Conditions have minimal impact on the outcomes, suggesting that these variables can be excluded from further consideration.
Data Transformation: Data transformation is essential for adapting categorical features into a format that is compatible with ML models. Categorical variables, such as Police_Force and Day_of_Week, were converted through one-hot encoding and label encoding, facilitating their numerical representation. One-hot encoding proved particularly effective for nominal categories by generating binary columns for each distinct category, whereas label encoding was utilized for ordinal variables. Additionally, feature engineering was performed to generate new variables that improve predictive accuracy. For instance, Time was categorized into morning, afternoon, evening, and night, and a binary feature was added to indicate whether an accident occurred on a weekend.
Data Splitting: Data Splitting is an essential process for assessing the performance of ML models. The dataset was split into three distinct subsets: training, validation, and testing. Approximately 70% of the data was used for training. A validation subset, consisting of 15% of the total data, was utilized to fine-tune model parameters and optimize hyperparameters, which helps mitigate the risk of overfitting. The remaining 15% of the dataset was kept for testing purposes.
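The pre-processing steps described above can be illustrated with a minimal, standard-library sketch. It is not the pipeline used in this study: the helper names, the hour boundaries used to bin Time, and the random seed are assumptions introduced purely for illustration.

```python
import random
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def z_scores(values):
    """Standardize values; |z| above a chosen threshold can flag outliers."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std if std else 0.0 for v in values]


def one_hot(values):
    """One binary column per distinct category, for nominal variables."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


def time_of_day(hour):
    """Bin an hour (0-23) into four periods; the boundaries are assumed."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 22:
        return "evening"
    return "night"


def split_indices(n_rows, train=0.70, val=0.15, seed=0):
    """Shuffle row indices and cut them into train / validation / test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = round(n_rows * train)
    n_val = round(n_rows * val)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )


speed_limit = impute_median([30, None, 60, 30])
encoded, columns = one_hot(["Mon", "Sun", "Mon"])
train_idx, val_idx, test_idx = split_indices(100)
```

With 100 rows, the split yields 70, 15, and 15 indices, matching the proportions stated above.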
3.3. DL Architectures for Severity Classification
The classification phase focuses on training the ResNet model and evaluating its performance against other architectures. The models were selected based on their suitability for TAS prediction. ResNet was selected for its strong performance in extracting complex hierarchical features; the other models, namely DarkNet, CNN, Xception, and LSTM, serve as baselines against which ResNet is compared. These baselines were chosen to cover the variety of DL architectures that can be applied to accident severity prediction.
3.3.1. Proposed ResNet Model
Residual Networks, commonly referred to as ResNet, represent a significant advancement in the field of DL. Developed by He et al. [
22], ResNet has shown notable success in various competitions and across multiple computer vision tasks, including image classification, object detection, and segmentation. Using skip connections, ResNet can preserve information across different layers, ensuring that even in extremely deep networks, information from earlier layers is retained and integrated with the more intricate features learned in deeper layers. This makes the model capable of capturing both low-level and high-level representations of the data. In this study, we propose an adaptation of the ResNet framework for structured data, specifically designed to predict TAS. Traffic accident datasets, often formatted as CSV files, encompass a combination of categorical and numerical variables (such as weather conditions, types of roads, and vehicle counts). The proposed ResNet is designed to explore the complex interrelationships among these features, which may not be linear or directly correlated, through residual connections. By employing residual learning, the network can discern deep hierarchical patterns without sacrificing performance.
The architecture of the proposed ResNet, illustrated in
Figure 3, consists of (1) an initial input layer to process dense features, (2) three residual blocks, each containing two dense layers with ReLU activation and batch normalization, interconnected by skip connections, (3) a dropout layer to prevent overfitting, and (4) a fully connected layer followed by an output layer for severity classification. By using dense layers for initial processing, residual blocks for robust feature extraction, and a classification layer for prediction, the proposed architecture can effectively manage structured, non-image data, thereby encapsulating the intricacies involved in predicting the severity of accidents.
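To make the data flow through this architecture concrete, the following is a minimal forward-pass sketch in plain Python. It is an illustration under stated assumptions, not the trained model: the weights are random, batch normalization is omitted, dropout is treated as the identity (as it is at inference time), and the function names, the 20 input features, and the width of the final dense layer are hypothetical.

```python
import math
import random


def init_dense(n_in, n_out, rng):
    """He-initialized weights (n_out rows of n_in values) and zero biases."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Affine map W x + b for a single input vector x."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    peak = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - peak) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def residual_block(x, block):
    """Two dense layers whose output is added back to the block input."""
    first, second = block
    h = relu(dense(x, first))
    h = dense(h, second)
    return relu([a + b for a, b in zip(h, x)])  # skip connection


def build_model(n_features, width=128, n_blocks=3, n_classes=3, seed=0):
    rng = random.Random(seed)
    return {
        "input": init_dense(n_features, width, rng),
        "blocks": [
            (init_dense(width, width, rng), init_dense(width, width, rng))
            for _ in range(n_blocks)
        ],
        "dense": init_dense(width, width, rng),  # fully connected layer
        "output": init_dense(width, n_classes, rng),  # three severity classes
    }


def predict_proba(x, model):
    h = relu(dense(x, model["input"]))
    for block in model["blocks"]:
        h = residual_block(h, block)
    h = relu(dense(h, model["dense"]))
    return softmax(dense(h, model["output"]))


model = build_model(n_features=20)  # 20 input features is an assumed value
probabilities = predict_proba([0.5] * 20, model)
```

The skip connection is the line that adds the block input back to the output of the second dense layer: if both dense layers contribute nothing, a non-negative input passes through the block unchanged, which is what allows information from earlier layers to be retained.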
3.3.2. Architectures Compared
To assess the performance of ResNet, we conducted a comparative study with other popular DL architectures, namely CNN, LSTM, Xception and DarkNet.
Convolutional Neural Networks (CNNs) are a powerful class of DL architectures that excel in extracting spatial hierarchies of features from visual inputs, making them ideal for image and video recognition tasks due to their ability to discern intricate patterns and nuanced relationships within data [
23]. Utilizing CNNs to analyze TAS within CSV-based datasets requires transforming structured data, such as weather conditions, time of day, and road types, into grid-like structures or image formats. This preprocessing step allows CNNs to effectively detect complex spatial correlations among features, enhancing predictive accuracy by enabling the network to recognize interactions among variables that contribute to accident severity. As a result, CNNs enhance both the predictive capability and interpretability of TAS models, making them well-suited for identifying critical patterns within structured accident severity data.
Xception is an evolution of the Inception architecture that employs depthwise separable convolutions to improve both depth and efficiency [
24]. This design makes Xception particularly effective at capturing fine-grained details within complex data, making it a robust choice for sophisticated recognition tasks. When adapted for predicting TAS, Xception can process CSV data transformed into a multi-channel format, enabling it to capture intricate feature interactions. Its suitability lies in its proven precision and ability to handle high-dimensional, structured datasets, where it can unveil relationships between variables that are crucial for accurately predicting accident severity.
Long Short-Term Memory (LSTM) networks are a specialized type of RNN designed to process sequential data and address the limitations of conventional RNNs, particularly with respect to long-term dependencies. Their architecture includes memory cells that preserve information across longer sequences, making them well-suited for tasks requiring temporal analysis [
25]. In TAS prediction, temporal factors—such as the chronological order of events leading up to an accident—are crucial. LSTMs can model these temporal relationships, leveraging historical sequences of accident data to identify patterns and trends that inform severity levels. This capability makes LSTMs particularly effective for datasets containing time-sensitive variables, as they can capture recurring patterns that are vital for precise predictions in accident severity scenarios.
Darknet is a neural network framework built in C and CUDA, designed to support the YOLO (You Only Look Once) family of models, widely recognized for their speed and accuracy in real-time object detection tasks [
26]. Although initially developed for image-based processing, Darknet can be adapted to handle structured CSV data through encoding techniques that leverage its efficient architecture. When applied to TAS data, attributes can be transformed into formats suitable for Darknet, enabling the framework to extract crucial patterns and characteristics related to accident severity. Its speed and processing efficiency make Darknet highly suitable for real-time systems aimed at predicting accident severity, where quick, reliable predictions are essential for adaptive traffic management and monitoring applications.
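The convolutional baselines above require each tabular record to be rearranged into a grid-like structure. The text does not specify the encoding, so the sketch below shows only one simple possibility, assumed for illustration: zero-padding the feature vector to the next square length and reshaping it row by row. The function name and the padding value are hypothetical.

```python
import math


def to_square_grid(features, pad_value=0.0):
    """Pad a flat feature vector and reshape it into a side x side grid."""
    side = math.ceil(math.sqrt(len(features)))
    padded = list(features) + [pad_value] * (side * side - len(features))
    return [padded[row * side:(row + 1) * side] for row in range(side)]


# Ten features become a 4 x 4 grid with six padded cells.
grid = to_square_grid([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```

Because neighboring cells in such a grid carry no inherent spatial meaning, the ordering of the features is itself a design choice that may affect what a convolutional filter can learn.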
3.4. Model Interpretability
Comprehending the forecasts generated by the model is essential for real-world applications, particularly in domains such as traffic safety, where the outputs of the model significantly shape decision-making.
SHAP is a comprehensive framework for interpreting ML models, especially those whose complex architecture leads them to be regarded as “black boxes”, such as deep neural networks [
27]. Based on principles from cooperative game theory, SHAP assigns importance values to individual features for specific predictions, thereby promoting transparency and equity in decision-making processes. A typical SHAP summary plot provides a visual overview of feature importance and the nature of each feature’s impact on the model’s output. The
Y-axis ranks features in descending order of importance, according to their average absolute SHAP values across all instances. This ordering allows a quick identification of the features that have the most significant impact on predictions. The
X-axis displays the SHAP values, showing whether a feature influences prediction positively or negatively. Features associated with positive SHAP values enhance the prediction, whereas those with negative values reduce it. In the graphical representation, each data point is assigned a color based on the value of the associated feature for that particular instance. Generally, high feature values are depicted in red, while low values are represented in blue. This color scheme facilitates a rapid visual evaluation of the relationship between feature values and their influence on predictions. Every point on the SHAP plot corresponds to an individual observation from the dataset, illustrating the impact of varying feature values on predictions across numerous instances. This distribution aids in uncovering patterns and correlations between feature values and the outputs generated by the model.
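The game-theoretic quantity underlying SHAP can be illustrated with an exact Shapley value computation on a toy model. This sketch does not use the SHAP library and is not the procedure applied in this study: the scoring function, its coefficients, the feature values, and the baseline are all hypothetical, and exact enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to the baseline."""
    n = len(x)

    def value(subset):
        # Features in the coalition take the instance value, the rest the baseline.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return f(z)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                members = set(coalition)
                phi += weight * (value(members | {i}) - value(members))
        contributions.append(phi)
    return contributions


def toy_severity_score(z):
    """Hypothetical score: speed limit, wet-road flag, and their interaction."""
    speed, wet = z
    return 0.02 * speed + 0.5 * wet + 0.01 * speed * wet


x = [60, 1]          # instance to explain (assumed values)
baseline = [30, 0]   # reference input (assumed values)
phi = shapley_values(toy_severity_score, x, baseline)
```

The contributions sum to the difference between the score at the instance and the score at the baseline, and the interaction term is shared between the two features. A positive contribution pushes the prediction up, which corresponds to a point on the right-hand side of the X-axis in a summary plot.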
4. Results and Discussion
This section describes the parameters and outcomes of the experiments performed. The objective is to compare the performance of the proposed TAS classifier model against other established architectures and assess its effectiveness in predicting accident severity.
4.1. Experimental Setup
The experiments were conducted on a machine running Ubuntu 20.04 equipped with an NVIDIA GTX 1080 Ti graphics processing unit, featuring a boost clock frequency of 1582 MHz. All models were implemented using the TensorFlow framework, a well-recognized DL library known for its robust support of parallel computing and its effective integration with GPU technology. TensorFlow leverages the parallel processing power of NVIDIA GPUs via CUDA programming, facilitating the efficient utilization of GPU resources to enhance training speed.
The ResNet architecture was optimized using a learning rate of 0.001, a batch size of 32, and three residual blocks, each comprising 128 neurons. The model used categorical cross-entropy as the loss function, suitable for multi-class classification, allowing the model to output the probability distribution for each class. To mitigate the risk of overfitting, a dropout rate of 0.3 was implemented. The training process utilized the Adam optimizer with its default settings (beta1 = 0.9, beta2 = 0.999, and the default epsilon) to enhance training efficiency. The model underwent training for 50 epochs, incorporating early stopping mechanisms to further prevent overfitting. The selection of these hyperparameters was achieved through a combination of grid search and manual tuning, aimed at maximizing performance for TAS prediction.
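The interaction between the 50-epoch budget and early stopping can be sketched as follows. The configuration values are those reported above, except for the patience, which is not stated in the text and is assumed here; the helper name and the validation-loss sequence are likewise hypothetical.

```python
CONFIG = {
    "learning_rate": 0.001,
    "batch_size": 32,
    "residual_blocks": 3,
    "units_per_block": 128,
    "dropout_rate": 0.3,
    "max_epochs": 50,
    "patience": 3,  # assumed value, not reported in the text
}


def early_stopping_epoch(val_losses, patience, max_epochs):
    """Return (epoch at which training stops, epoch with the best validation loss)."""
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return min(len(val_losses), max_epochs), best_epoch


# Hypothetical validation losses: no improvement after the third epoch.
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9, 0.95, 0.6]
stopped_at, best_at = early_stopping_epoch(
    losses, CONFIG["patience"], CONFIG["max_epochs"]
)
```

In this example training stops at epoch 6, and the weights from epoch 3 would be the ones to restore.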
4.2. Evaluation Metrics
The performance of the implemented models was assessed through standard evaluation metrics based on the following definitions:
TP (true positives): Instances where the model correctly identifies an accident as belonging to a specific severity class.
TN (true negatives): Instances where the model correctly identifies an accident as not belonging to a given severity class.
FP (false positives): Instances where the model incorrectly identifies an accident as belonging to a specific class when it does not.
FN (false negatives): Instances where the model fails to identify an accident as belonging to its correct severity class.
Based on these definitions, the following evaluation metrics are defined:
Accuracy measures the model’s overall correctness as the ratio of correct predictions to total predictions. It provides a holistic assessment of the model’s success rate in classification tasks, as given in Equation (1):
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
Precision quantifies the model’s effectiveness in correctly identifying positive instances by comparing TP predictions to the total positive predictions, highlighting its ability to minimize FP, as shown in Equation (2):
$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$
Recall measures the model’s ability to identify all positive instances by calculating the proportion of TP to actual positives. It highlights the model’s capacity to minimize FN, ensuring that relevant instances, such as severe accidents, are correctly identified, as given in Equation (3):
$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
F1-score combines precision and recall into a single metric by calculating their harmonic mean. It provides a balanced evaluation of the model’s performance, making it particularly useful when the goal is to optimize both precision and recall simultaneously, as defined in Equation (4):
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
We also assess the models using class-wise accuracy. The dataset encompasses three distinct severity categories: Slight Injury, Serious Injury, and Fatal Accident. Because the classes are imbalanced, with fatal accidents occurring less frequently than slight injuries, overall accuracy alone can mask poor performance on the rarer classes. Class-wise accuracy evaluates the performance of a model for each individual class, providing a detailed understanding of its effectiveness in predicting varying degrees of severity. It is the ratio of correctly classified instances within a specific class to the total number of instances belonging to that class, as outlined in Equation (5), where $N_c$ is the number of instances of class $c$ and $N_c^{\text{correct}}$ is the number of those instances that are classified correctly:
$$\text{Accuracy}_c = \frac{N_c^{\text{correct}}}{N_c} \tag{5}$$
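As an illustration of how these metrics follow from the TP, TN, FP, and FN counts, the minimal sketch below computes them in a one-vs-rest fashion for each severity class. The function name `per_class_metrics` and the ten toy labels are hypothetical and are not drawn from the datasets used in this study. Note that, under the definition given above, class-wise accuracy coincides numerically with the one-vs-rest recall of that class.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return overall accuracy and one-vs-rest metrics for each class."""
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / total
    report = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        tn = total - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {
            "tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "precision": precision,
            "recall": recall,
            "f1": f1,
            # Correct predictions within the class over all instances of the class.
            "class_accuracy": tp / (tp + fn) if tp + fn else 0.0,
        }
    return accuracy, report


# Toy labels for illustration only (not results from the paper).
labels = ["slight", "serious", "fatal"]
y_true = ["slight"] * 6 + ["serious"] * 3 + ["fatal"]
y_pred = (["slight"] * 5 + ["serious"]      # one slight case predicted as serious
          + ["serious"] * 2 + ["slight"]    # one serious case predicted as slight
          + ["fatal"])
accuracy, report = per_class_metrics(y_true, y_pred, labels)
print(f"accuracy = {accuracy:.2f}")
for c in labels:
    print(c, {k: round(v, 3) for k, v in report[c].items()})
```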
4.3. Performance Results
Table 4 presents the performance metrics of the models evaluated for TAS prediction, namely DarkNet, CNN, Xception, LSTM, and ResNet. ResNet outperforms the other models, achieving the highest accuracy of 98.22%, along with a precision of 98.40%, a recall of 98.95%, and an F1-score of 98.53%. DarkNet is the second-best performer, with an accuracy of 95.38%, a precision of 96.36%, a recall of 98.30%, and an F1-score of 97.78%. In contrast, the LSTM model exhibits comparatively lower performance, with a precision of 82.29% and an F1-score of 84.72%, suggesting room for improvement. The Xception and CNN models also perform well, with accuracy scores of 92.39% and 91.41%, respectively.
For further evaluation, Table 5 shows the class-wise accuracy for the three severity levels. The results show that the models are effective in differentiating between severity levels, achieving the highest accuracy for predicting slight injuries (99%), followed by serious injuries (98%) and fatal injuries (95%).
To robustly assess the model’s performance, a 10-fold cross-validation strategy was implemented, so that each fold serves as both training and validation data across iterations. This approach maximizes the utility of the dataset while offering a reliable measure of model stability and generalization. As shown in Table 6, each fold was trained on a specific combination of years (e.g., 2011, 2012) and validated against a separate year (e.g., 2013). This year-based cross-validation was chosen to simulate temporal data variation, testing whether the model adapts to patterns that may vary slightly from year to year. The findings show an average precision of 99.80%, highlighting the model’s capability to minimize false positive predictions, a crucial factor for maintaining the reliability of interventions triggered by identified severe accident cases. With an average recall of 99.52%, the model also shows strong proficiency in identifying actual severe accidents, reducing the likelihood of missing critical incidents. This is especially vital in safety-sensitive settings, where failing to recognize a severe case could lead to catastrophic outcomes. The mean F1-score of 99.64% indicates a robust balance between precision and recall, and the overall accuracy of 98.85% confirms the model’s consistent ability to classify the severity of traffic accidents. The consistently high metrics, combined with low standard deviation values, reflect the robust and stable performance of ResNet and reinforce its credibility for TAS prediction. These results underscore the model’s potential to support road safety initiatives and guide policy formulation, and suggest that predictive analytics can contribute to traffic safety measures and, ultimately, to reducing accident occurrences.
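A year-based split of this kind can be sketched as follows. The rolling scheme used here (two consecutive training years followed by the next year for validation) is an assumption that merely reproduces the example given above. The actual year combinations are those listed in Table 6, and the helper names `year_folds` and `summarize` are hypothetical. The second helper shows how per-fold scores would be reduced to the mean and standard deviation reported for each metric.

```python
from statistics import mean, stdev


def year_folds(years, n_train=2):
    """Yield (train_years, validation_year) pairs over consecutive years."""
    years = sorted(years)
    for i in range(len(years) - n_train):
        yield years[i:i + n_train], years[i + n_train]


def summarize(scores):
    """Return the mean and sample standard deviation of per-fold scores."""
    return mean(scores), stdev(scores)


# Illustrative year range only; see Table 6 for the folds actually used.
folds = list(year_folds([2011, 2012, 2013, 2014]))
for train_years, val_year in folds:
    print(f"train on {train_years}, validate on {val_year}")

# Placeholder per-fold scores (hypothetical, not results from the paper).
fold_mean, fold_std = summarize([0.98, 0.99, 1.00])
```

Validating on a year that follows the training years keeps future records out of the training set, which is one way to obtain the temporal variation described above.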
Additionally, we validated the proposed approach on a second dataset, the “India accidents (2017–2022)” dataset. It contains 32 features and 12,316 accident records covering factors related to road accident severity in India, including temporal variables such as the time of day and day of the week, demographic details such as the age and gender of drivers, educational levels, vehicle characteristics, driving history, road conditions, and the severity of accidents. Analyzing this dataset allows a deeper understanding of trends, correlations, and potential risk factors linked to vehicular accidents. Table 7 summarizes the results of testing the proposed ResNet on this dataset, where it achieved an accuracy of 98.16%.
The notion of time complexity in learning models primarily relates to the training phase, where it reflects the time required to update the model’s parameters based on the input data. It is shaped by several factors, including the complexity of the model’s architecture, the size of the training dataset, and the optimization techniques used. Table 8 compares the training and testing times of the models used in this research. Notably, the ResNet model completes its training in just 225 s, which is significantly less than the training times of the other models analyzed in this study. This training efficiency does not come at the cost of accuracy, as ResNet also achieves the highest predictive accuracy among the compared models. Computational resources and training duration remain critical factors for practical implementation, since they can drive model selection according to the resources at hand and the requirements of the application. While ResNet provides excellent accuracy with a relatively short training time, models such as LSTM, which demand considerably longer training periods, may prove impractical in scenarios with limited computational resources or tight time constraints. Consequently, model selection should balance performance with resource availability.
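Training and testing times such as those in Table 8 are typically obtained as wall-clock measurements around the fitting and prediction calls. The helper below is a minimal sketch of that measurement. The name `timed` is hypothetical, and the exact timing procedure used in this study is not specified in the text.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; in practice fn would be a training or prediction call.
result, seconds = timed(sum, range(1_000_000))
print(f"finished in {seconds:.4f} s")
```

For comparisons across models, the same hardware, batch size, and stopping criterion should be used, since each of these affects the measured duration.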
4.4. SHapley Additive exPlanations (SHAP)
The SHAP summary plot, showing the impact of features on the ResNet model predictions, is depicted in Figure 4. The plot shows that among the features, the Number of Vehicles stands out as the most significant attribute, exhibiting a wide range of SHAP values. This indicates that variations in vehicle counts, whether high or low, play a substantial role in shaping the model’s predictions. A higher number of vehicles (illustrated by red points) typically correlates with high predicted severity, suggesting an increased risk or severity according to the model. Conversely, lower vehicle counts demonstrate a more varied influence, yielding both positive and negative contributions, likely due to contextual elements where a reduced number of vehicles can still be associated with high-impact occurrences.
Geographical characteristics, such as Location_Northing_OSGR and Latitude, also significantly influence the model’s decision-making process. The variation in SHAP values associated with these features shows that geographical location affects predictions in multiple ways. The color gradient, which illustrates feature values, indicates that the model is responsive to both high and low values of these features, potentially reflecting region-specific elements such as the quality of infrastructure, road design, or environmental conditions.
The Local_Authority_(District) characteristic, although more regionally focused, exhibits a comparably extensive range of SHAP values, highlighting the significance of local disparities. These disparities may indicate variations in enforcement strategies, traffic circumstances, or specific hazards pertinent to the area, all of which the model incorporates in its predictive analyses.
The Speed_limit and Urban_or_Rural_Area features provide valuable insights into contextual influences. The distribution of SHAP values for Speed_limit is predominantly centered around zero, indicating a generally neutral effect on most predictions. Nevertheless, higher Speed_limit values (represented by red points) have a marginally positive influence on the model’s output, implying that high speed limits may be associated with increased risk. In contrast, the Urban_or_Rural_Area feature reveals a clear differentiation: urban regions (indicated by red points) exhibit a positive effect, whereas rural regions (shown by blue points) have a negative contribution. This observation is consistent with existing research on traffic incidents, which suggests that urban environments, characterized by greater traffic density and complexity, are more prone to accidents or severe incidents than their rural counterparts.
The variables 2nd_Road_Class and 2nd_Road_Number, although incorporated into the model, exhibit a limited range of SHAP values, suggesting a diminished overall impact on the predictions. Their effect appears to be contingent on specific contexts, serving a subordinate function in comparison to more influential factors like vehicle count and geographical location.
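The ordering of features in a summary plot of this kind is commonly based on the mean absolute SHAP value of each feature across all instances, so that wide-ranging features appear first and narrow-ranging ones last. The sketch below illustrates that ranking. The function name `rank_by_mean_abs_shap`, the feature subset, and the SHAP values are hypothetical toy inputs and are not taken from Figure 4.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by their mean absolute SHAP value over all instances."""
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Toy SHAP values: one row per instance, one column per feature (illustration only).
features = ["Number_of_Vehicles", "Speed_limit", "2nd_Road_Class"]
toy_shap = [
    [0.40, 0.05, 0.01],
    [-0.30, 0.02, -0.02],
    [0.25, -0.01, 0.00],
]
ranking = rank_by_mean_abs_shap(toy_shap, features)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking absolute values before averaging matters here: a feature whose contributions are large but alternate in sign would otherwise average out near zero and appear unimportant.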
4.5. Discussion
For comparison, we selected the two studies from our related work that report the highest accuracy. Effective predictive models are critical in the rapidly developing field of TAS prediction, as they underpin strong safety actions. This study presents a ResNet-based model that achieves 98.22% accuracy, showcasing its ability to recognize complex correlations in traffic data. Moreover, the model not only performs better than existing methods but also addresses important weaknesses found in recent research.
In comparison, the model developed by Sameen et al. [15] employs an RNN approach, reporting an accuracy of 71.77%. RNNs are well-suited for processing sequential data, using internal memory to capture temporal dependencies in traffic patterns and analyze sequences of events leading to accidents. However, RNNs can struggle with long-range dependencies due to vanishing gradients, potentially hindering their ability to learn complex relationships in large datasets. Moreover, while RNNs effectively capture temporal patterns, they may not fully leverage the spatial relationships inherent in traffic data, which are crucial for accurate severity prediction. Our ResNet model, characterized by its deep architecture, effectively addresses these complexities, resulting in enhanced predictive performance. Likewise, Aboulola [16] employs a MobileNet transfer learning model, which, despite its efficiency and the incorporation of SHAP for improved interpretability, does not achieve the same level of depth and accuracy as our ResNet implementation.
Our model’s integration of SHAP is a noteworthy advantage, as it provides a clear and concise understanding of how individual variables contribute to the predictions. Unlike existing models that may be less interpretable, our use of SHAP allows stakeholders to understand the factors affecting accident severity estimates. For experts in the field, this capability is valuable because it enables them to identify high-risk situations and carry out targeted actions. The clarity that SHAP offers also increases user trust in the model’s predictions, which can lead to improved outcomes for road safety.
Our ResNet-based model establishes a novel standard in predictive accuracy for TAS prediction while also prioritizing interpretability through the use of SHAP on a UK traffic dataset. This combination of strengths significantly increases its relevance in practical applications, where comprehending the intricacies of predictions is equally crucial as the predictions themselves. As traffic systems advance, the integration of DL frameworks such as ResNet, along with effective explainability mechanisms, will be essential for guiding policy decisions and enhancing safety protocols in Intelligent Transportation Systems.
5. Conclusions and Future Work
In conclusion, traffic accidents remain a major concern due to the injuries, fatalities, and traffic disruptions they cause. This study compared various DL architectures for TAS prediction, namely ResNet, CNN, LSTM, Xception, and DarkNet, and applied Shapley values to identify critical factors impacting accident severity. The models were evaluated on the UK accident dataset. Among the tested models, ResNet demonstrated the highest accuracy at 98.22%, underscoring its potential as a reliable tool for predicting accident severity. Beyond the UK traffic dataset, ResNet maintained consistent performance on the India Accidents dataset, achieving an accuracy of 98.16%. By enhancing the transparency, interpretability, and precision of severity predictions, this study contributes valuable insights for policymakers and stakeholders aiming to implement data-driven strategies to improve road safety.
While ResNet demonstrates consistent high performance across UK and India datasets, we acknowledge the importance of testing it further on additional datasets from other regions. Future work will include evaluating the model on datasets representing a wider range of geographic and socio-economic factors to ensure the model’s robustness and adaptability to different traffic environments. To advance accident severity prediction models, future research could integrate advanced ML and DL approaches, such as ensemble and hybrid models, to refine accuracy and improve generalizability. Expanding the model to include external factors, such as weather conditions, road quality, and driver behavior, will allow a more comprehensive understanding of accident causation. Employing real-time data from IoT traffic sensors and alternative data sources will further support accident prevention efforts. Moreover, addressing class imbalance and refining statistical methods for robust validation, particularly in high-risk zones, will contribute to a more holistic and reliable framework for TAS prediction.