Investigating Injury Outcomes of Horse-and-Buggy Crashes in Rural Michigan by Mining Crash Reports Using NLP and CNN Algorithms
Figure 1. Data extraction framework.
Figure 2. Crash narratives analysis flow.
Figure 3. Unigrams displayed as word clouds: (a) minor/no injury; (b) fatal/serious injury.
Figure 4. Trigram word frequency for horse-and-buggy crashes: (a) fatal/serious injury outcome; (b) minor/no injury outcome.
Figure 5. The confusion matrix of the logistic regression model.
Abstract
1. Introduction
2. Materials and Methods
2.1. Crash Data
2.2. Methodology
2.2.1. Crash Diagrams Preprocessing
2.2.2. K-Fold Cross-Validation
2.2.3. AlexNet Convolutional Neural Network Architecture
2.2.4. Natural Language Processing (NLP) Techniques
2.2.5. Logistic Regression
- P(Y = 1 | X) is the probability of the outcome being 1 given the value of the independent variable X.
- β0 is the intercept term.
- β1 is the coefficient associated with the independent variable X.
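The three terms defined above are those of the standard single-predictor logistic model, P(Y = 1 | X) = 1 / (1 + exp(−(β0 + β1·X))). The sketch below evaluates that form for a binary feature. The function name and the parameter values are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math


def logistic_probability(x: float, beta0: float, beta1: float) -> float:
    """Return P(Y = 1 | X = x) for a single-predictor logistic model."""
    linear_term = beta0 + beta1 * x  # log-odds of the outcome
    return 1.0 / (1.0 + math.exp(-linear_term))


# Hypothetical parameters, chosen only for illustration (not from the study).
beta0, beta1 = -0.5, 0.3

p_absent = logistic_probability(0.0, beta0, beta1)   # feature absent (X = 0)
p_present = logistic_probability(1.0, beta0, beta1)  # feature present (X = 1)

print(f"P(Y=1 | X=0) = {p_absent:.3f}")
print(f"P(Y=1 | X=1) = {p_present:.3f}")
```

With a positive β1, the predicted probability rises when the feature is present; a negative β1 lowers it.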
3. Results and Discussion
3.1. AlexNet-CNN Model for Crash Diagrams Processing
3.2. NLP Techniques for Crash Narratives Analysis
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Layer Type | Output Shape | Number of Filters | Kernel Size | Stride |
---|---|---|---|---|
Input | 227 × 227 × 3 | - | - | - |
Convolutional 1 | 55 × 55 × 96 | 96 | 11 × 11 | 4 |
Max Pooling 1 | 27 × 27 × 96 | - | 3 × 3 | 2 |
Convolutional 2 | 27 × 27 × 256 | 256 | 5 × 5 | 1 |
Max Pooling 2 | 13 × 13 × 256 | - | 3 × 3 | 2 |
Convolutional 3 | 13 × 13 × 384 | 384 | 3 × 3 | 1 |
Convolutional 4 | 13 × 13 × 384 | 384 | 3 × 3 | 1 |
Convolutional 5 | 13 × 13 × 256 | 256 | 3 × 3 | 1 |
Max Pooling 3 | 6 × 6 × 256 | - | 3 × 3 | 2 |
Fully Connected 1 | 4096 | - | - | - |
Fully Connected 2 | 4096 | - | - | - |
Fully Connected 3 | 1000 | - | - | - |
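The spatial sizes in the architecture table can be checked with the usual output-size relation for convolution and pooling layers, floor((n + 2p − k) / s) + 1. The following is a minimal sketch of that check. The table lists kernel size and stride but not padding, so the padding values below are assumptions taken from the commonly described AlexNet layout, and the function names are hypothetical.

```python
from typing import List, Tuple


def output_size(input_size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (input_size + 2 * padding - kernel) // stride + 1


# (layer name, kernel size, stride, padding)
# Kernel sizes and strides follow the table; padding values are ASSUMED.
LAYERS: List[Tuple[str, int, int, int]] = [
    ("Convolutional 1", 11, 4, 0),
    ("Max Pooling 1", 3, 2, 0),
    ("Convolutional 2", 5, 1, 2),
    ("Max Pooling 2", 3, 2, 0),
    ("Convolutional 3", 3, 1, 1),
    ("Convolutional 4", 3, 1, 1),
    ("Convolutional 5", 3, 1, 1),
    ("Max Pooling 3", 3, 2, 0),
]


def trace_shapes(input_size: int = 227) -> List[Tuple[str, int]]:
    """Propagate the spatial size through every layer in LAYERS."""
    sizes = []
    size = input_size
    for name, kernel, stride, padding in LAYERS:
        size = output_size(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes


shapes = trace_shapes()
for name, size in shapes:
    print(f"{name}: {size} x {size}")

# The last pooling output (6 x 6 x 256) is flattened before the first
# fully connected layer.
flattened = shapes[-1][1] ** 2 * 256
print("Flattened features:", flattened)
```

Under these padding assumptions the traced sizes reproduce the Output Shape column (55, 27, 27, 13, 13, 13, 13, 6).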
(a)

Epochs | Learning Rate | Fold | Accuracy | Precision | Recall | F-Score |
---|---|---|---|---|---|---|
10 | 0.01 | 1 | 0.7832 | 0.8362 | 0.7622 | 0.7975 |
10 | 0.01 | 2 | 0.7924 | 0.8428 | 0.7672 | 0.8032 |
10 | 0.01 | 3 | 0.8015 | 0.8732 | 0.7365 | 0.7990 |
10 | 0.01 | 4 | 0.8035 | 0.8263 | 0.7993 | 0.8126 |
10 | 0.01 | 5 | 0.8126 | 0.8223 | 0.8165 | 0.8194 |
10 | 0.01 | Mean | 0.80 | 0.84 | 0.78 | 0.81 |
10 | 0.001 | 1 | 0.8463 | 0.8846 | 0.8256 | 0.8541 |
10 | 0.001 | 2 | 0.8554 | 0.9003 | 0.8049 | 0.8499 |
10 | 0.001 | 3 | 0.8328 | 0.8954 | 0.7726 | 0.8295 |
10 | 0.001 | 4 | 0.8990 | 0.8460 | 0.9224 | 0.8825 |
10 | 0.001 | 5 | 0.8628 | 0.8243 | 0.8963 | 0.8588 |
10 | 0.001 | Mean | 0.86 | 0.87 | 0.84 | 0.85 |

(b)

Epochs | Learning Rate | Fold | Accuracy | Precision | Recall | F-Score |
---|---|---|---|---|---|---|
10 | 0.01 | 1 | 0.7946 | 0.8473 | 0.7592 | 0.8008 |
10 | 0.01 | 2 | 0.8003 | 0.8846 | 0.7473 | 0.8102 |
10 | 0.01 | 3 | 0.7938 | 0.8532 | 0.7587 | 0.8032 |
10 | 0.01 | 4 | 0.8199 | 0.8324 | 0.8091 | 0.8206 |
10 | 0.01 | 5 | 0.8235 | 0.8638 | 0.7837 | 0.8218 |
10 | 0.01 | Mean | 0.81 | 0.86 | 0.77 | 0.81 |
10 | 0.001 | 1 | 0.8375 | 0.9105 | 0.7728 | 0.8360 |
10 | 0.001 | 2 | 0.8475 | 0.9035 | 0.7831 | 0.8390 |
10 | 0.001 | 3 | 0.8583 | 0.8854 | 0.8166 | 0.8496 |
10 | 0.001 | 4 | 0.9023 | 0.8994 | 0.8982 | 0.8988 |
10 | 0.001 | 5 | 0.9206 | 0.8489 | 1.0082 | 0.9217 |
10 | 0.001 | Mean | 0.86 | 0.87 | 0.87 | 0.89 |
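The Fold column (1 to 5) and the Mean rows correspond to a five-fold cross-validation in which each metric is averaged over the folds. The sketch below shows one simple way to form such folds and to average per-fold scores. The splitting function is hypothetical and uses contiguous, unshuffled folds, which may differ from the procedure actually used; the accuracy values are the first five accuracy entries of panel (a) at learning rate 0.01.

```python
from statistics import mean
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


# Example: 10 hypothetical samples split into 5 folds.
folds = list(k_fold_indices(10, 5))
for fold_number, (train, validation) in enumerate(folds, start=1):
    print(f"Fold {fold_number}: validate on {validation}, train on {len(train)} samples")

# Per-fold accuracy from panel (a), learning rate 0.01.
fold_accuracy = [0.7832, 0.7924, 0.8015, 0.8035, 0.8126]
mean_accuracy = mean(fold_accuracy)
print(f"Mean accuracy: {mean_accuracy:.2f}")
```

Every sample appears in exactly one validation fold, so each reported mean summarizes performance over the whole data set rather than over a single held-out split.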
# Feature | Top Features (Trigram) | Chi-Square Score | p-Value |
---|---|---|---|
1 | Activity inside vehicle | 5.524 | 0.0187 |
2 | Distracted activity inside | 5.524 | 0.0187 |
3 | Fail yield stop | 5.069 | 0.0244 |
4 | Driver overtaking passing | 4.414 | 0.0356 |
5 | Stop sign older | 4.344 | 0.0371 |
6 | Rear left corner | 4.344 | 0.0371 |
7 | Inside vehicle eating | 4.143 | 0.0418 |
8 | Driver night road | 4.143 | 0.0418 |
9 | Driver careless drive | 4.143 | 0.0418 |
10 | Older driver fail | 4.143 | 0.0418 |
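The sketch below illustrates how a trigram can be scored against a binary injury outcome with a chi-square test. It is a minimal illustration under stated assumptions: it uses a plain 2 × 2 Pearson statistic (trigram present or absent versus outcome class) without continuity correction, which may differ from the feature-selection routine used in the study, and all function names are hypothetical. The tabulated p-values appear consistent with one degree of freedom, which is what the p-value helper assumes.

```python
import math
from typing import List, Tuple


def trigrams(tokens: List[str]) -> List[Tuple[str, str, str]]:
    """Return all runs of three consecutive tokens."""
    return list(zip(tokens, tokens[1:], tokens[2:]))


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for the table [[a, b], [c, d]] (no continuity correction).

    Rows: trigram present / absent. Columns: outcome class 1 / class 2.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def p_value_df1(chi_square: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))


# Trigrams from an already cleaned, hypothetical token sequence.
example = trigrams("driver fail yield stop sign".split())
print(example)

# p-values implied by two of the tabulated scores, assuming 1 degree of freedom.
for score in (5.069, 4.414):
    print(f"chi-square = {score}: p = {p_value_df1(score):.4f}")
```

A trigram that occurs equally often in both outcome classes gives a statistic of 0, while one that occurs only in a single class gives the largest statistic possible for that sample size.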
Severity Outcome | Precision | Recall | F1-Score | Model Accuracy |
---|---|---|---|---|
Minor/No Injury = 1 | 0.70 | 1.00 | 0.82 | 0.83 |
Fatal/Serious Injury = 2 | 1.00 | 0.82 | 0.90 | 0.83 |
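Per-class precision, recall, and F1-score of this kind are derived from a confusion matrix. The sketch below shows the calculation for a two-class problem. The counts are hypothetical and chosen only for illustration; they are not the confusion matrix of the study, and the function names are likewise assumptions.

```python
from typing import Dict, List


def classification_report(matrix: List[List[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Per-class metrics; matrix[i][j] counts actual class i predicted as class j."""
    report = {}
    for k, label in enumerate(labels):
        true_positive = matrix[k][k]
        predicted_as_k = sum(row[k] for row in matrix)  # column total
        actually_k = sum(matrix[k])                      # row total
        precision = true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = true_positive / actually_k if actually_k else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def accuracy(matrix: List[List[int]]) -> float:
    """Share of all cases that lie on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# HYPOTHETICAL counts: rows are actual classes, columns are predicted classes.
labels = ["minor/no injury", "fatal/serious injury"]
hypothetical_matrix = [[8, 2],
                       [1, 9]]

report = classification_report(hypothetical_matrix, labels)
for label, metrics in report.items():
    print(label, {name: round(value, 2) for name, value in metrics.items()})
print("accuracy:", round(accuracy(hypothetical_matrix), 2))
```

Precision is read down a column (how many predictions of a class were right), whereas recall is read across a row (how many actual cases of a class were found).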
Features (from the Trigrams) | Coefficient | p-Value | Significant at 95% CL |
---|---|---|---|
Stop sign older | −0.321024 | 0.01245 | Significant |
Fail stop acd | −0.318333 | 0.00863 | Significant |
Driver overtaking passing | −0.309256 | 0.02653 | Significant |
Older driver fail | −0.226745 | 0.02435 | Significant |
Driver night road | 0.245440 | 0.00258 | Significant |
Driver careless drive | 0.247515 | 0.00023 | Significant |
Driver going straight | 0.294194 | 0.36525 | Not Significant |
Activity inside vehicle | 0.300843 | 0.00385 | Significant |
Distracted activity inside | 0.300843 | 0.00385 | Significant |
Going straight horse | 0.346014 | 0.25314 | Not Significant |
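A common way to read logistic regression coefficients is to exponentiate them into odds ratios and to compare each p-value with the 0.05 threshold implied by the 95% confidence level. The sketch below does this for the coefficients and p-values in the table above. The variable and function names are hypothetical, and the direction of each effect depends on which injury class was coded as the modeled outcome, which this sketch does not assume.

```python
import math
from typing import Dict, List, Tuple

# (trigram feature, coefficient, p-value), copied from the table above.
ROWS: List[Tuple[str, float, float]] = [
    ("Stop sign older", -0.321024, 0.01245),
    ("Fail stop acd", -0.318333, 0.00863),
    ("Driver overtaking passing", -0.309256, 0.02653),
    ("Older driver fail", -0.226745, 0.02435),
    ("Driver night road", 0.245440, 0.00258),
    ("Driver careless drive", 0.247515, 0.00023),
    ("Driver going straight", 0.294194, 0.36525),
    ("Activity inside vehicle", 0.300843, 0.00385),
    ("Distracted activity inside", 0.300843, 0.00385),
    ("Going straight horse", 0.346014, 0.25314),
]

ALPHA = 0.05  # significance threshold for a 95% confidence level


def summarize(rows: List[Tuple[str, float, float]], alpha: float = ALPHA) -> List[Dict[str, object]]:
    """Attach an odds ratio and a significance flag to every coefficient."""
    summary = []
    for feature, coefficient, p_value in rows:
        summary.append({
            "feature": feature,
            "odds_ratio": math.exp(coefficient),  # multiplicative change in odds
            "significant": p_value < alpha,
        })
    return summary


summary = summarize(ROWS)
for row in summary:
    flag = "significant" if row["significant"] else "not significant"
    print(f"{row['feature']}: odds ratio = {row['odds_ratio']:.3f} ({flag})")
```

An odds ratio above 1 means the odds of the modeled outcome increase when the trigram is present, and a value below 1 means they decrease; the two rows with p-values above 0.05 are flagged as not significant, matching the last column of the table.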
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Qawasmeh, B.; Oh, J.-S.; Kwigizile, V. Investigating Injury Outcomes of Horse-and-Buggy Crashes in Rural Michigan by Mining Crash Reports Using NLP and CNN Algorithms. Safety 2025, 11, 1. https://doi.org/10.3390/safety11010001