Abstract
Orthopaedic diseases, which affect millions of people globally, present significant diagnostic challenges, often leading to long-term disability and chronic pain. There is an ongoing debate across the literature regarding the trustworthiness of artificial intelligence (AI) in detecting orthopaedic diseases. This systematic review aims to provide a comprehensive taxonomy of AI applications in orthopaedic disease detection. A thorough literature search was conducted across five major databases (Science Direct, Scopus, IEEE Xplore, PubMed, and Web of Science) covering publications from January 2019 to 2024. Following rigorous screening on the basis of predefined inclusion criteria, 85 relevant studies were identified and critically evaluated. For the first time, this review classifies AI contributions into six key categories of orthopaedic conditions on the basis of medical perspective: arthritis, tumours, deformities, fractures, osteoporosis, and general bone abnormalities. In addition to analyzing motivations, challenges, and recommendations for future research, this review highlights the various AI techniques employed, including deep learning (DL), machine learning (ML), explainable AI (XAI), fuzzy logic, and multicriteria decision-making (MCDM), as well as the datasets utilized. Furthermore, the trustworthiness of AI models is evaluated on the basis of seven AI trustworthiness components, aligned with European Union guidelines, within each category. These findings underscore the need for high-quality research to ensure that AI computational systems in orthopaedic disease detection are reliable, safe, and ethical. Future research should focus on optimizing AI algorithms, improving dataset diversity, and addressing ethical and regulatory challenges to ensure successful integration into clinical practice.
1 Introduction
Orthopaedic diseases affect the musculoskeletal system, including bones, cartilage, ligaments, tendons, and connective tissues, and lead to problems such as chronic pain, reduced mobility, and long-term disability [1]. According to the Global Burden of Disease Study, 1.71 billion people globally have musculoskeletal disorders such as osteoarthritis, rheumatoid arthritis, and low back pain, highlighting the need for better diagnosis and treatment [2]. The major types of orthopaedic disease include fractures, bone tumours, arthritis (e.g., osteoarthritis and rheumatoid arthritis), degenerative diseases such as osteoporosis, and musculoskeletal injuries [3]. Recent advancements in artificial intelligence (AI) have substantially improved the diagnosis of orthopaedic diseases and paved the way for intelligent diagnostic systems [4]. Machine learning (ML), deep learning (DL), and multicriteria decision-making (MCDM) techniques have been applied in different branches of orthopaedics to assist specialists in handling complex radiological images, including magnetic resonance imaging (MRI), X-ray, and computed tomography (CT) [5]. For example, ML/DL computational models have been deployed to detect pathologies such as complex fractures, mild forms of osteoarthritis, and even early-stage bone tumours quickly and efficiently [6, 7]. Furthermore, clinicians use MCDM and fuzzy logic methods in orthopaedics to support sound decision-making and to assess several medical criteria, promoting accurate measurements and tailored treatment recommendations [8,9,10]. These results indicate that the continued advancement of such techniques will be vital for the operative management of contemporary orthopaedic disorders.
While computational AI systems have transformed orthopaedic disease detection, they also pose considerable challenges regarding their trustworthiness and interpretability. The main drawback is the complexity of the detection models, specifically the ML/DL algorithms in intelligent detection systems. This complexity commonly results in a loss of transparency in the decision-making process, which may make clinicians reluctant to fully trust such systems [11]. This is where trustworthy AI comes into play. Trustworthy AI systems are designed around fundamental principles such as fairness, transparency, privacy, and accountability, thereby enabling their safe inclusion in clinical practice [12, 13]. Several disciplines have explored trustworthy AI, for example, disaster management, where AI plays a pivotal role in early warning and in decisions about real-time interventions, and healthcare, where AI is used for diagnosis and treatment planning [14,15,16]. In orthopaedic disease detection, trustworthy AI must not only provide highly accurate diagnoses but also offer explainable and interpretable outcomes so that clinicians can understand and validate the AI decision-making process [17, 18]. The AI trustworthiness guidelines of the European Union emphasize legality, ethics, and sustainability to promote robust and fair AI solutions for orthopaedic disease detection and stakeholder trust on the basis of seven key components [19]. Human agency and oversight is ensured by upholding human rights and involving humans in critical decisions [20]. Technical robustness and safety is emphasized through security measures, backup systems, and ensuring the accuracy, reproducibility, and reliability of the AI-based computational detection system. Privacy and data governance prioritizes maintaining data quality, protecting privacy, and ensuring data availability.
Transparency involves clearly explaining decision-making processes to stakeholders. Diversity, nondiscrimination, and fairness aims to eliminate bias, ensure accessibility, adopt a universal design, and advocate for all users, including those with disabilities [21]. Environmental and societal well-being considers environmental sustainability and social impacts and promotes democracy. Finally, accountability minimizes and reports negative impacts, manages trade-offs, and provides redress when necessary [22].
Several attempts have been made to review AI-based orthopaedic disorder diagnoses in the literature. The survey [23] provided a structured review of studies published from 2017 to 2021 in the PubMed, MEDLINE, and Embase databases that applied DL models to various orthopaedic conditions. It highlighted significant advancements and performance metrics in fracture detection, osteoarthritis diagnosis, and soft tissue disease classification. However, it did not organize the reviewed studies into a detailed taxonomy, which would help readers understand the different types of computational AI systems more comprehensively. Moreover, a review [24] provided a comprehensive overview of articles that use AI systems in musculoskeletal imaging, covering trauma, bone age estimation, osteoarthritis, tumours, and orthopaedic implants. That review highlighted the importance of AI for assisting radiologists in optimizing workflows, improving diagnostic accuracy, and handling increased workloads. However, it lacked a systematic method for literature evaluation, which may have biased the study selection and findings. It also offered little discussion of the availability and quality of AI research datasets, which are essential for developing robust and generalizable AI computational models. A recent review [25] suggested two ways to improve AI fracture diagnosis via orthopaedic X-rays. The first is to increase the quantity and quality of the training data and to use more advanced deep learning algorithms. The second is to integrate AI algorithms with CT and MRI to obtain a more complete diagnosis. Table 1 shows the differences in coverage between the other reviews and the present study. It illustrates the coverage of multiple aspects of computational AI systems used in orthopaedics, such as the developed taxonomy, discussion analysis, and trustworthiness.
Despite these prior efforts, the existing reviews lack a defined classification, which makes it difficult to compare how intelligent systems have been adopted. Many reviews do not analyze the literature systematically, which risks biased and erroneous conclusions. These reviews focus on improvements and performance indicators, but they often disregard the reasons for incorporating AI systems into orthopaedics, such as diagnostic efficiency, and neglect the dataset availability and quality needed for robust AI models. Additionally, the absence of detailed recommendations for future research and practical implementation leaves a gap in guiding the development and deployment of AI technologies in clinical applications. The literature review framework for the trustworthiness measurement of the contribution of intelligent systems in orthopaedic disease detection is presented in Fig. 1. The primary contributions of the presented systematic review are as follows:
- Providing a structured and detailed taxonomy of the AI computational systems used in orthopaedic disease detection.
- Elucidating the crucial AI models and datasets used in orthopaedic disease detection to address key challenges in AI research effectively in this field.
- Identifying and organizing the key findings into motivations, challenges, and recommendations, making it easier for readers to understand the current obstacles that need to be addressed for future advancements.
- Providing a broad picture of the trustworthiness measurement requirements for orthopaedic studies based on trustworthy AI components, helping scholars identify current gaps and potential solutions.
The paper is structured as follows: Sect. 2 outlines the methodology employed for the systematic literature review, detailing the search strategy, eligibility criteria, and study selection process. Section 3 provides a comprehensive review of previous studies, with a focus on studies that used intelligent systems in orthopaedic disease detection. Section 4 discusses the key findings from the systematic review, highlighting the motivations for AI adoption, the challenges encountered, and future research opportunities in this field. Section 5 shows a classification of the AI methods and the dataset types that are identified in the literature, organizing them by functionality in orthopaedic diagnostics. Section 6 concludes the paper by summarizing the insights gained from the study, including recommendations for future research and the critical need for trustworthy AI systems in clinical practice.
2 Methodology
This systematic literature review was conducted in accordance with the "Preferred Reporting Items for Systematic Reviews and Meta-Analyses" (PRISMA) criteria and aligned with previous research on quality databases [27,28,29,30]. By broadening its scope beyond the trustworthiness of AI systems, this investigation aims to systematically assess the contributions of AI to orthopaedic disease detection and the extent to which these contributions span the entire AI life cycle, from development through deployment and assessment. Although trustworthiness is an important dimension, this review provides a broader analysis of the implications for orthopaedic diagnostics, addressing the overarching research question described in the introduction. The IEEE Xplore, PubMed, Web of Science, Science Direct, and Scopus databases were searched; they were chosen for their relevance and coverage in the healthcare, AI, and orthopaedics domains. PubMed was included because it covers biomedical literature, including high-quality research on musculoskeletal diseases. Scopus and Web of Science were selected for their extensive multidisciplinary coverage, which offers a wide collection of peer-reviewed studies across clinically related domains of AI. IEEE Xplore was selected because of its large body of state-of-the-art research on AI methodologies, which are foundational to this review. Finally, Science Direct was included because of its high representation of technical and applied AI research on health status and health quality data. Collectively, these five databases provide a comprehensive platform to capture the breadth of studies at the intersection of AI and orthopaedic disease detection. In parallel, where relevant, the review covers trustworthiness, which is consistent with the high-level aim of assessing the feasibility of intelligent systems and their responsible assimilation into clinical practice.
2.1 Search Strategy
For this investigation, five databases were thoroughly searched for English-language scholarly literature. The search ranged from January 2019 to 2024, when the detection of musculoskeletal diseases via AI methods increased due to breakthroughs in methods, technologies, and knowledge. The "OR" operator was used to connect "Musculoskeletal disease," "Orthopaedic," and "Bone disease," and the "AND" operator linked these phrases to "Artificial intelligence," as shown at the top of Fig. 2. This search approach was used to find relevant scholarly literature.
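The Boolean structure of the search string described above can be sketched in a few lines of Python. This is an illustration only: the helper name `build_query` is hypothetical, and the field tags and syntax that each individual database requires are not stated in the text and are not modelled here.

```python
def build_query(disease_terms, ai_term):
    """Join the disease terms with OR, then link the group to the AI term with AND."""
    disease_group = " OR ".join(f'"{term}"' for term in disease_terms)
    return f'({disease_group}) AND "{ai_term}"'


# Terms taken from the search strategy described in the text.
disease_terms = ["Musculoskeletal disease", "Orthopaedic", "Bone disease"]
query = build_query(disease_terms, "Artificial intelligence")
print(query)
```

In practice, a string of this shape would still need to be adapted to each database's own search interface.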
2.2 Eligibility Criteria
The systematic literature review employed a predetermined set of inclusion and exclusion criteria to guide the selection of pertinent contributions. The criteria were implemented to guarantee that the chosen literature was in accordance with the particular study aims and that a rigorous degree of methodology was used. The inclusion or selection of papers was determined on the basis of the following criteria:
- The manuscripts must be composed in the English language and disseminated either through a scholarly journal or included in the official records of a prestigious conference.
- These studies should focus primarily on AI methodologies and techniques for the detection of orthopaedic diseases in humans.
The exclusion criteria for the present study were as follows:
- Studies focusing on orthopaedic disease detection outside the realm of AI applications.
- Study types that are not pertinent to the subject matter, such as animal studies, letters to the editors, and case reports.
- Studies published in languages other than English.
These criteria ensure that the focus remains tightly bound to the intersection of AI and orthopaedic disease detection, enhancing the relevance and quality of the findings.
2.3 Study Selection
Two reviewers independently extracted data from the chosen studies to reduce bias and ensure accuracy. The reviewers used standardized data extraction forms to ensure uniformity across the gathered data. A third reviewer was consulted to settle any disputes between the reviewers. To improve the reliability of the data, the reviewers also corresponded by email with the study investigators to clarify any unclear or missing material in the reports. Furthermore, the initial collection of related studies from the five databases did not use any automation tools; all of the data were manually retrieved and confirmed by the reviewers. This manual method further supports the integrity of the data by enabling careful evaluation of the context and subtleties within each report. The methodology involved several stages, as summarized in Fig. 2, starting with the removal of duplicate studies via Mendeley software. Titles and abstracts were then reviewed to eliminate unrelated works, with discrepancies resolved by the corresponding author. The full texts of the remaining articles were examined against the predefined inclusion criteria. Initially, 1,657 entries were identified from the databases. After 344 duplicates were removed, 1,313 papers remained. Title and abstract screening excluded 628 articles, leaving 685 for detailed examination. Finally, 600 studies were excluded, resulting in 85 relevant studies included for in-depth analysis. For subgroup analyses, essential information was extracted from the included studies on several important variables, including the type of AI technology used, the specific orthopaedic disease type targeted, the performance metrics applied to assess the AI models, dataset availability, and the primary findings reported.
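The screening counts reported above can be checked with simple arithmetic. The short sketch below only restates the numbers given in the text; the variable names are illustrative.

```python
# Counts as reported in the study selection description.
identified = 1657
duplicates_removed = 344
excluded_title_abstract = 628
excluded_full_text = 600

after_dedup = identified - duplicates_removed
after_screening = after_dedup - excluded_title_abstract
included = after_screening - excluded_full_text

print(f"After duplicate removal:        {after_dedup}")
print(f"After title/abstract screening: {after_screening}")
print(f"Included in the review:         {included}")
```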
3 Orthopaedic Disease Detection-Based AI: Taxonomy
The findings from the selected articles are detailed in this section, with each article analyzed and categorized on the basis of its objectives and contributions. The 85 articles that met the predefined criteria were organized into six primary categories, as shown in Fig. 3. This structured approach ensures systematic analysis on the basis of objective evidence from relevant studies. To provide a more detailed examination, subcategories were established within the major categories, further structuring and presenting the findings. These divisions allow for a wide-ranging overview of AI techniques in orthopaedic disease detection, highlighting advancements and challenges in the field. The categories, encompassing 85 articles, are as follows:
- Arthritis: 4 out of 85 contributions (4.71%).
- Tumours: 15 out of 85 contributions (17.65%).
- Deformities: 10 out of 85 contributions (11.76%).
- Fractures: 45 out of 85 contributions (52.94%).
- Osteoporosis: 2 out of 85 contributions (2.35%).
- General bone abnormalities: 9 out of 85 contributions (10.59%).
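As a quick consistency check, the category counts listed above can be summed and converted to percentages. The sketch below uses only the counts stated in the list; the dictionary layout is an illustrative choice.

```python
# Number of contributions per category, as listed in the taxonomy.
counts = {
    "Arthritis": 4,
    "Tumours": 15,
    "Deformities": 10,
    "Fractures": 45,
    "Osteoporosis": 2,
    "General bone abnormalities": 9,
}

total = sum(counts.values())
# Share of each category, rounded to two decimals as in the text.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {counts[name]} of {total} ({share}%)")
```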
3.1 Arthritis
The arthritis category included four of the 85 selected articles that focused on the role of AI in diagnosing conditions such as hip–knee osteoarthritis and rheumatoid arthritis, aiming to enhance patient care methodologies.
Gait-based approaches are affected by environmental conditions, marker placement consistency, and patient-specific gait patterns. Additionally, the relatively small size of the dataset introduces sampling bias, potentially limiting the model's ability to generalize to larger populations. To address these challenges, the study [31] developed a robust vision-based dataset using passive marker-based techniques to ensure consistent data collection. It optimized feature extraction via the Fractional Order Darwinian Particle Swarm Optimization (FODPSO) algorithm, enabling the precise identification of key regions of interest (ROIs) and the classification of abnormal gaits in knee osteoarthritis (KOA) patients via the k-nearest neighbor (KNN) algorithm, achieving high accuracy across different severity levels. Moreover, another study [32] used DL models, specifically EfficientNetb7, along with XAI techniques such as Grad-CAM, to diagnose KOA from 8,260 knee X-ray images, offering valuable insight into the model's decision-making process. A significant disparity was observed between the number of normal (Grade 0) and severe osteoarthritis cases (Grade 4), creating challenges in training models to effectively recognize the underrepresented categories. To address this class imbalance, data augmentation techniques, such as histogram equalization and contrast enhancement, were applied, increasing the representation of severe cases from 51 to 357 samples. As a result, the model achieved an accuracy of 99.13% and excelled in distinguishing normal and severe cases. Furthermore, the study [8] introduced a framework using MCDM methods, including the Analytic Hierarchy Process (AHP) and fuzzy multichoice goal programming (FMCGP), to enhance shared decision-making for patients with KOA.
While the dataset captures diverse patient goals, the lack of external validation datasets limits the generalizability of the model. Despite these limitations, the system has the potential to enhance healthcare quality by addressing common challenges in decision-making, and it achieved 94.74% adherence to international patient decision-aid standards. For rheumatoid arthritis, which can lead to severe joint damage if untreated, one study [33] introduced a technique called automated rheumatoid arthritis classification via an arithmetic optimization algorithm with deep learning (ARAC-AOADL) using biomechanical images. The dataset in this study comprised 310 samples divided into three classes: Hernia (60), Spondylolisthesis (150), and Normal (100). Because of the class imbalance and the reliance on biomechanical features, the model may not capture all underlying conditions influencing rheumatoid arthritis. The mentioned studies can also be criticized for not considering adversarial attacks during model development, which is crucial for ensuring model robustness. Adversarial attacks, which involve introducing imperceptible perturbations to input data to mislead models, pose significant risks to the reliability and robustness of AI systems, especially in critical applications such as medical imaging.
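To make the idea of an adversarial perturbation concrete, the following sketch applies the fast gradient sign method (FGSM), one well-known attack of this kind, to a toy logistic classifier. It is an assumption-laden illustration, not a method used in any of the reviewed studies: the weights, the input vector, and the perturbation budget `eps` are all made up, and a real imaging model would be far larger.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(w, b, x, y):
    """Binary cross-entropy of a logistic classifier on one sample."""
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


def fgsm(w, b, x, y, eps):
    """Move x by eps in the sign direction of the loss gradient w.r.t. x."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w  # analytic gradient of the loss with respect to x
    return x + eps * np.sign(grad_x)


# Hypothetical toy model and input (stand-ins for a trained network and an image).
rng = np.random.default_rng(0)
w = rng.normal(size=16)
b = 0.0
x = 0.1 * rng.normal(size=16)
y = 1.0 if w @ x + b > 0 else 0.0  # use the label the model currently predicts

x_adv = fgsm(w, b, x, y, eps=0.1)
loss_clean = bce_loss(w, b, x, y)
loss_adv = bce_loss(w, b, x_adv, y)
print(f"loss on clean input:     {loss_clean:.4f}")
print(f"loss on perturbed input: {loss_adv:.4f}")
```

Each input component changes by at most `eps`, yet the loss rises, which is the behaviour a robustness evaluation would need to measure.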
With respect to AI studies for arthritis detection, ensuring the trustworthiness of AI systems is paramount for their effective integration into clinical practice. Trustworthy AI involves seven components, as outlined by the European Union's guidelines: human agency and oversight; technical robustness and safety; privacy and data governance; transparency; diversity, nondiscrimination, and fairness; societal and environmental well-being; and accountability. These components help ensure that AI models operate in a reliable, transparent, and ethical manner. To evaluate the level of trustworthiness in the studies collected for this review, the authors categorized them on the basis of these trustworthiness components and rated their adherence to these criteria as very low (V-L), low (L), medium (M), high (H), or very high (V-H), highlighting both the strengths and areas that require significant improvement. The analysis of the collected studies, as shown in Table 2, reveals that technical robustness and safety is the most addressed component, with several studies achieving H or V-H ratings. However, other critical areas, such as human agency, privacy, and transparency, consistently received L or V-L ratings [31, 32]. This suggests that while AI models for arthritis detection demonstrate considerable focus on technical performance, they frequently overlook key trustworthiness principles necessary for ethical and reliable clinical application. Moreover, the consistently low ratings of privacy, fairness, and accountability highlight a significant gap in ensuring that these AI systems adhere to broader ethical standards [32, 33]. This indicates a pressing need for future research to not only improve technical outcomes but also address the essential trustworthiness criteria that underpin the safe and responsible integration of AI in healthcare settings.
3.2 Tumours
Fifteen of the 85 collected contributions address tumours. In this review, they are categorized into primary, metastatic, and unspecified-origin tumours, and the computational intelligence techniques proposed for determining the prognosis of this bone disease are scrutinized.
- Primary tumours
Primary tumours are the initial cancer origin in the body [34] and are further divided into three types, each linked to the affected area: Ewing sarcoma, chondrosarcoma, and osteosarcoma. Ewing sarcoma is a rare malignancy that predominantly affects paediatric and adolescent populations [35]. One study [36] leveraged transfer learning, using a dataset of 182 radiographs, to develop a DL framework capable of distinguishing between osteomyelitis and Ewing sarcoma, achieving accuracies of 94.4% and 90.6% on validation and test data, respectively. Despite its high accuracy, the study does not address the model’s interpretability or how it handles various imaging conditions, both of which are critical for clinical adoption. Chondrosarcoma, by contrast, is a type of cancer that typically starts in the cells that produce cartilage, the tough, flexible tissue that cushions bones. Owing to the difficulty of accurately diagnosing atypical cartilaginous tumours and appendicular chondrosarcomas via traditional diagnostic methods, the authors of [37] applied the LogitBoost ML technique to address this challenge. The study involved 120 patients with confirmed lesions, and the classifier achieved an accuracy of 81% in the training group and 75% in the external test group, with a good intraclass correlation coefficient (ICC). Osteosarcoma, also known as osteogenic sarcoma, is the most common type of primary bone cancer. The study [38] proposed the Honey Badger Optimization with Deep Learning Automated Osteosarcoma Classification (HBODL-AOC) model to identify the existence of osteosarcoma via fuzzy logic and medical images. However, the study could be improved by providing a clearer validation strategy that goes beyond accuracy and includes metrics such as sensitivity and specificity.
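For reference, the evaluation metrics mentioned above all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the counts passed in are hypothetical and do not come from any reviewed study, and the function does not guard against empty classes (zero denominators).

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # also called recall or true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)    # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the accuracy is 0.85 while the specificity is lower (about 0.82), which illustrates why accuracy alone can hide weaknesses on one class.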
- Metastatic tumours
Numerous investigations have explored the use of AI in detecting metastatic tumours, which have spread to other parts of the body from their site of origin [39]. Within the field of ML, one study [40] assessed the effectiveness of support vector machine (SVM) and random forest (RF) classifiers in identifying patients with incidental osteoblastic metastases of the spine by screening 200 dual-energy X-ray absorptiometry (DEXA) studies. This study's dependence on a relatively small dataset may limit its robustness and scalability. Furthermore, multi-input Convolutional Neural Network (CNN) models and the Adaptive Moment Estimation (Adam) optimizer with two evaluation strategies were used to diagnose bone metastasis and metabolic bone diseases [41]. One study [42] proposed a CNN-based pipeline that incorporated an ML/DL component to predict the development of bone metastasis. The pipeline was used to construct several ML detection models on the basis of gene expression data. The best results were achieved with InceptionResNet-v2 for normal-abnormal differentiation and Inception-v3 for metabolic-metastatic differentiation. However, the study does not consider the adversarial robustness of the model, which could be critical in real-world applications. In the same DL field, the study [43] developed an automatic image interpretation system using the ResNet-50 model to assist physicians in diagnosing cancer bone metastasis through bone scintigraphy, but the authors did not elaborate on the interpretability of the model, which is critical for clinical validation.
- Unspecified tumour origin
Several studies do not specify whether the tumours under consideration are primary or metastatic and cover a wider range of topics in AI-based tumour detection. One study [44] concentrated on the morphological characteristics of cancerous versus healthy bones, employing edge detection algorithms and histogram of oriented gradients (HOG) features, which resulted in an F1 score of 0.92 with the SVM model. Despite its effectiveness, the study did not examine the performance of the model under conditions of noise or other imaging artifacts, which are frequently encountered in practical scenarios. Another study [45] implemented bilateral filtering for noise reduction, adaptive histogram equalization for segmentation, and SVM for classification. In the same area, the study [46] employed a U-Net architecture with ResNet34 as a training baseline for the neural network, utilizing 12 data augmentation techniques. The study achieved an accuracy of 99.72% and an intersection over union (IoU) of 87.43%. Despite these impressive results, a more in-depth analysis of how different data augmentation techniques affect the model's robustness and performance could further enhance the study. In the study [47], three DL models based on contrast-enhanced MR images were developed to improve the diagnostic efficacy for musculoskeletal tumours. These models significantly enhanced the diagnostic sensitivities of oncologists and orthopaedists without impairing their specificities. However, the study focused primarily on sensitivity and specificity, neglecting other important evaluation metrics such as precision. Studies [48,49,50] have explored advanced DL techniques for the detection and classification of bone tumours, each highlighting unique approaches and results. In the study [48], a new approach for detecting bone tumour necrosis rates was presented by combining generative adversarial networks (GANs) and CNNs to simulate biopsy-based necrosis rate results.
The study [49] developed a DL computational model for bone tumour assessment on radiographs by segmentation, bounding box placement, and classification via a mask region-based CNN (Mask-RCNN). Although the model's performance was comparable to that of musculoskeletal fellowship-trained radiologists, the study did not elaborate on the interpretability of the model’s predictions or its robustness in the presence of noisy or incomplete data. Furthermore, the study [50] aimed to assist physicians in detecting and classifying knee bone tumours via the Seg-Unet model with global and patch-based approaches. The model achieved an accuracy of 99.05% for classification and a mean IoU of 84.84% for segmentation. The patch-based approach improved malignant tumour detection. Femoral bone tumours grow abnormally in the femur, the largest bone in the human body, which extends from the hip to the knee. One study showed that various CNN algorithms used to detect and classify lesions in the proximal femur achieved better performance than practising orthopaedic surgeons with varying experience levels [51]. The reviewed studies reveal that many datasets used in tumour detection AI lack sufficient representation of diverse patient populations. For instance, datasets such as the CNUH dataset [50] and the TCIA dataset [38, 47] often provide limited information on ethnicity or geographic variability. Although some datasets include gender distributions and an extensive age range, they do not assess how these factors influence model performance, leaving gaps in understanding demographic-specific biases [44, 48]. Furthermore, models trained on homogeneous datasets risk suboptimal performance in underrepresented populations, as evidenced in one study, which struggled to detect osteosarcoma features in non-Caucasian populations [40]. Future research should prioritize multicenter collaborations to develop datasets that reflect global diversity.
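The IoU values reported for the segmentation models above measure the overlap between a predicted mask and the ground-truth mask. A minimal sketch of the standard definition follows; the two small masks are invented for illustration, and returning 1.0 when both masks are empty is a convention assumed here rather than something stated in the reviewed studies.

```python
import numpy as np


def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return np.logical_and(pred, true).sum() / union


# Hypothetical 4x4 masks: the prediction covers the true region plus two extra pixels.
true_mask = np.zeros((4, 4), dtype=bool)
true_mask[1:3, 1:3] = True  # 4 pixels
pred_mask = np.zeros((4, 4), dtype=bool)
pred_mask[1:3, 1:4] = True  # 6 pixels, 4 of which overlap

score = iou(pred_mask, true_mask)
print(f"IoU = {score:.4f}")
```

A mean IoU, as reported in some studies, would average this quantity over classes or over images, depending on the evaluation protocol.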
With respect to the trustworthiness measurement for the studies related to AI-based tumour detection, the results in Table 3 reveal a clear focus on technical robustness and safety, with several studies achieving high ratings [37, 44]. However, critical trustworthiness components such as human agency, privacy, and transparency are consistently rated L or V-L [36, 38, 40]. This imbalance raises concerns about the ethical and practical integration of these AI models into clinical settings, where human oversight and data protection are essential. Additionally, the lack of attention given to diversity and fairness risks introducing biases into AI-based tumour detection [37, 43, 44]. While technical performance is prioritized, addressing these gaps is crucial for ensuring both the safety and ethical use of AI in orthopaedic healthcare centres.
3.3 Deformities
Among the 85 related studies, 10 were specifically dedicated to deformities and were divided into four significant subcategories: Kashin-Beck disease (KBD), with one contribution; developmental dysplasia of the hip (DDH), with three contributions; knee malalignment syndrome, with one contribution; and spine deformities, with five contributions.
- Kashin-Beck Disease (KBD)
KBD is a chronic, endemic osteoarthropathy marked by deformity of the joints, especially in the hands and knees [52]. The early detection of KBD is particularly challenging due to the subtle radiographic changes in the metaphyseal zones and carpals during the initial stages. For this purpose, the sole study [53] on KBD focused on developing an algorithm to automatically screen for KBD using hand X-ray images. This method employs a CNN to extract both global and local features from images, which are then fed into a neural network for classification. The study achieved an accuracy of 98.5% and a sensitivity rate of 97.6%, outperforming methods that use only global features.
- Developmental Dysplasia of the Hip (DDH)
DDH is a condition observed in infants and young children in which the ball-and-socket joint of the hip does not form properly. The manual process of diagnosing DDH is time intensive, requiring clinicians to spend 150–200 s per case on tasks such as classification, angle measurement, and landmark detection. This lengthy process limits efficiency, especially in high-volume clinical settings, and increases the potential for fatigue-related errors [54]. Thus, the authors of [55] developed a pyramid nonlocal UNet (PN-UNet) to accurately detect and identify misshapen anatomical landmarks, which are crucial for diagnosing DDH. The model reduced the diagnosis time to only 1.21 s per case while maintaining comparable diagnostic accuracy (86–95%) and reliability, demonstrating its potential to streamline workflows and improve productivity in clinical practice. Furthermore, the authors of [56] modified the U-net and ResNet architectures for radiographic measurements of the hip in adults, which could improve the efficiency and accuracy of diagnosing hip conditions. However, these studies did not address the robustness of the models against white-box and black-box adversarial examples, which is critical for ensuring reliable real-world applications.
- Knee Malalignment Syndrome
Knee malalignment syndrome is a physiological condition in which the kneecap (patella) does not move normally in the cavity of the thigh bone [57]. This typically occurs due to abnormalities in the structure or mechanics of the lower body, leading to an imbalance or misalignment at the knee joint. One study [58] described the YOLO and ResNet landmark regression algorithm (YARLA) for the fully automated assessment of knee alignment from a full-leg X-ray dataset.
- Spine Deformities
Spine deformities are a vital area in orthopaedics and involve abnormalities that affect the structure and function of the spine [59]. One of the main challenges in scoliosis diagnosis is that manual measurement of the Cobb angle by orthopedic surgeons often leads to inconsistent and subjective diagnoses [60]. The study [61] adopted an automated system based on four Faster R-CNNs and ResNets, trained with the Stochastic Gradient Descent (SGD) optimizer on X-ray images, to automate Cobb angle measurement and classify scoliosis. This significantly reduced the reliance on manual input, ensuring consistency and objectivity in diagnosis. Moreover, identifying early-stage scoliosis with mild Cobb angle deformities poses challenges because subtle radiographic changes are difficult for both clinicians and models to detect accurately [62, 63]. One study [64] employed U-Net, which is specifically designed for biomedical image segmentation, to automatically segment vertebrae, locate endplates, and calculate Cobb angles with a prediction accuracy of 94.42%. Furthermore, employing DeepLabV3 and EfficientNet-B4 with optimization for segmenting and classifying lordosis on cervical X-rays achieved better accuracy than two surgeons [65].
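To make the final step of such pipelines concrete, the following minimal sketch computes a Cobb angle from two endplate lines once their endpoints have been located. It is an illustration only and is not the method of [61] or [64]; the function names and the coordinate convention (pixel coordinates, each endplate given as two points) are assumptions introduced here.

```python
import math


def line_angle_deg(p1, p2):
    """Angle of the line through p1 and p2 relative to the x-axis, in degrees."""
    return math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0]))


def cobb_angle_deg(upper_endplate, lower_endplate):
    """Acute angle between two endplate lines.

    Each endplate is a pair of (x, y) points, e.g. the two corners of the
    superior endplate of the upper end vertebra and of the inferior endplate
    of the lower end vertebra (hypothetical input format).
    """
    a = line_angle_deg(*upper_endplate)
    b = line_angle_deg(*lower_endplate)
    diff = abs(a - b) % 180.0
    # Lines have no direction, so fold the result into the range [0, 90].
    return min(diff, 180.0 - diff)


if __name__ == "__main__":
    upper = ((0.0, 0.0), (10.0, 1.0))
    lower = ((0.0, 50.0), (10.0, 48.0))
    print(round(cobb_angle_deg(upper, lower), 2))
```

Folding the difference into [0, 90] makes the result independent of the order in which the two endpoints of each endplate are supplied, which is convenient when the landmarks come from an automated detector.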
The trustworthiness analysis of the studies on deformity detection, as shown in Table 4, highlights technical robustness and safety as the most emphasized component, with several studies achieving M- or V-H ratings [53, 55, 61]. However, other essential areas, including human agency, privacy, and transparency, consistently receive L or V-L ratings [53, 54, 58]. This pattern indicates that while the technical aspects of AI models for deformity detection are prioritized, the ethical dimensions are frequently overlooked. Furthermore, the consistently low ratings of diversity and fairness [60, 61] and accountability [63, 65] suggest that many studies lack adequate consideration of ethical inclusivity and the responsibility for potential negative outcomes.
3.4 Fractures
According to the taxonomy analysis of our systematic review, 45 of the 85 contributions fell within the fracture category. These contributions reflect the substantial focus on fractures in orthopaedic research, highlighting the importance of developing accurate and reliable AI models for diagnosing and managing various types of fractures. The high number of studies indicates a strong interest in leveraging AI to improve fracture detection, classification, and treatment, which is crucial for streamlining clinical workflows. For general fracture detection, traditional classifiers, such as KNN and SVM, often underperform because of incomplete feature extraction and their inability to handle high-dimensional image data effectively. Additionally, relying solely on traditional features, such as texture and shape, limits the depth of information used for classification, reducing overall accuracy. Therefore, the authors of [66] utilized AlexNet for deep feature extraction and integrated learning to train classifiers, assigning different weight values on the basis of each classifier's contribution, which increased the accuracy of the clinical diagnosis. Furthermore, researchers have adopted different natural language processing (NLP) methods to classify radiology reports in orthopaedic trauma, comparing different ML approaches, including a DL-based BERT model [67].
- Femoral Fractures
Many studies have shown that AI methods can help with femoral fracture diagnosis. Notably, advanced CNN architectures such as VGG19, InceptionV3, and ResNet50 have been employed to improve the radiographic diagnosis of atypical femoral fractures, achieving an average accuracy rate of 81.93% [68]. Accurately diagnosing fractures can be challenging due to variations in imaging conditions. To address this, a novel encoder-decoder neural network is proposed, which incorporates radiology reports as additional information during training [69]. Conventional feature extraction procedures, such as Histogram of Oriented Gradients (HOG) and Local Binary Patterns (LBP), delivered limited performance due to their inability to fully capture the complex features required for accurate fracture detection [70]. The features extracted from the CNN layers were further refined using Bidirectional LSTM (BiLSTM) and Long Short-Term Memory (LSTM) architectures, which enhanced the classification performance for femoral fractures [71]. However, these studies often lack the use of selection and benchmarking techniques to choose the optimal detection model. Without these techniques, ascertaining the most effective model for clinical application is challenging, as there is no standardized method for comparing model performance across different datasets and conditions.
- Ankle and Foot Fractures
The detection of occult ankle fractures presents several significant challenges. These fractures are subtle and not easily visible on standard radiographs, which increases the risk of misdiagnosis. Additionally, less obvious fractures are underrepresented relative to easily identifiable fractures, creating a class imbalance that can bias AI models during training. Therefore, multiple Deep Convolutional Neural Networks (DCNNs) have been employed to adapt features learned from general image recognition tasks (e.g., ImageNet) to the specific task of fracture detection [72, 73]. Furthermore, data augmentation techniques, including flipping and rotation (±10°), were applied to increase the diversity of training samples, improving model robustness and generalizability. Moreover, single-view radiographs (e.g., anteroposterior) often fail to capture all relevant features of the ankle structure, making it difficult to detect fractures comprehensively. The study [74] addressed this issue by employing three-view radiographs with Inception-V3 and ResNet-50 models, which demonstrated superior performance compared with single-view radiographs and significantly enhanced the diagnostic process. A foot fracture diagnosis assistance system developed with the Gradient-weighted Class Activation Mapping (Grad-CAM) XAI technique and ensemble learning techniques was tested across various proficiency levels and showed marked improvements in diagnostic accuracy [75]. The identification of the optimal model is a critical step in ensuring the success and applicability of intelligent systems for detecting ankle and foot fractures. While the reviewed studies demonstrate significant advancements in developing AI models, they often fail to prioritize or identify the most effective detection model from the set of developed approaches. This oversight limits the potential to achieve the highest possible performance in terms of accuracy, sensitivity, and specificity.
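The flipping and small-angle rotation augmentation mentioned above can be sketched as follows. This is a minimal, dependency-free illustration on images stored as nested lists; the function names, the nearest-neighbour resampling, and the 50% flip probability are assumptions made here and are not taken from [72, 73]. In practice an imaging library would be used instead.

```python
import math
import random


def hflip(image):
    """Mirror a 2D image (list of rows) left to right."""
    return [row[::-1] for row in image]


def rotate(image, angle_deg, fill=0):
    """Rotate a 2D image about its centre using nearest-neighbour sampling."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: locate the source pixel for each output pixel.
            dx, dy = x - cx, y - cy
            sx = cos_t * dx + sin_t * dy + cx
            sy = -sin_t * dx + cos_t * dy + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = image[iy][ix]
    return out


def augment(image, rng, max_angle=10.0):
    """Randomly flip, then rotate by an angle drawn from [-max_angle, max_angle]."""
    if rng.random() < 0.5:
        image = hflip(image)
    return rotate(image, rng.uniform(-max_angle, max_angle))


if __name__ == "__main__":
    img = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    print(augment(img, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which matters when results are compared across training runs.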
- Calcaneus and Vertebral Fractures
High-energy events often cause calcaneal fractures, which can severely impact mobility. Studies exploring the use of augmented and nonaugmented images with DL techniques have shown significant potential in classifying and detecting these fractures in CT images [76, 77]. Furthermore, an automated system utilizing a U-Net model was employed for anatomical angle measurement, fracture identification, and segmentation in radiographs, with high accuracy [78]. Vertebral fractures involving the collapse or breakage of spinal bones are particularly challenging. The primary challenge in vertebral fracture detection is class imbalance in the dataset, with a significantly greater number of normal vertebrae than fractured vertebrae [79]. This imbalance introduces bias in model training, making it difficult for AI algorithms to detect fractures accurately, particularly subtle or severe cases [80]. To address this, the study [80] balanced the dataset by undersampling normal vertebrae and employed YOLOv3 for precise vertebra localization and classification, combined with ensemble learning to improve accuracy and interpretability. Moreover, this study [81] utilized ensemble learning for the VGG16, VGG19, DenseNet201, and ResNet50 architectures with MRI data for vertebral fractures and achieved superior diagnostic efficiency compared with that of spine surgeons. However, there is a notable absence of consideration for adversarial examples, which can significantly impact the robustness and reliability of DL models.
Furthermore, the CNN tends to generate false positives, often misinterpreting degenerative changes, nutrient foramina, and ligament ossifications as fractures. Grad-CAM heatmaps, combined with a two-stage detection process, were used to visually highlight fracture regions, assisting radiologists in verifying AI predictions and reducing diagnostic errors [82]. A novel DL model combining YOLOv4 and ResUNet has been introduced to detect vertebral fractures from X-ray images [83]. It achieved high precision rates of 99% for healthy vertebrae, 74% for compression fractures, and 94% for burst fractures. However, these studies focused primarily on X-ray images, which are less detailed than CT or MR images. This might limit the model's applicability to more complex or subtle fracture cases.
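The undersampling of normal vertebrae described above for class balancing can be illustrated with a short sketch. It assumes a simple list-based dataset and a hypothetical `undersample` helper; it is not the procedure of [80], only one common way to equalize class counts.

```python
import random


def undersample(samples, labels, majority_label, rng):
    """Randomly drop majority-class items until both groups have equal size."""
    pairs = list(zip(samples, labels))
    minority = [p for p in pairs if p[1] != majority_label]
    majority = [p for p in pairs if p[1] == majority_label]
    # Keep at most as many majority items as there are minority items.
    kept = rng.sample(majority, min(len(majority), len(minority)))
    balanced = minority + kept
    rng.shuffle(balanced)
    return [s for s, _ in balanced], [l for _, l in balanced]


if __name__ == "__main__":
    # Hypothetical toy data: 90 normal vertebrae (label 0), 10 fractured (label 1).
    samples = list(range(100))
    labels = [0] * 90 + [1] * 10
    x, y = undersample(samples, labels, majority_label=0, rng=random.Random(42))
    print(len(x), y.count(0), y.count(1))
```

Undersampling discards data, so it trades a balanced training signal for a smaller training set; oversampling or class-weighted losses are alternatives when normal cases are too valuable to drop.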
- Pelvic, Wrist, and Hand Fractures
According to the literature, many studies have explored different pelvic regions, including the sacral, hip, and acetabulum areas, each showing that AI methods can optimize current detection methodologies [84, 85, 86]. Pelvic and hip fractures involve multiple anatomical sites with complex morphologies, making it difficult to define fracture boundaries and accurately classify fracture types [87]. To address this issue, one study developed PelviXNet, a DL algorithm trained with point-based annotations instead of traditional bounding boxes [88]. This method efficiently captures regional information, enhancing the model's ability to detect fractures across diverse categories. One critical aspect of acetabular fracture classification is detecting iliopectineal line fractures, which can appear as minor disruptions that are difficult to identify on conventional radiographs, especially when noise, poor image quality, or complex anatomical structures obscure fracture visibility. To address this, Gaussian filtering was used to denoise radiographs while preserving essential edge details, and the Derivative of Gaussian (DoG) method was applied to highlight the iliopectineal line while minimizing false edges caused by noise or overlapping anatomical structures [89].
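The idea behind Gaussian smoothing followed by a derivative-of-Gaussian edge response can be shown on a one-dimensional intensity profile. The sketch below is an assumption-laden illustration, not the pipeline of [89]: the kernel radius, the value of sigma, the edge-replication boundary handling, and all function names are choices made here.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights sampled at integer offsets."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_derivative_kernel(sigma, radius):
    """First derivative of the Gaussian: g'(x) = -x / sigma^2 * g(x)."""
    g = gaussian_kernel(sigma, radius)
    return [-x / (sigma * sigma) * g[x + radius]
            for x in range(-radius, radius + 1)]


def convolve1d(signal, kernel):
    """Discrete convolution with edge replication at the borders."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i - (k - radius), 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


if __name__ == "__main__":
    profile = [0.0] * 5 + [1.0] * 5  # a single rising edge
    response = convolve1d(profile, gaussian_derivative_kernel(1.0, 3))
    print(response.index(max(response)))
```

Because smoothing and differentiation are combined in one kernel, isolated noise spikes produce a weaker response than a sustained intensity change, which is the property that helps suppress false edges.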
Distal radius (wrist) fracture detection typically requires a large dataset (thousands of images) to train DL models effectively [90]. Nevertheless, studies [91, 92] have reported that the accuracy of fracture detection via CNNs is comparable to published values despite small training datasets. Similarly, the study [93] aimed to develop an intelligent system capable of accurately diagnosing distal radius fractures via a small biplane plain X-ray dataset. The AI system was built on VGG16 pretrained on the ImageNet dataset to compensate for the small dataset size, and its performance was evaluated via several metrics.
A major challenge in AI-based fracture detection is the black-box nature of models, which limits clinician trust due to the lack of transparency in decision-making [94]. To address this, Grad-CAM was applied to visualize key areas influencing predictions, making the EfficientNet-B4 model more interpretable and fostering trust among clinicians [95]. The study [96] evaluated the effectiveness of YOLO single-stage DL models (YOLOv5, YOLOv6, YOLOv7, and YOLOv8) for detecting wrist fractures in pediatric X-rays and reported that they were superior to the Faster R-CNN model. YOLOv8m showed the highest sensitivity (0.92) and a mean average precision (mAP) of 0.95. This research underscores the potential of YOLO models to increase the accuracy and efficiency of pediatric wrist fracture diagnosis. Another study evaluated the ability of a CNN to diagnose distal radius fractures via frontal and lateral wrist radiographs; it included 503 patients with wrist fractures diagnosed via plain radiographs and 289 patients without fractures. However, the dataset used may not fully represent the diversity and variability observed in clinical practice. The relatively small and specific sample could limit the generalizability of the CNN model to broader populations or different clinical environments. Furthermore, the manual cropping of radiographs used in that study might have introduced biases or inconsistencies. Scaphoid fracture detection poses significant challenges due to the unique structure and imaging characteristics of this small bone. A major difficulty lies in its small ROI, as the scaphoid occupies a tiny area in hand images. This results in a pronounced class imbalance during model training, with a significantly greater number of negative samples (nonfracture areas) than positive samples (fracture areas), making accurate detection more difficult [97, 98, 99].
To address this, the study [97] introduced a specialized network architecture called CSR-Net, which leverages cross-scale residual connections. These connections enable the model to effectively integrate features from different layers and scales, allowing it to focus on the small ROI of the scaphoid and improving its ability to detect fractures despite their small size. Additionally, the structural similarity between the scaphoid and surrounding carpal bones further complicates fracture identification, as these bones share similar radiographic features, making it challenging for models to distinguish subtle fractures from adjacent structures. Employing data augmentation techniques, such as rotation and contrast enhancement, alongside CNN models, provides a more balanced training set and improved generalizability [98, 99]. However, no study has yet developed a model capable of efficiently detecting these types of fractures while accounting for diverse patient demographic information.
- Supracondylar and Tibial Plateau Fractures
Supracondylar fractures, which commonly occur in children, present unique challenges due to their anatomical complexity, requiring a high level of expertise for accurate diagnosis. The variability in ossification centers and the presence of incomplete fractures in pediatric patients further complicate detection. Therefore, a dual-input CNN-based algorithm utilizing two identical ResNet-50 models for anteroposterior and lateral elbow radiographs demonstrated substantial efficacy in the automated detection of pediatric supracondylar fractures via conventional radiography [100]. Tibial plateau fractures, which occur at the top of the tibia, are critical because of their impact on knee stability and mobility. AI has made advances in this area with the development of a RetinaNet model trained on a substantial dataset of 542 X-rays [101]. This model performed robustly, indicating the ability of DL to handle the complexities of diagnosing fractures in load-bearing joints.
- Rib Fractures
Rib fractures, especially incomplete and subtle fractures, are difficult to identify because of their small size and overlapping structures in chest CT images [102]. One study [103] employed a 3D object detection model and analyzed high-energy trauma patients’ CT scans via a CNN, which proved more effective than radiologist evaluations. The model retained 3D spatial continuity between CT slices, improving the detection of subtle and incomplete fractures. However, the study lacked a comparison of the transfer learning models with a standardized benchmark, making it difficult to objectively assess their relative performance. Moreover, the CCE-Net model, which incorporates contralateral, contextual, and edge-enhanced modules, uses a large dataset of 1639 digital radiography images for training, achieving high accuracy and demonstrating the potential of DL models to handle large and complex datasets [104].
- Knee Fractures and Meniscus Tears
Knee fractures, particularly those around complex knee joints, require precise classification to guide treatment strategies. The high granularity of the classification system requires the model to distinguish subtle differences between fracture types, increasing the risk of misclassification. Additionally, the lack of transparency in AI decision-making makes it challenging for clinicians to trust the model’s predictions, particularly for complex or ambiguous fracture types. To address this, a modified 26-layer ResNet-based CNN architecture was employed, enabling the model to extract detailed features and improve accuracy in distinguishing subtle fracture types [105]. Furthermore, Grad-CAM XAI techniques were utilized to generate heatmaps highlighting fracture regions. This approach enhances model interpretability and fosters clinician trust by providing visual explanations for AI decisions. According to the literature, MRI plays a crucial role in diagnosing meniscus tears, which are common knee injuries, by providing detailed visualization of cartilage structure, edema, and subchondral bone damage [106]. However, its effectiveness is often hindered by image degradation and blurred boundaries, posing significant challenges to accurate lesion detection and classification. The Dragonfly Optimization and Regional Similarity Transformation Algorithm (DO-RSTA) was employed to address this issue by correcting noise and uneven illumination, along with a modified version of the AlexNet model that facilitated the classification of different lesion levels [107]. Compared with medial tears, DL models have demonstrated reduced sensitivity in detecting lateral meniscus tears, highlighting the need for further optimization in identifying these more subtle or less apparent disruptions [108]. One study utilized a multistep CNN architecture comprising coronal and sagittal convolutional blocks, batch normalization layers, and inception modules. 
These layers effectively extract subtle features, enhancing classification accuracy when trained on over 18,500 MRI scans from multiple institutions [109].
In measuring the effects of trustworthiness components on fracture detection studies, the technical robustness and safety component is notably well covered, with several studies achieving M–H ratings [66, 68, 79], as shown in Table 5. However, other critical aspects, such as human oversight, data governance, and clarity in decision-making, are consistently underrepresented, with most studies receiving V-L scores [66, 67, 71]. This finding points to a persistent gap in ensuring that AI models for fracture detection uphold essential ethical standards. The low focus on fairness and responsibility [70, 71] further highlights the need for a more balanced approach that prioritizes both the technical and ethical dimensions required for trustworthy AI in the medical field.
3.5 Osteoporosis
Osteoporosis is often characterized by a gradual loss of bone mineral density, leading to subtle fractures or deformities that are difficult to detect on conventional radiographs. Two of the 85 collected contributions discussed the effectiveness of AI models in osteoporosis detection. Osteoporosis detection via dual-energy X-ray absorptiometry (DXA) faces several significant challenges. Inconsistent ROI selection, often due to operator-dependent errors in identifying areas such as the lumbar spine or femur, affects the precision of bone mineral density (BMD) calculations and leads to variability in measurements. To address these issues, [110] proposed a detection system based on a multilayer perceptron neural network using digital images that accurately measured BMD via an automated model for ROI selection (e.g., the lumbar spine and femur), which minimizes operator dependency and ensures consistent and precise ROI localization. Moreover, variability in human measurements of the second metacarpal cortical percentage (2MCP), often due to manual annotation errors, reduces the reliability of osteoporosis diagnosis. Therefore, the authors of [111] implemented automated laterality correction and vertical alignment normalization to standardize radiographs, which were integrated with a fully convolutional network (FCN-8) for segmenting the second metacarpal, enabling precise ROI extraction while minimizing human intervention. However, both studies lack a clear outline of the methodological phases, making it difficult to follow the step-by-step process and reproduce the experiments. A transparent and detailed methodology is crucial for other researchers to validate and build upon the findings.
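As a small worked example of the 2MCP measurement discussed above, the sketch below assumes one commonly used definition: cortical thickness (total bone width minus medullary canal width at the mid-shaft) expressed as a percentage of total bone width. This definition, the function name, and the millimetre inputs are assumptions introduced here and are not taken from [111].

```python
def cortical_percentage(total_width_mm, medullary_width_mm):
    """Cortical bone as a percentage of total bone width (assumed 2MCP definition)."""
    if total_width_mm <= 0:
        raise ValueError("total width must be positive")
    if not 0 <= medullary_width_mm <= total_width_mm:
        raise ValueError("medullary width must lie between 0 and the total width")
    cortical_mm = total_width_mm - medullary_width_mm
    return 100.0 * cortical_mm / total_width_mm


if __name__ == "__main__":
    # Hypothetical measurements from a segmented second metacarpal.
    print(cortical_percentage(8.0, 4.0))
```

Because the value is a ratio of two widths measured on the same image, it is insensitive to uniform scaling, which is one reason automated segmentation of the two boundaries is the step that matters most for reliability.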
The evaluation of trustworthiness in AI models for osteoporosis detection studies reveals significant issues, particularly with respect to human agency, privacy, and accountability, with both studies reporting V-L in these areas [110, 111]. This underscores a lack of clinician oversight and poor handling of sensitive patient data, which are critical for building trust in healthcare computer-aided detection systems.
3.6 General Bone Abnormalities
Many studies in the literature have highlighted issues related to the datasets used in abnormality detection. High image variability is a common challenge, as datasets often include diverse lower extremity radiographs (e.g., foot, ankle, knee, and hip) captured under varying imaging conditions, such as resolution, contrast, and noise, making uniform feature extraction difficult [112]. One study utilized DenseNet-161, which effectively captured complex features from lower extremity radiographs and visualized them via the Grad-CAM technique [113]. The model was pretrained on the ImageNet and MURA datasets to leverage prior knowledge, enhancing performance on small datasets and reducing the need for extensive labeled training data [114]. Additionally, class imbalance poses a significant problem, with a disproportionate number of normal images compared with abnormal images, which biases model predictions and limits the detection of rare abnormalities. Addressing these issues is essential for improving model performance and reliability. Moreover, the authors of [115] proposed a new model based on DenseNet-169, DenseNet-201, and InceptionResNetV2 to enhance the detection of upper extremity abnormalities using limited data. However, the absence of selection and benchmarking techniques to choose the most suitable detection model limits the application of those models in healthcare centers. Furthermore, owing to the limited number of abnormal cases in most of the bone diseases studied, many models face the risk of overfitting, particularly with DL architectures [116]. To address this issue, studies have combined multiple models, such as VGG-19 and ResNet, using an ensemble technique that assigns weights to each model's predictions [117, 118]. This approach has improved overall accuracy and mitigated biases from individual models. 
Additionally, cross-validation was employed during training to evaluate the model’s robustness and stability, ensuring that the ensemble system generalized well across different data subsets [119, 120].
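The weighted ensemble idea described above can be sketched as weighted soft voting over per-model class probabilities. The helper names, the example probabilities, and the weights below are hypothetical; the reviewed studies [117, 118] may derive their weights differently (for example, from validation accuracy).

```python
def weighted_soft_vote(prob_lists, weights):
    """Fuse per-model class probabilities using a weighted average."""
    if len(prob_lists) != len(weights) or not prob_lists:
        raise ValueError("need one weight per model")
    total = float(sum(weights))
    n_classes = len(prob_lists[0])
    return [
        sum(w * probs[c] for probs, w in zip(prob_lists, weights)) / total
        for c in range(n_classes)
    ]


def predict(prob_lists, weights):
    """Index of the class with the highest fused probability."""
    fused = weighted_soft_vote(prob_lists, weights)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Hypothetical outputs of two models for the classes [normal, abnormal].
    model_a = [0.6, 0.4]
    model_b = [0.3, 0.7]
    print(weighted_soft_vote([model_a, model_b], [1.0, 3.0]))
    print(predict([model_a, model_b], [1.0, 3.0]))
```

In this toy case the two models disagree, and the prediction follows whichever model carries more weight, which shows why the weights should be chosen on held-out data rather than on the training set.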
The trustworthiness evaluation of studies on general bone abnormality detection shows notable deficiencies, particularly in terms of human agency, privacy, and accountability, where most studies received V-L ratings [112, 113, 115, 117], as shown in Table 6. These issues highlight the need for improved clinician involvement and better data governance practices to ensure secure and ethical AI implementation. While some studies have performed moderately in terms of transparency and fairness, receiving M and H ratings [112, 114], the overall trustworthiness of these models is compromised by a lack of societal and environmental considerations.
4 Discussion
This section presents a detailed overview of the motivations, challenges, and prospects of using AI in studies on orthopaedic disease detection. It contributes to the assessment of the available literature by determining the major reasons in favor of using AI in orthopaedics, the technical and ethical barriers that impede such usage, and the directions for further research aimed at producing more effective, safe, and trustworthy AI as an augmentative tool in orthopaedics.
4.1 Motivations
Many reviewed studies have shown that computational AI models in orthopaedic diagnostic systems have been developed to accurately identify complicated disorders such as acetabular fractures, scaphoid fractures, and bone cancers in a manner similar to that used by emergency room clinicians, improving doctors' diagnoses and patient outcomes [97]. Additionally, AI simplifies orthopaedic diagnostic workflows in healthcare centers, which can reduce diagnostic errors among doctors and optimize clinical outcomes [47]. Compared with traditional methods, automated AI systems detect vertebral and knee bone tumours more accurately and quickly [75]. AI models also improve decision-making, prevent misdiagnoses in critical conditions such as malignant tumours and fractures, and automate measurements, enhancing consistency and reducing subjectivity. Furthermore, many AI research initiatives are motivated by the significant impact that early diagnosis and preventive healthcare can have on treatment outcomes and quality of life [60, 107]. Intelligent computational models are being developed to reliably detect early signs of conditions such as wrist fractures and osteoarthritis [61]. Additionally, improving diagnostic procedures for atypical cartilaginous tumours and pelvic radiography and predicting high-risk bone metastasis demonstrated the ability of AI to provide targeted diagnostic support, which is essential for timely interventions [37]. In addition to improving diagnostic accuracy and clinical outcomes, the integration of trustworthy AI in orthopaedic diagnostics is vital for building clinician and patient confidence [8]. AI systems designed with transparency, explainability, and fairness are more likely to expand acceptance in healthcare centers [32]. 
By ensuring that AI algorithms are interpretable and provide a clear rationale behind their decisions, healthcare professionals can trust the system's output, which is crucial when dealing with complex and high-stakes diagnoses, such as bone tumours or fractures [50, 95]. Figure 4 illustrates the three key reasons in the literature for integrating AI into orthopaedics.
4.2 Challenges
According to the literature, acquiring large, well-labeled datasets for effective AI training in orthopaedics is time-consuming and labor-intensive, leading to data quality and availability challenges [37]. Small dataset sizes, high-resolution requirements, and the unique appearance of medical images complicate AI model training. Studies highlight these challenges, particularly in knee bone tumour detection, owing to the uncommon appearance and variety of poses in X-ray images [36, 50]. Without enough data, distinguishing between similar conditions such as Ewing sarcoma and acute osteomyelitis becomes difficult, leading to potential misdiagnoses and reduced accuracy. Many studies have reported that diverse datasets representing various patient demographics and medical conditions are crucial for the good generalization of computational models across populations [38, 51]. Strategic collaboration, multiple imaging modalities (such as X-ray, CT, and MRI), and high-quality data annotation are essential for increasing dataset size and diversity for complete bone disease diagnosis [70]. Many studies have highlighted the need for precise and reliable integrated diagnostic tools to improve patient healthcare and clinical adoption, especially for bone tumour diagnosis and osteosarcoma classification [46, 86]. By showing AI's ability to match or exceed human diagnostic accuracy in orthopaedic illnesses, especially femoral intertrochanteric fractures, AI technologies gain credibility and practical use [107]. Current orthopaedic diagnostic methods are error-prone because they depend on the interpreter's competence; errors can arise from differences in skill level, fatigue, or subtle symptoms. This issue is crucial for diagnosing proximal femur bone cancers and scaphoid fractures, where misdiagnosis can cause prolonged pain and disability [113]. 
Even with two- and three-dimensional CT scans, the pelvic anatomy and unusual fracture forms, such as acetabulum fractures, make diagnosis problematic [66]. More efficient orthopaedic AI diagnosis systems are needed due to these limitations. Figure 5 presents the challenges in integrating AI in the context of orthopaedic disease detection according to three key directions.
4.3 Future Research Avenues
According to the literature, expanding and diversifying orthopaedic AI datasets improves performance and generalizability, which can improve the clinical feasibility and statistical reliability of these models, increasing their adaptability across healthcare institutions [107]. Multimodal imaging with detailed labeling and varied angles can increase diagnostic accuracy, especially for challenging classifications such as knee cartilage lesions [44, 75]. These recommendations emphasize the importance of diverse and comprehensive datasets for developing reliable AI diagnostic tools for orthopaedic diseases. Additionally, improving diagnostic tools involves optimizing features and expanding landmark identification for better accuracy and adaptability [66]. Key strategies include hybrid computational models that merge traditional and deep features, mobile apps for remote diagnostics, and diverse datasets with various fractures and imaging modalities [53]. Exploring different AI architectures and using metaheuristic algorithms such as ant colony and gray wolf methods can also enhance fracture detection [71]. These methods collectively emphasize the need for robust AI optimization to advance orthopaedic diagnostics. Another area in which orthopaedic diagnostic AI needs further work is clinical confirmation. External validation and clinical data integration are needed for meniscus tears, ankle fractures, and vertebral fractures [72, 109]. Accurate testing against established benchmarks is essential for ensuring reliability and effectiveness, enhancing diagnostic accuracy, and supporting clinical decision-making [81]. These steps are critical for successful intelligent system deployment in healthcare. Additionally, ethical deployment, regulatory approval, and the clinical advantages of trustworthy AI over traditional approaches should be addressed to enable the models to be applied in healthcare centers [88]. 
Validation and adherence to ethical guidelines are necessary to build trust and protect patient privacy [84]. Future work should explore the implications of computational detection models based on AI in healthcare, including potential biases and the need for transparency in AI algorithms, to promote ethical practices in the medical field. The four key directions of the literature review recommendations for integrating AI techniques with bone disease diagnosis are shown in Fig. 6.
5 AI Techniques and Datasets for Orthopaedic Disease Detection
This section provides a comprehensive overview of the AI techniques and datasets used in the orthopaedic disease detection literature, highlighting the advancements and challenges in this field. Subsection 5.1 categorizes the key AI methods and demonstrates their applications and effectiveness in orthopaedic diagnosis. Subsection 5.2 discusses the availability and utilization of medical datasets, including X-ray, MRI, and CT scans, and explores how these datasets support diagnostics in orthopaedic research.
5.1 AI Directions
This subsection focuses on detailing the five main AI techniques (DL, ML, XAI, fuzzy logic, and MCDM) and their significance in orthopaedic disease detection. This study aims to categorize and analyze how these techniques contribute to improving diagnostic accuracy and decision-making in orthopaedic healthcare. In evaluating the effectiveness of AI techniques for orthopaedic disease detection, several key performance metrics are commonly used to assess model efficiency. Metrics such as accuracy, precision, recall, F1 score, IoU, confidence interval (CI), and AUC (Area Under the Curve) are frequently applied across studies. By employing these methods, researchers can address various challenges, such as image interpretation, complex disease classification, and personalized care.
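As a rough illustration of how the metrics listed above are computed, the following sketch derives accuracy, precision, recall, and F1 score from confusion-matrix counts, IoU from two bounding boxes, and AUC from ranked scores. The function names and the example counts are assumptions made for illustration and are not taken from any reviewed study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
    }


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def auc(scores, labels):
    """AUC as the probability that a positive case outranks a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Count pairwise wins; ties contribute half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts for a fracture classifier on 200 test radiographs.
print(binary_metrics(tp=80, fp=10, fn=20, tn=90))
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for a sketch but not for large test sets.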
5.1.1 Deep Learning (DL)
DL has emerged as a cornerstone in medical imaging, particularly for musculoskeletal disease detection [70]. Its ability to automatically extract complex features and learn hierarchical representations from imaging data surpasses traditional ML methods, making it indispensable for tasks such as fracture classification, osteoarthritis grading, and tumour detection [109, 114]. This section explores the transformative role of DL techniques in detecting orthopaedic diseases, with the goal of improving patient outcomes [72, 90]. Tables 7, 8 and 9 illustrate the various applications of DL models in orthopaedic research, highlighting their effectiveness in identifying pathological changes and facilitating early diagnosis.
CNN-based models have shown high accuracy in diagnosing various conditions, such as vertebral fractures, with an accuracy of 86% [79], and intertrochanteric fractures, with an accuracy of 88% [70], aiding clinicians in detecting subtle patterns in medical images. Moreover, advanced models such as Mask R-CNN for ankle fractures achieve 89% accuracy, emphasizing the potential of CNNs in automating fracture detection and assisting clinicians in making faster, more reliable decisions, as shown in Table 7 [72].
However, the reviewed models face several significant challenges that hinder their broader clinical adoption. They rely heavily on large, annotated datasets such as the MURA dataset, which are often scarce in orthopaedic diseases because of the expertise-driven nature of annotation. This limitation restricts their training on diverse and comprehensive datasets such as the Chinese triple-A grade hospital dataset [70], resulting in potential biases and overfitting. Moreover, variations in demographics, scanner types, and imaging protocols across institutions can lead to suboptimal performance when models are applied outside their training environment, further limiting their safe adoption in healthcare settings.
Pretrained CNN models have shown significant importance in orthopaedic disease detection, as highlighted in Table 8. By leveraging architectures like ResNet, EfficientNet, and DenseNet, pretrained models achieve higher accuracy in detecting conditions such as fractures, osteoarthritis, and bone metastases, with accuracies greater than 98%. These models, pretrained on large datasets such as ImageNet, transfer learned features to medical imaging tasks, enabling effective classification and segmentation even with limited annotated datasets such as the Pusan National University Hospital dataset [75].
For example, EfficientNet demonstrated exceptional performance in fracture detection, with accuracies exceeding 90%, whereas DenseNet reduced the number of false negatives in tumour classification. This transfer learning approach not only mitigates the need for extensive training data but also enhances generalizability, making pretrained CNNs indispensable for advancing orthopaedic AI systems. The reviewed studies employ various performance metrics, including accuracy, precision, recall, F1 score, and AUC, to evaluate the effectiveness of DL models for orthopaedic disease detection. However, a critical gap lies in the inconsistent reporting of metrics such as the False Positive Rate (FPR), which is crucial for understanding a model's propensity to generate false alarms. Additionally, some studies fail to report confidence intervals (CIs) or standard deviations, both of which are essential for assessing the statistical reliability and robustness of the results.
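To make the reporting gap concrete, the sketch below computes the FPR from confusion-matrix counts and a 95% Wilson score interval for a reported accuracy. The Wilson interval is one common choice for a proportion and is used here as an assumption; the reviewed studies do not prescribe a particular interval, and the test-set size of 100 is hypothetical.

```python
import math


def false_positive_rate(fp, tn):
    """FPR = FP / (FP + TN): the share of healthy cases flagged as diseased."""
    if fp + tn == 0:
        raise ValueError("FPR is undefined without negative cases")
    return fp / (fp + tn)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical example: 86 correct predictions on a test set of 100 images.
low, high = wilson_interval(86, 100)
print(f"accuracy 0.86, 95% CI [{low:.3f}, {high:.3f}]")
print("FPR:", false_positive_rate(fp=10, tn=90))
```

On a test set this small the interval spans more than ten percentage points, which is why a point accuracy reported without a CI is hard to compare across studies.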
Optimization algorithms, such as SGD and Adam, are essential for fine-tuning DL models, enabling them to learn more effectively from complex medical datasets, including radiographic images and patient data [84]. By refining the learning process, these algorithms help DL models converge on optimal solutions, improving diagnostic accuracy and reducing the likelihood of overfitting [51]. This is especially beneficial in detecting conditions such as fractures, musculoskeletal disorders, scoliosis, and bone metastasis, where precision is critical for treatment planning [78]. According to the literature, many studies have shown that the SGD algorithm enhances the diagnosis of various bone diseases [53, 88], as outlined in Table 9. Moreover, the Adam optimizer has proven effective in other studies for improving the detection of fractures, scoliosis, and bone metastasis [113]. Additionally, seven studies have combined the SGD and Adam methods to achieve more efficient diagnostics [50, 51].
One challenge is that optimization algorithms require substantial computational resources, making real-time diagnostics difficult in resource-constrained environments, such as smaller healthcare facilities. Moreover, while methods such as SGD and Adam have shown promise, they are not immune to challenges such as local minima or convergence issues, particularly when dealing with highly complex or noisy data. As a result, further research is needed to explore more efficient optimization techniques and establish standardized practices that can enhance the robustness and scalability of DL-based diagnostic systems in orthopaedics.
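The difference between the two update rules can be seen on a toy problem. The sketch below minimizes the one-dimensional function f(w) = (w - 3)^2 with a plain gradient step (the deterministic form of the SGD update) and with Adam. The objective, learning rates, and step counts are illustrative assumptions and have no connection to the training setups in the reviewed studies.

```python
import math


def grad(w):
    """Gradient of the toy objective f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


def sgd(w, lr=0.1, steps=100):
    """Plain gradient step: w <- w - lr * g."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam: gradient step scaled by bias-corrected moment estimates."""
    m = 0.0  # running mean of gradients (first moment)
    v = 0.0  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


print("SGD :", sgd(0.0))
print("Adam:", adam(0.0))
```

Both rules approach the minimum at w = 3 here. On real image data the gradient is estimated from mini-batches, which is where the noise and convergence issues discussed above arise.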
While various DL architectures, such as ResNet-50, PelviXNet, DenseNet, and YOLOv8, have been employed to detect conditions such as fractures, scoliosis, and musculoskeletal disorders, there is no standardized approach for selecting the best model. This gap in the literature presents a significant challenge, as choosing the best model is crucial for ensuring the reliability of diagnostic systems. Without a clear methodology for model benchmarking, it becomes difficult to compare the performance of different models or determine which one is best suited for clinical applications in orthopaedics.
A review of related studies reveals that DL models for orthopaedic disease detection are often deployed online or integrated into cloud-based systems. While this integration enhances accessibility and scalability, it also potentially exposes these models to adversarial attacks, which can severely compromise the accuracy and reliability of diagnostic outcomes [121]. Despite this significant vulnerability, there is a notable absence of studies addressing this issue in the literature. No research has specifically explored the integration of AI-based orthopaedic detection models with considerations for adversarial attack mitigation, leaving a critical gap in ensuring the robustness and security of these systems in real-world applications.
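To illustrate the kind of vulnerability described above, the following sketch applies the fast gradient sign method (FGSM), a standard adversarial attack, to a two-feature logistic regression classifier. The model, its weights, and the input are hypothetical stand-ins for an image classifier; the example is not drawn from any reviewed study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of the positive class for a logistic regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fgsm(weights, bias, x, y, epsilon):
    """FGSM: move each input feature by epsilon in the direction that
    increases the cross-entropy loss, x_adv = x + epsilon * sign(dL/dx)."""
    p = predict(weights, bias, x)
    # For logistic regression with cross-entropy loss, dL/dx_i = (p - y) * w_i.
    gradient = [(p - y) * w for w in weights]
    signs = [(g > 0) - (g < 0) for g in gradient]
    return [xi + epsilon * s for xi, s in zip(x, signs)]


# Hypothetical model and input; the true label is 1 (abnormal).
weights, bias = [2.0, -1.0], 0.0
x = [0.5, 0.2]
x_adv = fgsm(weights, bias, x, y=1, epsilon=0.3)
print("clean prediction      :", predict(weights, bias, x))
print("adversarial prediction:", predict(weights, bias, x_adv))
```

A bounded perturbation of the input is enough to push this toy model's output across the 0.5 decision threshold, which is the failure mode that adversarial-robustness evaluation is meant to detect.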
5.1.2 Machine Learning (ML)
ML technologies like Linear Regression (LR) [89], SVM [45], and RF [40] have significantly improved the detection of various bone diseases, with a focus on their use in conditions such as bone cancer, fractures, and osteoporosis, as mentioned in Table 10. Notably, five studies have demonstrated the effectiveness of SVM alone or in combination with RF, particularly for complex cases such as osseous metastases, where some studies reported near-perfect sensitivity and specificity [40, 44]. Additionally, LogitBoost was specifically applied in one study for detecting chondrosarcomas, highlighting its potential in specialized diagnostic settings [37].
Moreover, Table 11 provides a detailed overview of the integration of ML/DL techniques for bone disease detection, highlighting various AI models applied to different types of orthopaedic diseases such as musculoskeletal disorders, fractures, and tumours. The models include advanced architectures such as ResNet, DenseNet, and ensemble learning approaches, with accuracy and other performance metrics such as precision, recall, and AUC reported across studies [36, 42, 81]. For example, CNN ensemble learning models for Ewing sarcoma achieve an accuracy of 94.4%, whereas ensemble learning models for musculoskeletal disorders report accuracies of approximately 83%. A key insight from this table is the broad adoption of ensemble learning combined with DL techniques, which consistently demonstrates strong performance in diverse applications, such as vertebral fracture and bone metastasis detection. However, some studies do not clearly describe the methodological development phases of their models, making it difficult to replicate results or understand the processes involved. Several studies also provide no practical results, limiting their applicability in orthopaedic health centers, and the lack of standardized benchmarks makes comparing AI models difficult.
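One simple way to combine several classifiers is soft voting, in which the class probabilities of the individual models are averaged before the final class is chosen. The sketch below assumes three hypothetical models and made-up probabilities; the reviewed studies may use other ensembling schemes.

```python
def soft_vote(prob_lists):
    """Average per-class probabilities across models and pick the top class.

    prob_lists holds one probability vector per model, all of equal length.
    """
    if not prob_lists:
        raise ValueError("at least one model output is required")
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=avg.__getitem__)
    return best, avg


# Hypothetical outputs of three models for one image: [P(normal), P(abnormal)].
label, averaged = soft_vote([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])
print(label, averaged)
```

Averaging lets two confident models outvote one that disagrees, which is one reason ensembles tend to be more stable than their individual members.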
5.1.3 Explainable AI (XAI)
Despite the transformative potential of XAI in enhancing the transparency and trustworthiness of AI models for musculoskeletal disease detection, its application remains limited, with only six studies addressing this critical aspect. A notable limitation is the exclusive reliance on Grad-CAM, leaving other advanced techniques, such as SHAP, LIME, or counterfactual explanations, unexplored [82, 95, 113]. These alternative methods could provide deeper insights into model decision-making and foster greater trust in AI systems. While integrating Grad-CAM with pretrained models offers a promising research direction, particularly in bone metastasis detection [32, 75, 105], reliance on a single XAI approach limits opportunities to improve interpretability and clinical adoption. Expanding the application of diverse XAI techniques is essential for developing robust and trusted AI systems.
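As an illustration of one of the unexplored alternatives, the sketch below computes exact Shapley values, the quantity that SHAP approximates, for a small tabular model. It does not use the SHAP library, it enumerates all feature subsets (so it is only feasible for a handful of features), and the linear model and baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature.

    Features absent from a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                coalition = set(s)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_model(z):
    """Hypothetical linear risk score over three features."""
    return 2.0 * z[0] + 3.0 * z[1] - 1.0 * z[2]


print(shapley_values(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0]))
```

For a linear model each attribution equals the feature's coefficient times its distance from the baseline, and the attributions sum to the difference between the prediction and the baseline prediction.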
5.1.4 Fuzzy Logic
Despite its limited application, fuzzy logic has shown significant potential for enhancing orthopaedic disease detection by addressing the inherent uncertainty in medical data [8, 38]. In particular, its integration with other AI methods could enable more precise diagnostic outcomes in complex conditions such as bone tumours and fractures. The limited use of fuzzy logic in orthopaedics suggests the need for more research exploring its potential in other bone conditions, where its ability to handle uncertain information could significantly increase diagnostic accuracy.
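The way fuzzy logic represents uncertainty can be sketched with triangular membership functions, which assign a value partial membership in several overlapping categories instead of a single hard label. The normalized severity score and the breakpoints of the three sets below are hypothetical and are not clinical thresholds.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, rising linearly to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets over a severity score normalized to [0, 1].
FUZZY_SETS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def fuzzify(score):
    """Degree of membership of a score in each fuzzy set."""
    return {name: triangular(score, *abc) for name, abc in FUZZY_SETS.items()}


print(fuzzify(0.4))
```

A score of 0.4 belongs mostly to the medium set and partly to the low set, so a downstream rule base can weigh both interpretations rather than committing to one.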
5.1.5 MultiCriteria Decision-Making (MCDM)
By systematically evaluating multiple criteria, MCDM provides a structured framework that enhances the objectivity and comprehensiveness of decision-making, ultimately leading to more personalized and effective patient care. However, only one study has shown how the integration of benchmarking methods with orthopaedic disease detection can significantly improve shared decision-making [8].
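A minimal sketch of how MCDM could rank candidate detection models is given below, using the weighted sum model with min-max normalization. This is a deliberately simple technique chosen for illustration; it is not the AHP-FMCGP method of [8], and the model names, criteria, values, and weights are all hypothetical.

```python
def weighted_sum_ranking(alternatives, weights, benefit):
    """Rank alternatives by the weighted sum of min-max normalized criteria.

    alternatives: {name: [criterion values]}
    weights:      one weight per criterion (expected to sum to 1)
    benefit:      True where higher is better, False where lower is better
    """
    names = list(alternatives)
    n_crit = len(weights)
    columns = [[alternatives[n][j] for n in names] for j in range(n_crit)]
    scores = {}
    for name in names:
        total = 0.0
        for j in range(n_crit):
            lo, hi = min(columns[j]), max(columns[j])
            if hi == lo:
                norm = 1.0  # criterion does not discriminate between models
            else:
                norm = (alternatives[name][j] - lo) / (hi - lo)
                if not benefit[j]:
                    norm = 1.0 - norm  # invert cost criteria
            total += weights[j] * norm
        scores[name] = total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical criteria: accuracy, recall, inference time in milliseconds.
candidates = {
    "model_a": [0.90, 0.80, 120.0],
    "model_b": [0.88, 0.90, 40.0],
    "model_c": [0.85, 0.85, 30.0],
}
ranking = weighted_sum_ranking(candidates, [0.5, 0.3, 0.2], [True, True, False])
print(ranking)
```

In this example the model with the highest accuracy does not rank first once recall and inference time are weighed in, which is the kind of trade-off a structured benchmarking method makes explicit.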
5.2 Dataset Availability
The purpose of this section is to underscore the importance of medical datasets in the development of AI and its application in the diagnosis of orthopaedic diseases. Data such as CT scans, MRIs, and X-rays serve to enhance and improve AI detection models for fracture, tumour, and osteoporosis detection. According to the literature, several key datasets have been widely used as sources of raw data for AI models, enabling them to learn and distinguish fine features in medical images, thereby enhancing diagnostic accuracy and treatment options. One notable dataset is the Osteoarthritis Initiative (OAI), which contains longitudinal imaging data from 4,796 subjects [31]. Its standardized imaging protocols and rich temporal data support disease progression modeling and risk prediction [58]. However, a significant limitation of the OAI is its lack of diversity in patient ethnicity and geographical representation, which restricts its applicability to global populations [122]. Among the studies reviewed in this paper, the MURA dataset developed by the Stanford ML Group has emerged as one of the most frequently used datasets for developing and validating AI models in orthopaedic applications [66]. It comprises over 40,000 radiographs across seven musculoskeletal regions and supports general-purpose model training [113]. Despite its extensive coverage, MURA is limited by its binary labels (normal or abnormal), which restrict the nuanced understanding of disease severity [118]. Additionally, it lacks contextual patient data, such as demographics or clinical history, which could improve model personalization [119]. Another frequently used resource is the Knee Osteoarthritis Severity Grading Dataset, which contains 8260 X-ray images graded via the Kellgren-Lawrence scale [32]. This dataset’s large sample size and well-annotated severity labels make it ideal for training models focused on severity classification [123].
However, it faces significant class imbalance, with an underrepresentation of Grade 4 cases, which can bias model predictions toward less severe conditions. To address these limitations, augmentation techniques have been applied to improve data representation for underrepresented categories.
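Alongside augmentation, class imbalance is often handled by weighting the loss so that rare classes count more. The sketch below computes inverse-frequency class weights; this is a complementary technique rather than the augmentation described above, and the per-grade image counts are hypothetical, not the actual distribution of the dataset.

```python
def inverse_frequency_weights(class_counts):
    """Weight each class by total / (num_classes * count).

    Rare classes receive weights above 1 and frequent classes below 1, so
    each class contributes equally to a weighted loss in aggregate.
    """
    total = sum(class_counts.values())
    k = len(class_counts)
    return {c: total / (k * n) for c, n in class_counts.items() if n > 0}


# Hypothetical image counts per Kellgren-Lawrence grade (0 to 4).
counts = {0: 3200, 1: 1500, 2: 2100, 3: 1100, 4: 250}
for grade, weight in inverse_frequency_weights(counts).items():
    print(f"grade {grade}: weight {weight:.2f}")
```

With these made-up counts the Grade 4 weight is several times larger than the Grade 0 weight, which counteracts the pull toward less severe predictions.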
According to the literature, the issue of bias in medical datasets is a critical factor affecting the performance and generalizability of AI models [80]. Biases can emerge due to unequal representations of patient demographics, imaging protocols, and disease severity levels [78, 80]. For example, datasets skewed toward specific populations, such as predominantly Caucasian or male participants, can lead to AI models that perform suboptimally in underrepresented groups, such as individuals from diverse ethnic backgrounds or female patients [40]. This bias in representation not only limits the clinical applicability of the models but also poses ethical challenges in ensuring equitable healthcare access. Additionally, imaging biases, including variations in scanner types, imaging settings, and data preprocessing techniques such as manual cropping, further compound the problem [44, 48]. For example, models trained predominantly on high-quality imaging data may struggle to perform effectively on noisy or low-resolution images commonly encountered in resource-limited settings [93]. The lack of external validation datasets to evaluate the performance of AI models across diverse clinical environments exacerbates these limitations. Addressing these biases is essential for developing robust and trustworthy AI systems. Data augmentation and undersampling of the normal class can be utilized alongside AI models to provide a more balanced training set and improved generalizability [98]. Furthermore, transfer learning techniques and synthetic data generation can mitigate class imbalance challenges, but they are not substitutes for inherently diverse and well-curated datasets [89]. Additionally, three-view radiographs can be used to overcome issues related to the absence of relevant features required for the detection process [74]. 
Future studies should prioritize multicenter collaborations to collect balanced datasets reflecting global diversity in demographics, imaging conditions, and disease presentations. Furthermore, integrating fairness metrics into model evaluation pipelines can help quantify and address biases, ensuring that AI models provide reliable outcomes for all patient groups. Consequently, increasing the availability of these datasets, for both public and private projects, is necessary to further expand access to orthopaedic AI applications.
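One fairness metric that could be added to an evaluation pipeline is the gap in recall (sensitivity) between patient subgroups, sometimes called the equal opportunity difference. The sketch below is a minimal version under that assumption; the labels, predictions, and group codes are invented for illustration.

```python
def recall_by_group(y_true, y_pred, groups):
    """Recall (sensitivity) computed separately for each patient subgroup."""
    stats = {}  # group -> (true positives, actual positives)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            tp, pos = stats.get(group, (0, 0))
            stats[group] = (tp + (pred == 1), pos + 1)
    return {g: tp / pos for g, (tp, pos) in stats.items()}


def equal_opportunity_gap(y_true, y_pred, groups):
    """Largest difference in recall between any two subgroups."""
    recalls = recall_by_group(y_true, y_pred, groups)
    if not recalls:
        raise ValueError("no positive cases to evaluate")
    return max(recalls.values()) - min(recalls.values())


# Hypothetical labels, predictions, and subgroup codes for six patients.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
groups = ["f", "f", "m", "m", "f", "m"]
print(recall_by_group(y_true, y_pred, groups))
print(equal_opportunity_gap(y_true, y_pred, groups))
```

A gap near zero indicates that diseased patients are detected at similar rates across subgroups; a large gap signals that one subgroup is being missed more often and that the training data may be skewed.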
5.2.1 Orthopaedic via X-Rays
X-rays are essential tools in orthopaedic diagnostics because they allow for the evaluation of bone structure, density, and alignment, as well as trauma or morphological disease, including fractures and osteoporosis [44]. They remain among the simplest and least expensive means of assessment, helping medical practitioners manage disorders of the musculoskeletal system [45, 68]. Table 12 details the X-ray datasets used for disease detection in the reviewed studies. The datasets differ in scale and in whether their source is reported [85]. Public datasets, one containing 40,561 samples [66] and another with 14,863 images [112, 113], are the most desirable because they support the development and testing of AI models in orthopaedics. These large, publicly available databases have contributed to an increase in research activity and to the development of more generalizable AI diagnostic tools. In contrast, a few private datasets contain as few as 100 [71] or as many as 10,000 samples [54]. Such datasets are difficult to access and their results are difficult to reproduce, because they are usually held within specific research teams or institutes. Despite these constraints, the inclusion of legally collected datasets ensures that ethical standards are maintained, which is crucial for the clinical application and regulatory approval of AI systems.
5.2.2 Orthopaedic via MRI
In contrast, MRI is recognized for its superior soft tissue contrast resolution, making it an ideal choice when disorders involve the surrounding marrow, cartilage, ligaments, or muscles; it can also visualize early changes that may not yet be visible on X-rays [47, 51]. The MRI datasets utilized in the detection of orthopaedic diseases are listed in Table 13, which specifies their sizes and sources. Public patient datasets, e.g., 2253 patients [41] or 1144 samples [38], establish a strong basis for research into AI models that work regardless of the clinical setting. Although private sets are useful for research, their availability is usually limited, which inhibits their widespread adoption, coordination, and consistent validation across studies [107, 120]. Notably, MRI datasets are generally smaller than X-ray datasets, which is attributed to the fact that MRI is more expensive and used for more specific applications. Some entries in the table also lack clarity about legal collection or dataset size, which can be a limitation for research transparency and ethical standards.
5.2.3 Orthopaedic via CT Scanning
Medical studies have frequently highlighted the benefits of CT scans, which provide detailed cross-sectional images of bone structures and complex anatomical regions, making them critical tools not only for the initial diagnosis of bone conditions but also for guiding surgical planning and assessing postoperative outcomes [97, 102]. Compared with X-rays, which provide two-dimensional images, CT scans provide three-dimensional images, making it easier to evaluate bone fractures and malformations before surgery [37]. Furthermore, CT scans provide a much better view of bones that are hard to evaluate on MRI scans, although MRI is better at soft tissue imaging [82]. Hence, CT scans are essential, particularly in the assessment and planning of surgeries where it is critical to understand the bony anatomy fully and how it may change after the procedure [99]. The datasets in Table 14 range from smaller private collections (e.g., 14 or 65 patients) [76, 103] to larger public collections (e.g., 2340 samples) [91], reflecting both focused research opportunities and broad-scale model training. Private datasets are beneficial for developing treatments for uncommon bone diseases, but their scale is often insufficient for AI generalization. Public datasets, on the other hand, contain a much wider array of information and thus strengthen the validation of AI models in different clinical settings.
5.2.4 Orthopaedic via Uncommon Types of Medical Images
Biomedical images, DEXA, and digital radiography play significant roles in enhancing the diagnosis of bone disorders. These imaging techniques add information that goes beyond the more common approaches, including X-rays, MRI, or CT imaging. For example, DEXA scans are particularly useful for accurate measurement of the amount of mineralized bone, which helps in the diagnosis of osteoporosis [40]. Biomedical imaging, especially with the use of nuclear medicine techniques, helps in understanding bone dynamics and pathology [33], whereas digital radiography improves image quality and resolution [104]. However, only two studies have explored the use of DEXA images for detecting bone disease, with 200 samples from patients with metastases of the spine and 615 samples from patients with osteoporosis [110]. Moreover, biomedical images and digital radiography have each been used in only a single study for the detection of bone abnormalities. Developing more such datasets for the creation of AI models could lead to substantial advances in diagnosis, especially for bone densitometry and metabolic bone disorders, which are often limited by imaging techniques.
6 Conclusions
The present systematic review aims to minimize potential biases and highlight how computational AI systems enhance diagnostic accuracy, decision-making, and patient care. Additionally, it evaluates the reliability of intelligent systems on the basis of ethical, legal, and technical norms through a critical analysis of prior research. This review thoroughly examines the use of AI models in detecting orthopaedic diseases via a structured and systematic approach. It highlights gaps in current research, focusing on advancements, challenges, and future directions. Despite the efforts of previous studies, many reviews lack systematic evaluation and structured taxonomy, leading to potential biases and unreliable conclusions. They often fail to address the motivations behind integrating intelligent systems in orthopaedics, such as the need for improved diagnostic efficiency. Moreover, the absence of detailed recommendations for future research and practical implementation leaves a gap in guiding the development and deployment of computational AI technologies in orthopaedic healthcare centers. Adhering to trustworthy AI principles is crucial for creating reliable, safe, and ethical healthcare solutions. Our study evaluated the 85 reviewed studies that use intelligent detection models for orthopaedic diseases against the trustworthy AI components, and the results emphasize the need for ethical considerations in those models. While some studies met high standards, many fell into the V-L category for several criteria, indicating a need for higher-quality research. The primary goal was often to improve model efficiency rather than address trustworthiness issues. Researchers and practitioners should prioritize developing intelligent detection systems that emphasize explainability, fairness, privacy preservation, causality, and robustness, in line with trustworthy AI principles in orthopaedic healthcare.
The presented review also underscores the need for more research on reproducibility, model interpretability, and human oversight in computational AI detection techniques. Future AI-based orthopaedic disease detection applications may become more advanced and accessible, but AI is not a straightforward solution. The success of intelligent detection systems will depend on advances in computational models by exploring different architectural models and evaluating their performance, the availability of large and diverse datasets by utilizing multicenter studies and strategic collaborations, and the ability of detection tools to handle complex orthopaedic problems.
Data Availability
No datasets were generated or analysed during the current study.
References
Gupta, P., Marigi, E.M., Sanchez-Sotelo, J.: Research on artificial intelligence in shoulder and elbow surgery is increasing. JSES Int. 7, 158–161 (2023). https://doi.org/10.1016/j.jseint.2022.10.004
Overstreet, D.S., Strath, L.J., Jordan, M., Jordan, I.A., Hobson, J.M., Owens, M.A., Williams, A.C., Edwards, R.R., Meints, S.M.: A brief overview: sex differences in prevalent chronic musculoskeletal conditions. Int. J. Environ. Res. Public Health (2023). https://doi.org/10.3390/ijerph20054521
Guo, H., Gao, Y., Li, T., Li, T., Lu, Y., Zheng, L., Liu, Y., Yang, T., Luo, F., Song, S., Wang, W., Yang, X., Nguyen, H.C., Zhang, H., Huang, A., Jin, A., Yang, H., Rao, Z., Ji, X.: Structures of Omicron spike complexes and implications for neutralizing antibody development. Cell Rep. 39, 110770 (2022). https://doi.org/10.1016/j.celrep.2022.110770
Bousson, V., Benoist, N., Guetat, P., Attané, G., Salvat, C., Perronne, L.: Application of artificial intelligence to imaging interpretations in the musculoskeletal area: where are we? Where are we going? Jt. Bone Spine 90, 105493 (2023). https://doi.org/10.1016/j.jbspin.2022.105493
Botwe, B.O., Akudjedu, T.N., Antwi, W.K., Rockson, P., Mkoloma, S.S., Balogun, E.O., Elshami, W., Bwambale, J., Barare, C., Mdletshe, S., Yao, B., Arkoh, S.: The integration of artificial intelligence in medical imaging practice: perspectives of African radiographers. Radiography 27, 861–866 (2021). https://doi.org/10.1016/j.radi.2021.01.008
Kumar, R., Sharma, R.: Leveraging blockchain for ensuring trust in IoT: a survey. J. King Saud Univ. Comput. Inf. Sci. (2021). https://doi.org/10.1016/j.jksuci.2021.09.004
Sarhan, A.M., Gobara, M., Yasser, S., Elsayed, Z., Sherif, G., Moataz, N., Yasir, Y., Moustafa, E., Ibrahim, S., Ali, H.A.: Knee osteoporosis diagnosis based on deep learning. Springer, Netherlands (2024). https://doi.org/10.1007/s44196-024-00615-4
Chang, K.M., Chang, T.Y., Cheng-Yuan Ku, C., Chiu, C.W., Ter Chang, C.: Sharing decision-making in knee osteoarthritis using the AHP-FMCGP method. Expert Syst. Appl. 249, 123610 (2024). https://doi.org/10.1016/j.eswa.2024.123610
Albahri, A.S.S., Hamid, R.A., Abdulnabi, A.R., Albahri, O.S.S., Alamoodi, A.H.H., Deveci, M., Pedrycz, W., Alzubaidi, L., Santamaría, J., Gu, Y.: Fuzzy decision-making framework for explainable golden multi-machine learning models for real-time adversarial attack detection in vehicular ad-hoc networks. Inf. Fusion 105, 102208 (2023). https://doi.org/10.1016/j.inffus.2023.102208
Shayea, G.G., Zabil, M.H.M., Albahri, A.S., Joudar, S.S., Hamid, R.A., Albahri, O.S., Alamoodi, A.H., Zahid, I.A., Sharaf, I.M.: Fuzzy evaluation and benchmarking framework for robust machine learning model in real-time autism triage applications. Int. J. Comput. Intell. Syst. 17, 151 (2024). https://doi.org/10.1007/s44196-024-00543-3
Zsidai, B., Hilkert, A.S., Kaarre, J., Narup, E., Senorski, E.H., Grassi, A., Ley, C., Longo, U.G., Herbst, E., Hirschmann, M.T., Kopf, S., Seil, R., Tischer, T., Samuelsson, K., Feldt, R.: A practical guide to the implementation of AI in orthopaedic research – part 1: opportunities in clinical application and overcoming existing challenges. J. Exp. Orthop. (2023). https://doi.org/10.1186/s40634-023-00683-z
Karim, M.R., Jiao, J., Dohmen, T., Cochez, M., Beyan, O., Rebholz-Schuhmann, D., Decker, S.: DeepKneeExplainer: explainable knee osteoarthritis diagnosis from radiographs and magnetic resonance imaging. IEEE Access 9, 39757–39780 (2021). https://doi.org/10.1109/ACCESS.2021.3062493
Alsalem, M.A., Alamoodi, A.H., Albahri, O.S., Albahri, A.S., Martínez, L., Yera, R., Duhaim, A.M., Sharaf, I.M.: Evaluation of trustworthy artificial intelligent healthcare applications using multi-criteria decision-making approach. Expert Syst. Appl. 246, 123066 (2024). https://doi.org/10.1016/j.eswa.2023.123066
Lu, S., Christie, G.A., Nguyen, T.T., Freeman, J.D., Hsu, E.B.: Applications of artificial intelligence and machine learning in disasters and public health emergencies. Disaster Med. Public Health Prep. 16, 1674–1681 (2022). https://doi.org/10.1017/dmp.2021.125
Crigger, E., Reinbold, K., Hanson, C., Kao, A., Blake, K., Irons, M.: Trustworthy augmented intelligence in health care. J. Med. Syst. 46, 1–11 (2022). https://doi.org/10.1007/s10916-021-01790-z
Albahri, A.S., Jassim, M.M., Alzubaidi, L., Hamid, R.A., Ahmed, M.A., Al-Qaysi, Z.T., Albahri, O.S., Alamoodi, A.H., Alqaysi, M.E., Mohammed, T.J., Kou, G., Alotaibi, F.S., Sharaf, I.M.: A trustworthy and explainable framework for benchmarking hybrid deep learning models based on chest x-ray analysis in CAD systems. Int. J. Inf. Technol. Decis. Mak. (2024). https://doi.org/10.1142/S0219622024500019
Holzinger, A., Dehmer, M., Emmert-Streib, F., Cucchiara, R., Augenstein, I., Del Ser, J., Samek, W., Jurisica, I., Díaz-Rodríguez, N.: Information fusion as an integrative cross-cutting enabler to achieve robust, explainable, and trustworthy medical artificial intelligence. Inf. Fusion 79, 263–278 (2022). https://doi.org/10.1016/j.inffus.2021.10.007
Alzubaidi, L., Dulaimi, K.A.L., Salhi, A., Alammar, Z., Fadhel, M.A., Albahri, A.S., Alamoodi, A.H., Albahri, O.S., Hasan, A.F., Bai, J., Gilliland, L., Peng, J., Branni, M., Shuker, T., Cutbush, K., Santamaría, J., Moreira, C., Ouyang, C., Duan, Y., Manoufali, M., Jomaa, M., Gupta, A., Abbosh, A., Gu, Y.: Comprehensive review of deep learning in orthopaedics: applications, challenges, trustworthiness, and fusion. Artif. Intell. Med. 155, 102935 (2024). https://doi.org/10.1016/j.artmed.2024.102935
Markus, A.F., Kors, J.A., Rijnbeek, P.R.: The role of explainability in creating trustworthy artificial intelligence for health care: a comprehensive survey of the terminology, design choices, and evaluation strategies. J. Biomed. Inform. 113, 103655 (2021). https://doi.org/10.1016/j.jbi.2020.103655
Albahri, A.S., Duhaim, A.M., Fadhel, M.A., Alnoor, A., Baqer, N.S., Alzubaidi, L., Albahri, O.S., Alamoodi, A.H., Bai, J., Salhi, A., Santamaría, J., Ouyang, C., Gupta, A., Gu, Y., Deveci, M.: A systematic review of trustworthy and explainable artificial intelligence in healthcare: assessment of quality, bias risk, and data fusion. Inf. Fusion 96, 156–191 (2023). https://doi.org/10.1016/j.inffus.2023.03.008
Hohma, E., Lütge, C.: From trustworthy principles to a trustworthy development process: the need and elements of trusted development of AI systems. AI 4, 904–925 (2023). https://doi.org/10.3390/ai4040046
Morley, J., Floridi, L., Kinsey, L., Elhalal, A.: From what to how: an initial review of publicly available AI ethics tools, methods and research to translate principles into practices. Sci. Eng. Ethics 26, 2141–2168 (2020). https://doi.org/10.1007/s11948-019-00165-5
Lee, J., Chung, S.W.: Deep learning for orthopedic disease based on medical image analysis: present and future. Appl. Sci. (2022). https://doi.org/10.3390/app12020681
Gitto, S., Serpi, F., Albano, D., Risoleo, G., Fusco, S., Messina, C., Sconfienza, L.M.: AI applications in musculoskeletal imaging: a narrative review. Eur. Radiol. Exp. (2024). https://doi.org/10.1186/s41747-024-00422-8
Sharma, S.: Artificial intelligence for fracture diagnosis in orthopedic X-rays: current developments and future potential. Sicot-J (2023). https://doi.org/10.1051/sicotj/2023018
Federer, S.J., Jones, G.G.: Artificial intelligence in orthopaedics: a scoping review. PLoS ONE 16, 1–11 (2021). https://doi.org/10.1371/journal.pone.0260471
Moher, D., Liberati, A., Tetzlaff, J., Altman, D.G.: Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Int. J. Surg. 8, 336–341 (2010). https://doi.org/10.1016/j.ijsu.2010.02.007
Magabaleh, A.A., Ghraibeh, L.L., Audeh, A.Y., Albahri, A.S., Deveci, M., Antucheviciene, J.: Systematic review of software engineering uses of multi-criteria decision-making methods: trends, bibliographic analysis, challenges, recommendations, and future directions. Appl. Soft Comput. 163, 111859 (2024). https://doi.org/10.1016/j.asoc.2024.111859
Albahri, A.S., Khaleel, Y.L., Habeeb, M.A., Ismael, R.D., Hameed, Q.A., Deveci, M., Homod, R.Z., Albahri, O.S., Alamoodi, A.H., Alzubaidi, L.: A systematic review of trustworthy artificial intelligence applications in natural disasters. Comput. Electr. Eng. 118, 109409 (2024). https://doi.org/10.1016/j.compeleceng.2024.109409
Fadhel, M.A., Duhaim, A.M., Albahri, A.S., Al-Qaysi, Z.T., Aktham, M.A., Chyad, M.A., Abd-Alaziz, W., Albahri, O.S., Alamoodi, A.H., Alzubaidi, L., Gupta, A., Gu, Y.: Navigating the metaverse: unraveling the impact of artificial intelligence—a comprehensive review and gap analysis. Artif. Intell. Rev. 57, 264 (2024). https://doi.org/10.1007/s10462-024-10881-5
Kour, N., Gupta, S., Arora, S.: A vision-based clinical analysis for classification of knee osteoarthritis, Parkinson’s disease and normal gait with severity based on k-nearest neighbour. Expert Syst. (2022). https://doi.org/10.1111/exsy.12955
Ahmed, R., Imran, A.S.: Knee osteoarthritis analysis using deep learning and XAI on X-rays. IEEE Access 12, 68870–68879 (2024). https://doi.org/10.1109/ACCESS.2024.3400987
Obayya, M., Alamgeer, M., Alzahrani, J.S., Alabdan, R., Al-Wesabi, F.N., Mohamed, A., Alsaid Hassan, M.I.: Artificial intelligence driven biomedical image classification for robust rheumatoid arthritis classification. Biomedicines (2022). https://doi.org/10.3390/biomedicines10112714
Le, V.-H., Kha, Q.-H., Hung, T.N.K., Le, N.Q.K.: Risk score generated from CT-based radiomics signatures for overall survival prediction in non-small cell lung cancer. Cancers (Basel) (2021). https://doi.org/10.3390/cancers13143616
Li Brizzi, C.L., Rao, S.S., Wang, K.Y., Levin, A.S., Morris, C.D.: Survey of sarcoma surgery principles among orthopaedic oncologists. Surg. Oncol. 42, 101782 (2022). https://doi.org/10.1016/j.suronc.2022.101782
Consalvo, S., Hinterwimmer, F., Neumann, J., Steinborn, M., Salzmann, M., Seidl, F., Lenze, U., Knebel, C., Rueckert, D., Burgkart, R.H.H.: Two-phase deep learning algorithm for detection and differentiation of Ewing sarcoma and acute osteomyelitis in paediatric radiographs. Anticancer Res. 42, 4371–4380 (2022). https://doi.org/10.21873/anticanres.15937
Gitto, S., Cuocolo, R., Annovazzi, A., Anelli, V., Acquasanta, M., Cincotta, A., Albano, D., Chianca, V., Ferraresi, V., Messina, C., Zoccali, C., Armiraglio, E., Parafioriti, A., Sciuto, R., Luzzati, A., Biagini, R., Imbriaco, M., Sconfienza, L.M.: CT radiomics-based machine learning classification of atypical cartilaginous tumours and appendicular chondrosarcomas. EBioMedicine (2021). https://doi.org/10.1016/j.ebiom.2021.103407
Vaiyapuri, T., Jothi, A., Narayanasamy, K., Kamatchi, K., Kadry, S., Kim, J.: Design of a honey badger optimization algorithm with a deep transfer learning-based osteosarcoma classification model. Cancers (Basel) (2022). https://doi.org/10.3390/cancers14246066
Li, M.D., Ahmed, S.R., Choy, E., Lozano-Calderon, S.A., Kalpathy-Cramer, J., Chang, C.Y.: Artificial intelligence applied to musculoskeletal oncology: a systematic review. Skeletal Radiol. 51, 245–256 (2022). https://doi.org/10.1007/s00256-021-03820-w
Mehta, S.D., Sebro, R.: Random forest classifiers aid in the detection of incidental osteoblastic osseous metastases in DEXA studies. Int. J. Comput. Assist. Radiol. Surg. 14, 903–909 (2019). https://doi.org/10.1007/s11548-019-01933-1
Hajianfar, G., Sabouri, M., Bagheri, S., Salimi, Y., Oveisi, M., Shiri, I., Zaidi, H.: Dual input scintigraphy image-based fused deep neural networks for bone abnormalities detection and differentiation. In: 2021 IEEE Nucl. Sci. Symp. Med. Imaging Conf., 2021: pp. 1–3. https://doi.org/10.1109/NSS/MIC44867.2021.9875765
Albaradei, S., Uludag, M., Thafar, M.A., Gojobori, T., Essack, M., Gao, X.: Predicting bone metastasis using gene expression-based machine learning models. Front. Genet. (2021). https://doi.org/10.3389/fgene.2021.771092
Zhao, Z., Pi, Y., Jiang, L., Xiang, Y., Wei, J., Yang, P., Zhang, W., Zhong, X., Zhou, K., Li, Y., Li, L., Yi, Z., Cai, H.: Deep neural network based artificial intelligence assisted diagnosis of bone scintigraphy for cancer bone metastasis. Sci. Rep. (2020). https://doi.org/10.1038/s41598-020-74135-4
Sharma, A., Yadav, D.P., Garg, H., Kumar, M., Sharma, B., Koundal, D.: Bone cancer detection using feature extraction based machine learning model. Comput. Math. Methods Med. (2021). https://doi.org/10.1155/2021/7433186
Jayachandran, J.J.B., Ambigapathy, S., Abirami, P., Ishwaryalakshmi, K.: X-ray image analysis in identification of bone cancer using laws features and machine learning model. In: 2022 Int. Conf. Data Sci. Agents Artif. Intell., 2022: pp. 1–5. https://doi.org/10.1109/ICDSAAI55433.2022.10028844
Bloier, M., Hinterwimmer, F., Breden, S., Consalvo, S., Neumann, J., Wilhelm, N., von Eisenhart-Rothe, R., Rueckert, D., Burgkart, R.: Detection and segmentation of heterogeneous bone tumours in limited radiographs. Curr. Dir. Biomed. Eng. 8, 69–72 (2022). https://doi.org/10.1515/cdbme-2022-1019
Zhao, K., Zhang, M., Xie, Z., Yan, X., Wu, S., Liao, P., Lu, H., Shen, W., Fu, C., Cui, H., Fang, Q., Mei, J.: Deep learning assisted diagnosis of musculoskeletal tumors based on contrast-enhanced magnetic resonance imaging. J. Magn. Reson. Imaging 56, 99–107 (2022). https://doi.org/10.1002/jmri.28025
Xu, Z., Niu, K., Tang, S., Song, T., Rong, Y., Guo, W., He, Z.: Bone tumor necrosis rate detection in few-shot X-rays based on deep learning. Comput. Med. Imaging Graph. 102, 102141 (2022). https://doi.org/10.1016/j.compmedimag.2022.102141
von Schacky, C.E., Wilhelm, N.J., Schäfer, V.S., Leonhardt, Y., Gassert, F.G., Foreman, S.C., Gassert, F.T., Jung, M., Jungmann, P.M., Russe, M.F., Mogler, C., Knebel, C., von Eisenhart-Rothe, R., Makowski, M.R., Woertler, K., Burgkart, R., Gersing, A.S.: Multitask deep learning for segmentation and classification of primary bone tumors on radiographs. Radiology 301, 398–406 (2021). https://doi.org/10.1148/radiol.2021204531
Do, N.-T., Jung, S.-T., Yang, H.-J., Kim, S.-H.: Multi-Level seg-unet model with global and patch-based X-ray images for knee bone tumor detection. Diagnostics (Basel, Switzerland) (2021). https://doi.org/10.3390/diagnostics11040691
Park, C.-W., Oh, S.-J., Kim, K.-S., Jang, M.-C., Kim, I.S., Lee, Y.-K., Chung, M.J., Cho, B.H., Seo, S.-W.: Artificial intelligence-based classification of bone tumors in the proximal femur on plain radiographs: System development and validation. PLoS ONE 17, e0264140 (2022). https://doi.org/10.1371/journal.pone.0264140
Xu, J., Wang, J., Zhao, H.: The prevalence of Kashin-Beck disease in China: a systematic review and meta-analysis. Biol. Trace Elem. Res. 201, 3175–3184 (2023). https://doi.org/10.1007/s12011-022-03417-x
Dang, J., Li, H., Niu, K., Xu, Z., Lin, J., He, Z.: Kashin-Beck disease diagnosis based on deep learning from hand X-ray images. Comput. Methods Programs Biomed. (2021). https://doi.org/10.1016/j.cmpb.2020.105919
Liu, C., Xie, H., Zhang, S., Mao, Z., Sun, J., Zhang, Y.: Misshapen Pelvis landmark detection with local-global feature learning for diagnosing developmental dysplasia of the hip. IEEE Trans. Med. Imaging 39, 3944–3954 (2020). https://doi.org/10.1109/TMI.2020.3008382
Xu, W., Shu, L., Gong, P., Huang, C., Xu, J., Zhao, J., Shu, Q., Zhu, M., Qi, G., Zhao, G., Yu, G.: A deep-learning aided diagnostic system in assessing developmental dysplasia of the hip on pediatric pelvic radiographs. Front. Pediatr. 9, 785480 (2022). https://doi.org/10.3389/fped.2021.785480
Jensen, J., Graumann, O., Overgaard, S., Gerke, O., Lundemann, M., Haubro, M.H., Varnum, C., Bak, L., Rasmussen, J., Olsen, L.B., Rasmussen, B.S.B.: A deep learning algorithm for radiographic measurements of the hip in adults-a reliability and agreement study. Diagnostics (Basel, Switzerland) (2022). https://doi.org/10.3390/diagnostics12112597
Hernigou, P., Safar, A., Hernigou, J., Ferre, B.: Subtalar axis determined by combining digital twins and artificial intelligence: influence of the orientation of this axis for hindfoot compensation of varus and valgus knees. Int. Orthop. 46, 999–1007 (2022). https://doi.org/10.1007/s00264-022-05311-6
Tack, A., Preim, B., Zachow, S.: Fully automated assessment of knee alignment from full-leg X-Rays employing a “YOLOv4 And Resnet Landmark regression Algorithm” (YARLA): data from the osteoarthritis initiative. Comput. Methods Programs Biomed. 205, 106080 (2021). https://doi.org/10.1016/j.cmpb.2021.106080
van der Kolk, B.Y.M., Slotman, D.J., Nijholt, I.M., van Osch, J.A.C., Snoeijink, T.J., Podlogar, M., van Hasselt, B.A.A.M., Boelhouwers, H.J., van Stralen, M., Seevinck, P.R., Schep, N.W.L., Maas, M., Boomsma, M.F.: Bone visualization of the cervical spine with deep learning-based synthetic CT compared to conventional CT: a single-center noninferiority study on image quality. Eur. J. Radiol. 154, 110414 (2022). https://doi.org/10.1016/j.ejrad.2022.110414
Chen, Q., Liao, R., Shalaginov, M.Y., Zeng, T.H.: Scoliosis detection with convolutional neural networks. In: 2022 IEEE Int. Conf. Bioinforma. Biomed., 2022: pp. 3785–3787. https://doi.org/10.1109/BIBM55620.2022.9995579
Chen, P., Zhou, Z., Yu, H., Chen, K., Yang, Y.: Computerized-assisted scoliosis diagnosis based on faster R-CNN and ResNet for the classification of spine X-ray images. Comput. Math. Methods Med. 2022, 1–13 (2022). https://doi.org/10.1155/2022/3796202
Konieczny, M.R., Senyurt, H., Krauspe, R.: Epidemiology of adolescent idiopathic scoliosis. J. Child. Orthop. 7, 3–9 (2013). https://doi.org/10.1007/s11832-012-0457-4
Nguyen, T.P., Chae, D.-S., Park, S.-J., Kang, K.-Y., Yoon, J.: Deep learning system for Meyerding classification and segmental motion measurement in diagnosis of lumbar spondylolisthesis. Biomed. Signal Process. Control 65, 102371 (2021). https://doi.org/10.1016/j.bspc.2020.102371
Makhdoomi, N.A., Gunawan, T.S., Idris, N.H., Khalifa, O.O., Karupiah, R.K., Bramantoro, A., Abdul Rahman, F.D., Zakaria, Z.: Development of scoliotic spine severity detection using deep learning algorithms. In: 2022 IEEE 12th Annu. Comput. Commun. Work. Conf. CCWC 2022, 2022: pp. 574–579. https://doi.org/10.1109/CCWC54503.2022.9720906
Fujimori, T., Suzuki, Y., Takenaka, S., Kita, K., Kanie, Y., Kaito, T., Ukon, Y., Watabe, T., Nakajima, N., Kido, S., Okada, S.: Development of artificial intelligence for automated measurement of cervical lordosis on lateral radiographs. Sci. Rep. (2022). https://doi.org/10.1038/s41598-022-19914-x
Yang, F., Ding, B.: Computer aided fracture diagnosis based on integrated learning. In: 2020 IEEE 3rd Int. Conf. Inf. Syst. Comput. Aided Educ., 2020: pp. 523–527. https://doi.org/10.1109/ICISCAE51034.2020.9236917
Olthof, A.W., Shouche, P., Fennema, E.M., IJpma, F.F.A., Koolstra, R.H.C., Stirler, V.M.A., van Ooijen, P.M.A., Cornelissen, L.J.: Machine learning based natural language processing of radiology reports in orthopaedic trauma. Comput. Methods Programs Biomed. 208, 106304 (2021). https://doi.org/10.1016/j.cmpb.2021.106304
Zdolsek, G., Chen, Y., Bogl, H.-P., Wang, C., Woisetschlager, M., Schilcher, J.: Deep neural networks with promising diagnostic accuracy for the classification of atypical femoral fractures. Acta Orthop. 92, 394–400 (2021). https://doi.org/10.1080/17453674.2021.1891512
Lee, C., Jang, J., Lee, S., Kim, Y.S., Jo, H.J., Kim, Y.: Classification of femur fracture in pelvic X-ray images using meta-learned deep neural network. Sci. Rep. (2020). https://doi.org/10.1038/s41598-020-70660-4
Liu, P., Lu, L., Chen, Y., Huo, T., Xue, M., Wang, H., Fang, Y., Xie, Y., Xie, M., Ye, Z.: Artificial intelligence to detect the femoral intertrochanteric fracture: the arrival of the intelligent-medicine era. Front. Bioeng. Biotechnol. 10, 927926 (2022). https://doi.org/10.3389/fbioe.2022.927926
Acici, K., Sumer, E., Beyaz, S.: Comparison of different machine learning approaches to detect femoral neck fractures in x-ray images. Health Technol. (Berl) 11, 643–653 (2021). https://doi.org/10.1007/s12553-021-00543-9
Prijs, J., Liao, Z., To, M.-S., Verjans, J., Jutte, P.C., Stirler, V., Olczak, J., Gordon, M., Guss, D., DiGiovanni, C.W., Jaarsma, R.R.L., IJpma, F.F.A., Doornberg, J.N., Aksakal, K., Barvelink, B., Beuker, B., Bulstra, A.E., Oliviera, L.C., Colaris, J., de Klerk, H., Duckworth, A., Ten Duis, K., Fennema, E., Harbers, J., Hendrickx, R., Heng, M., Hoeksema, S., Hogervorst, M., Jadav, B., Jiang, J., Karhade, A., Kerkhoffs, G., Kuipers, J., Laane, C., Langerhuizen, D., Lubberts, B., Mallee, W., Mhmud, H., El Moumni, M., Nieboer, P., Nijhuis, K.O., van Ooijen, P., Oosterhoff, J., Rawat, J., Ring, D., Schilstra, S., Schwab, J., Sprague, S., Stufkens, S., Tijdens, E., van der Bekerom, M., van der Vet, P., de Vries, J.-P., Wendt, K., Wijffels, M., Worsley, D., the Machine Learning Consortium: Development and external validation of automated detection, classification, and localization of ankle fractures: inside the black box of a convolutional neural network (CNN). Eur. J. Trauma Emerg. Surg. (2022). https://doi.org/10.1007/s00068-022-02136-1
Olczak, J., Emilson, F., Razavian, A., Antonsson, T., Stark, A., Gordon, M.: Ankle fracture classification using deep learning: automating detailed AO foundation/orthopedic trauma association (AO/OTA) 2018 malleolar fracture identification reaches a high degree of correct classification. Acta Orthop. 92, 102–108 (2020). https://doi.org/10.1080/17453674.2020.1837420
Ashkani-Esfahani, S., Mojahed Yazdi, R., Bhimani, R., Kerkhoffs, G.M., Maas, M., DiGiovanni, C.W., Lubberts, B., Guss, D.: Detection of ankle fractures using deep learning algorithms. Foot Ankle Surg. 28, 1259–1265 (2022). https://doi.org/10.1016/j.fas.2022.05.005
Kim, T., Goh, T.S., Lee, J.S., Lee, J.H., Kim, H., Jung, I.D.: Transfer learning-based ensemble convolutional neural network for accelerated diagnosis of foot fractures. Phys. Eng. Sci. Med. 46, 265–277 (2023). https://doi.org/10.1007/s13246-023-01215-w
Aghnia Farda, N., Lai, J.-Y., Wang, J.-C., Lee, P.-Y., Liu, J.-W., Hsieh, I.-H.: Sanders classification of calcaneal fractures in CT images with deep learning and differential data augmentation techniques. Injury 52, 616–624 (2021). https://doi.org/10.1016/j.injury.2020.09.010
Pranata, Y.D., Wang, K.-C., Wang, J.-C., Idram, I., Lai, J.-Y., Liu, J.-W., Hsieh, I.-H.: Deep learning and SURF for automated classification and detection of calcaneus fractures in CT images. Comput. Methods Programs Biomed. 171, 27–37 (2019). https://doi.org/10.1016/j.cmpb.2019.02.006
Guo, J., Mu, Y., Xue, D., Li, H., Chen, J., Yan, H., Xu, H., Wang, W.: Automatic analysis system of calcaneus radiograph: Rotation-invariant landmark detection for calcaneal angle measurement, fracture identification and fracture region segmentation. Comput. Methods Programs Biomed. (2021). https://doi.org/10.1016/j.cmpb.2021.106124
Murata, K., Endo, K., Aihara, T., Suzuki, H., Sawaji, Y., Matsuoka, Y., Nishimura, H., Takamatsu, T., Konishi, T., Maekawa, A., Yamauchi, H., Kanazawa, K., Endo, H., Tsuji, H., Inoue, S., Fukushima, N., Kikuchi, H., Sato, H., Yamamoto, K.: Artificial intelligence for the detection of vertebral fractures on plain spinal radiography. Sci. Rep. (2020). https://doi.org/10.1038/s41598-020-76866-w
Li, Y.C., Chen, H.H., Horng-Shing Lu, H., Hondar Wu, H.T., Chang, M.C., Chou, P.H.: Can a deep-learning model for the automated detection of vertebral fractures approach the performance level of human subspecialists? Clin. Orthop. Relat. Res. 479, 1598–1612 (2021). https://doi.org/10.1097/CORR.0000000000001685
Yabu, A., Hoshino, M., Tabuchi, H., Takahashi, S., Masumoto, H., Akada, M., Morita, S., Maeno, T., Iwamae, M., Inose, H., Kato, T., Yoshii, T., Tsujio, T., Terai, H., Toyoda, H., Suzuki, A., Tamai, K., Ohyama, S., Hori, Y., Okawa, A., Nakamura, H.: Using artificial intelligence to diagnose fresh osteoporotic vertebral fractures on magnetic resonance images. Spine J. 21, 1652–1658 (2021). https://doi.org/10.1016/j.spinee.2021.03.006
Small, J.E., Osler, P., Paul, A.B., Kunst, M.: CT cervical Spine fracture detection using a convolutional neural network. Am. J. Neuroradiol. 42, 1341–1347 (2021). https://doi.org/10.3174/ajnr.A7094
Cheng, L.W., Chou, H.H., Cai, Y.X., Huang, K.Y., Hsieh, C.C., Chu, P.L., Cheng, I.S., Hsieh, S.Y.: Automated detection of vertebral fractures from X-ray images: a novel machine learning model and survey of the field. Neurocomputing 566, 126946 (2024). https://doi.org/10.1016/j.neucom.2023.126946
Inagaki, N., Nakata, N., Ichimori, S., Udaka, J., Mandai, A., Saito, M.: Detection of sacral fractures on radiographs using artificial intelligence. JBJS Open Access (2022). https://doi.org/10.2106/JBJS.OA.22.00030
Yamada, Y., Maki, S., Kishida, S., Nagai, H., Arima, J., Yamakawa, N., Iijima, Y., Shiko, Y., Kawasaki, Y., Kotani, T., Shiga, Y., Inage, K., Orita, S., Eguchi, Y., Takahashi, H., Yamashita, T., Minami, S., Ohtori, S.: Automated classification of hip fractures using deep convolutional neural networks with orthopedic surgeon-level accuracy: ensemble decision-making with antero-posterior and lateral radiographs. Acta Orthop. 91, 699–704 (2020). https://doi.org/10.1080/17453674.2020.1803664
Twinprai, N., Boonrod, A., Boonrod, A., Chindaprasirt, J., Sirithanaphol, W., Chindaprasirt, P., Twinprai, P.: Artificial intelligence (AI) vs. human in hip fracture detection. Heliyon 8, e11266 (2022). https://doi.org/10.1016/j.heliyon.2022.e11266
Kitamura, G.: Deep learning evaluation of pelvic radiographs for position, hardware presence, and fracture detection. Eur. J. Radiol. 130, 109139 (2020). https://doi.org/10.1016/j.ejrad.2020.109139
Cheng, C.-T., Wang, Y., Chen, H.-W., Hsiao, P.-M., Yeh, C.-N., Hsieh, C.-H., Miao, S., Xiao, J., Liao, C.-H., Lu, L.: A scalable physician-level deep learning algorithm detects universal trauma on pelvic radiographs. Nat. Commun. (2021). https://doi.org/10.1038/s41467-021-21311-3
Damien, P., Nader, R.B., Yaacoub, C., Lahoud, J.-C.: Iliopectineal Line fracture detection for computer-aided acetabular fracture classification. In: 2019 Ninth Int. Conf. Image Process. Theory, Tools Appl., 2019: pp. 1–5. https://doi.org/10.1109/IPTA.2019.8936080
Rashid, T., Zia, M.S., Najam-ur-Rehman, T., Meraj, T., Rauf, H.T., Kadry, S.: A minority class balanced approach using the DCNN-LSTM method to detect human wrist fracture. Life (Basel) (2023). https://doi.org/10.3390/life13010133
Erne, F., Dehncke, D., Herath, S., Springer, F., Pfeifer, N., Eggeling, R., Küper, M.: Correction: deep learning in the detection of rare fractures - development of a “Deep Learning Convolutional Network” model for detecting acetabular fractures. Z. Orthop. Unfall. (2021). https://doi.org/10.1055/a-1577-4645
Castro-Gutierrez, E., Estacio-Cerquin, L., Gallegos-Guillen, J., Obando, J.D.: Detection of acetabulum fractures using X-ray imaging and processing methods focused on noisy images. In: 2019 Amity Int. Conf. Artif. Intell., 2019: pp. 296–302. https://doi.org/10.1109/AICAI.2019.8701297
Oka, K., Shiode, R., Yoshii, Y., Tanaka, H., Iwahashi, T., Murase, T.: Artificial intelligence to diagnosis distal radius fracture using biplane plain X-rays. J. Orthop. Surg. Res. (2021). https://doi.org/10.1186/s13018-021-02845-0
Dipnall, J.F., Page, R., Du, L., Costa, M., Lyons, R.A., Cameron, P., de Steiger, R., Hau, R., Bucknill, A., Oppy, A., Edwards, E., Varma, D., Jung, M.C., Gabbe, B.J.: Predicting fracture outcomes from clinical registry data using artificial intelligence supplemented models for evidence-informed treatment (PRAISE) study protocol. PLoS ONE (2021). https://doi.org/10.1371/journal.pone.0257361
Sato, Y., Takegami, Y., Asamoto, T., Ono, Y., Hidetoshi, T., Goto, R., Kitamura, A., Honda, S.: Artificial intelligence improves the accuracy of residents in the diagnosis of hip fractures: a multicenter study. BMC Musculoskelet. Disord. 22, 407 (2021). https://doi.org/10.1186/s12891-021-04260-2
Ahmed, A., Imran, A.S., Manaf, A., Kastrati, Z., Daudpota, S.M.: Enhancing wrist abnormality detection with YOLO: analysis of state-of-the-art single-stage detection models. Biomed. Signal Process. Control 93, 106144 (2024). https://doi.org/10.1016/j.bspc.2024.106144
Chen, C., Liu, B., Zhou, K., He, W., Yan, F., Wang, Z., Xiao, R.: CSR-Net: cross-scale residual network for multi-objective scaphoid fracture segmentation. Comput. Biol. Med. 137, 104776 (2021). https://doi.org/10.1016/j.compbiomed.2021.104776
Langerhuizen, D.W., Bulstra, A.E., Janssen, S.J., Ring, D., Kerkhoffs, G.M., Jaarsma, R.L., Doornberg, J.N.: Is deep learning on par with human observers for detection of radiographically visible and occult fractures of the scaphoid? Clin. Orthop. Relat. Res. 478, 1 (2020). https://doi.org/10.1097/CORR.0000000000001318
Ozkaya, E., Topal, F.E., Bulut, T., Gursoy, M., Ozuysal, M., Karakaya, Z.: Evaluation of an artificial intelligence system for diagnosing scaphoid fracture on direct radiography. Eur. J. Trauma Emerg. Surg. (2022). https://doi.org/10.1007/s00068-020-01468-0
Choi, J.W., Cho, Y.J., Lee, S., Lee, J., Lee, S., Choi, Y.H., Cheon, J.-E., Ha, J.Y.: Using a dual-input convolutional neural network for automated detection of pediatric supracondylar fracture on conventional radiography. Invest. Radiol. 55, 101–110 (2020). https://doi.org/10.1097/RLI.0000000000000615
Liu, P.-R., Zhang, J.-Y., Xue, M.-D., Duan, Y.-Y., Hu, J.-L., Liu, S.-X., Xie, Y., Wang, H.-L., Wang, J.-W., Huo, T.-T., Ye, Z.-W.: Artificial Intelligence to diagnose tibial plateau fractures: an intelligent assistant for orthopedic physicians. Curr. Med. Sci. 41, 1158–1164 (2021). https://doi.org/10.1007/s11596-021-2501-4
Castro-Zunti, R., Chae, K.J., Choi, Y., Jin, G.Y., Ko, S.: Assessing the speed-accuracy trade-offs of popular convolutional neural networks for single-crop rib fracture classification. Comput. Med. Imaging Graph. (2021). https://doi.org/10.1016/j.compmedimag.2021.101937
Niiya, A., Murakami, K., Kobayashi, R., Sekimoto, A., Saeki, M., Toyofuku, K., Kato, M., Shinjo, H., Ito, Y., Takei, M., Murata, C., Ohgiya, Y.: Development of an artificial intelligence-assisted computed tomography diagnosis technology for rib fracture and evaluation of its clinical usefulness. Sci. Rep. (2022). https://doi.org/10.1038/s41598-022-12453-5
Gao, Y., Liu, H., Jiang, L., Yang, C., Yin, X., Coatrieux, J.-L., Chen, Y.: CCE-Net: a rib fracture diagnosis network based on contralateral, contextual, and edge enhanced modules. Biomed. Signal Process. Control 75, 103620 (2022). https://doi.org/10.1016/j.bspc.2022.103620
Lind, A., Akbarian, E., Olsson, S., Nåsell, H., Sköldenberg, O., Razavian, A.S., Gordon, M.: Artificial intelligence for the classification of fractures around the knee in adults according to the 2018 AO/OTA classification system. PLoS ONE 16, e0248809 (2021). https://doi.org/10.1371/journal.pone.0248809
Hung, T.N.K., Vy, V.P.T., Tri, N.M., Hoang, L.N., Van Tuan, L., Ho, Q.T., Le, N.Q.K., Kang, J.-H.: Automatic detection of meniscus tears using backbone convolutional neural networks on knee MRI. J. Magn. Reson. Imaging 57, 740–749 (2023). https://doi.org/10.1002/jmri.28284
Zhang, L., Che, Z., Li, Y., Mu, M., Gang, J., Xiao, Y., Yao, Y.: Multi-level classification of knee cartilage lesion in multimodal MRI based on deep learning. Biomed. Signal Process. Control 83, 104687 (2023). https://doi.org/10.1016/j.bspc.2023.104687
Li, J., Qian, K., Liu, J., Huang, Z., Zhang, Y., Zhao, G., Wang, H., Li, M., Liang, X., Zhou, F., Yu, X., Li, L., Wang, X., Yang, X., Jiang, Q.: Identification and diagnosis of meniscus tear by magnetic resonance imaging using a deep learning model. J. Orthop. Transl. 34, 91–101 (2022). https://doi.org/10.1016/j.jot.2022.05.006
Fritz, B., Marbach, G., Civardi, F., Fucentese, S.F., Pfirrmann, C.W.A.: Deep convolutional neural network-based detection of meniscus tears: comparison with radiologists and surgery as standard of reference. Skeletal Radiol. 49, 1207–1217 (2020). https://doi.org/10.1007/s00256-020-03410-2
Hussain, D., Han, S.-M.: Computer-aided osteoporosis detection from DXA imaging. Comput. Methods Programs Biomed. 173, 87–107 (2019). https://doi.org/10.1016/j.cmpb.2019.03.011
Tecle, N., Teitel, J., Morris, M.R., Sani, N., Mitten, D., Hammert, W.C.: Convolutional neural network for second metacarpal radiographic osteoporosis screening. J. Hand Surg. Am. 45, 175–181 (2020). https://doi.org/10.1016/j.jhsa.2019.11.019
El-Saadawy, H., Tantawi, M., Shedeed, H.A., Tolba, M.F.: A hybrid two-stage GNG-modified VGG method for bone X-rays classification and abnormality detection. IEEE Access 9, 76649–76661 (2021). https://doi.org/10.1109/ACCESS.2021.3081915
Singh, G., Anand, D., Cho, W., Joshi, G.P., Son, K.C.: Hybrid deep learning approach for automatic detection in musculoskeletal radiographs. Biology (Basel) (2022). https://doi.org/10.3390/biology11050665
Varma, M., Lu, M., Gardner, R., Dunnmon, J., Khandwala, N., Rajpurkar, P., Long, J., Beaulieu, C., Shpanskaya, K., Fei-Fei, L., Lungren, M.P., Patel, B.N.: Automated abnormality detection in lower extremity radiographs using deep learning. Nat. Mach. Intell. 1, 578–583 (2019). https://doi.org/10.1038/s42256-019-0126-0
Chada, G.: Machine learning models for abnormality detection in musculoskeletal radiographs. Reports 2, 26 (2019). https://doi.org/10.3390/reports2040026
Teeyapan, K.: Abnormality detection in musculoskeletal radiographs using EfficientNets. In: 2020 24th Int. Comput. Sci. Eng. Conf. ICSEC 2020, 2020. https://doi.org/10.1109/ICSEC51790.2020.9375275
Mondol, T.C., Iqbal, H., Hashem, M.M.A.: Deep CNN-based ensemble CADx model for musculoskeletal abnormality detection from radiographs. In: 2019 5th Int. Conf. Adv. Electr. Eng. ICAEE 2019, 2019: pp. 392–397. https://doi.org/10.1109/ICAEE48663.2019.8975455
Mall, P.K., Singh, P.K.: BoostNet: a method to enhance the performance of deep learning model on musculoskeletal radiographs X-ray images. Int. J. Syst. Assur. Eng. Manag. 13, 658–672 (2022). https://doi.org/10.1007/s13198-021-01580-3
He, M., Wang, X., Zhao, Y.: A calibrated deep learning ensemble for abnormality detection in musculoskeletal radiographs. Sci. Rep. 11, 9097 (2021). https://doi.org/10.1038/s41598-021-88578-w
Eweje, F.R., Bao, B., Wu, J., Dalal, D., Liao, W., He, Y., Luo, Y., Lu, S., Zhang, P., Peng, X., Sebro, R., Bai, H.X., States, L.: Deep learning for classification of bone lesions on routine MRI. EBioMedicine 68, 103402 (2021). https://doi.org/10.1016/j.ebiom.2021.103402
Khaleel, Y.L., Habeeb, M.A., Albahri, A.S., Al-Quraishi, T., Albahri, O.S., Alamoodi, A.H.: Network and cybersecurity applications of defense in adversarial attacks: A state-of-the-art using machine learning and deep learning methods. J. Intell. Syst. (2024). https://doi.org/10.1515/jisys-2024-0153
Mohammed, A.S., Hasanaath, A.A., Latif, G., Bashar, A.: Knee osteoarthritis detection and severity classification using residual neural networks on preprocessed x-ray images. Diagnostics (2023). https://doi.org/10.3390/diagnostics13081380
Fatema, K., Hossen, A., Azam, S., Hossain, S., Karim, A., Hasan, Z., Jonkman, M.: Development of an automated optimal distance feature-based decision system for diagnosing knee osteoarthritis using segmented X-ray images. Heliyon 9, e21703 (2023). https://doi.org/10.1016/j.heliyon.2023.e21703
Funding
This work was supported by Universiti Sains Malaysia, Bridging with Project No: R501-LR-RND003-0000000913-0000.
Author information
Contributions
Thura J. Mohammed: Conceptualization, Methodology, Writing – Original Draft Preparation, Data Curation, and Formal Analysis. Chew Xinying: Supervision, Data Collection, Analysis, Writing – Review and Editing, and Visualization. Alhamzah Alnoor: Supervision, Review – Conceptualization, Methodology, and Validation. Khai Wah Khaw: Project Administration, Writing – Review and Editing, Data Analysis, and Visualization. A. S. Albahri: Supervision, Project Administration, Writing – Review and Editing, and Validation. Wei Lin Teoh: Data Collection, Analysis, and Literature Review. Zhi Lin Chong: Writing – Original Draft, Data Validation, and Analysis. Sajal Saha: Literature Review, Editing, and Project Administration. All authors contributed to the review process, interpretation of results, and final approval of the manuscript for publication.
Ethics declarations
Conflict of Interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Mohammed, T.J., Xinying, C., Alnoor, A. et al. A Systematic Review of Artificial Intelligence in Orthopaedic Disease Detection: A Taxonomy for Analysis and Trustworthiness Evaluation. Int J Comput Intell Syst 17, 303 (2024). https://doi.org/10.1007/s44196-024-00718-y
DOI: https://doi.org/10.1007/s44196-024-00718-y