J. Imaging, Volume 10, Issue 12 (December 2024) – 39 articles

Cover Story (view full-size image): This article presents a novel computational framework that addresses a fundamental challenge in geohazard assessment: determining landslide thickness from satellite-derived elevation data. By combining mass conservation principles with sophisticated regularization techniques, our method converts surface measurements into meaningful subsurface information. The accompanying images show its application to the Fels landslide in Alaska, demonstrating how modern imaging techniques and mathematical algorithms can work together to solve complex geological problems. This development represents a significant step forward in the field of quantitative image analysis for natural hazard assessment, offering new possibilities for understanding landslide mechanics through non-invasive observations. View this paper
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view the papers in PDF format, click on the "PDF Full-text" link and use the free Adobe Reader to open them.
13 pages, 2632 KiB  
Article
Volumetric Humeral Canal Fill Ratio Effects Primary Stability and Cortical Bone Loading in Short and Standard Stem Reverse Shoulder Arthroplasty: A Biomechanical and Computational Study
by Daniel Ritter, Patric Raiss, Patrick J. Denard, Brian C. Werner, Peter E. Müller, Matthias Woiczinski, Coen A. Wijdicks and Samuel Bachmaier
J. Imaging 2024, 10(12), 334; https://doi.org/10.3390/jimaging10120334 - 23 Dec 2024
Viewed by 881
Abstract
Objective: This study evaluated the effect of three-dimensional (3D) volumetric humeral canal fill ratios (VFR) of reverse shoulder arthroplasty (RSA) short and standard stems on biomechanical stability and bone deformations in the proximal humerus. Methods: Forty cadaveric shoulder specimens were analyzed in a clinical computed tomography (CT) scanner allowing for segmentation of the humeral canal to calculate volumetric measures which were verified postoperatively with plain radiographs. Virtual implant positioning allowed for group assignment (VFR < 0.72): Standard stem with low (n = 10) and high (n = 10) filling ratios, a short stem with low (n = 10) and high filling ratios (n = 10). Biomechanical testing included cyclic loading of the native bone and the implanted humeral component. Optical recording allowed for spatial implant tracking and the quantification of cortical bone deformations in the proximal humerus. Results: Planned filling ratios based on 3D volumetric measures had a good-to-excellent correlation (ICC = 0.835; p < 0.001) with implanted filling ratios. Lower canal fill ratios resulted in significantly higher variability between short and standard stems regarding implant tilt (820 N: p = 0.030) and subsidence (220 N: p = 0.046, 520 N: p = 0.007 and 820 N: p = 0.005). Higher filling ratios resulted in significantly lower bone deformations in the medial calcar area compared to the native bone, while the bone deformations in lower filling ratios did not differ significantly (p > 0.177). Conclusions: Lower canal filling ratios maintain dynamic bone loading in the medial calcar of the humerus similar to the native situation in this biomechanical loading setup. Short stems implanted with a low filling ratio have an increased risk for implant tilt and subsidence compared to high filling ratios or standard stems. Full article
Figure 1. Methodological framework: virtual planning and development of a volumetric measure of the humeral canal, which was used in this study for group assignment and planning of low and high filling ratios. Canal fill ratios were controlled using postoperative X-rays after implantation and before biomechanical testing of the implanted humeral component.
Figure 2. Measurement and calculation of the filling ratios by dividing the red-marked measure by the respective blue one. The three-dimensional rendered and segmented CT data on the left side allowed for volumetric calculation of the canal fill ratio (3D VFR). Calculation of the canal fill ratios based on two-dimensional plane radiographs (2D Metaphysis FR and 2D Diaphysis FR) is shown on the right side, following current clinical practice [14,16].
Figure 3. The 2D-to-3D registration allowed the accuracy of preoperative canal fill measurements to be validated against the actual postoperative implant seating: (A) preoperative planning of the humeral implant (purple) and segmentation of the humeral canal (orange); (B) registration of postoperative X-rays; (C) correction of the implant position according to the postoperative position (blue); and (D) calculation of the true postoperative canal fill ratio for comparison with the preoperative ratio.
Figure 4. (A) Testing protocol showing the loading cycles, including the points of data analysis (a–g). (B) Experimental cyclic loading setups and the optical tracking points (green) for data analysis. (C) Evaluated tracking points during cyclic loading force (F) to analyze implant subsidence and tilt between analysis points a and b, d, or f, respectively (s_implant and α_implant: Δab, Δad, and Δaf) at the end of each loading block. Bone micromotion (s_BoneHW: Δbc, Δde, and Δfg) was evaluated as bone displacement within each final load cycle (hysteresis width, HW). Total bone deformation caused by compressive load transmission was measured at the end of each loading block (s_BoneTot: Δab, Δad, and Δaf).
Figure 5. Boxplots of implant subsidence (A) and tilt (B) at the end of each cyclic loading block (220 N, 520 N, and 820 N), comparing short and standard stem implants implanted with high and low filling ratios.
Figure 6. Boxplots of total bone deformation (A) and bone micromotion (B) for each cyclic loading block (220 N, 520 N, and 820 N), comparing low and high filling ratios to the biomechanical behavior of the native bone.
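The volumetric canal fill ratio at the core of this study is, conceptually, the share of the canal volume occupied by the stem. The sketch below computes such a ratio from two hypothetical binary voxel masks segmented from CT data; it is a rough illustration only, not the authors' pipeline, and all names and values are made up.

```python
import numpy as np

def volumetric_fill_ratio(implant_mask: np.ndarray, canal_mask: np.ndarray,
                          voxel_volume_mm3: float = 1.0) -> float:
    """Volume of the implant inside the canal divided by the canal volume.

    Both inputs are boolean voxel masks on the same CT grid; this mirrors the
    idea of a 3D volumetric fill ratio (VFR), not the authors' exact method.
    """
    implant_in_canal = np.logical_and(implant_mask, canal_mask).sum() * voxel_volume_mm3
    canal_volume = canal_mask.sum() * voxel_volume_mm3
    return float(implant_in_canal / canal_volume)

# Toy example: a stem occupying roughly 70% of a box-shaped canal region.
canal = np.zeros((50, 50, 100), dtype=bool)
canal[10:40, 10:40, :] = True
implant = np.zeros_like(canal)
implant[13:38, 13:38, :] = True
print(round(volumetric_fill_ratio(implant, canal), 2))  # ~0.69
```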
20 pages, 4678 KiB  
Article
Deep Learning-Based Diagnosis Algorithm for Alzheimer’s Disease
by Zhenhao Jin, Junjie Gong, Minghui Deng, Piaoyi Zheng and Guiping Li
J. Imaging 2024, 10(12), 333; https://doi.org/10.3390/jimaging10120333 - 23 Dec 2024
Viewed by 571
Abstract
Alzheimer’s disease (AD), a degenerative condition affecting the central nervous system, has witnessed a notable rise in prevalence along with the increasing aging population. In recent years, the integration of cutting-edge medical imaging technologies with forefront theories in artificial intelligence has dramatically enhanced the efficiency of identifying and diagnosing brain diseases such as AD. This paper presents an innovative two-stage automatic auxiliary diagnosis algorithm for AD, based on an improved 3D DenseNet segmentation model and an improved MobileNetV3 classification model applied to brain MR images. In the segmentation network, the backbone network was simplified, the activation function and loss function were replaced, and the 3D GAM attention mechanism was introduced. In the classification network, firstly, the CA attention mechanism was added to enhance the model’s ability to capture positional information of disease features; secondly, dilated convolutions were introduced to extract richer features from the input feature maps; and finally, the fully connected layer of MobileNetV3 was modified and the idea of transfer learning was adopted to improve the model’s feature extraction capability. The results of the study showed that the proposed approach achieved classification accuracies of 97.85% for AD/NC, 95.31% for MCI/NC, 93.96% for AD/MCI, and 92.63% for AD/MCI/NC, respectively, which were 3.1, 2.8, 2.6, and 2.8 percentage points higher than before the improvement. Comparative and ablation experiments validated the classification performance of the proposed method, demonstrating its capability to facilitate accurate and efficient automated auxiliary diagnosis of AD and offering a deep learning-based solution for this task. Full article
Figure 1. Brain MR image preprocessing.
Figure 2. DAGAN structure.
Figure 3. The workflow flowchart of 3D GAM.
Figure 4. Improved 3D DenseNet model structure.
Figure 5. Improved MobileNetV3 structure.
Figure 6. AD automatic auxiliary diagnosis algorithm based on improved MobileNetV3.
Figure 7. Local magnification and comparison of segmentation slices.
Figure 8. Comparison of segmentation results on the ADNI dataset with different methods.
Figure 9. AD/NC confusion matrices before and after model improvement.
Figure 10. AD/NC ROC curve before and after model improvement.
Figure 11. MCI/NC confusion matrices before and after model improvement.
Figure 12. MCI/NC ROC curve before and after model improvement.
Figure 13. AD/MCI confusion matrices before and after model improvement.
Figure 14. AD/MCI ROC curve before and after model improvement.
Figure 15. AD/MCI/NC confusion matrices before and after model improvement.
Figure 16. AD/MCI/NC ROC curve before and after model improvement.
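Two generic ingredients named in the abstract, dilated convolutions and a replaced fully connected head on a pretrained MobileNetV3, can be sketched in a few lines of PyTorch. This is a minimal illustration under assumed settings (three classes, 2D inputs, a hypothetical front-end block), not the authors' architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 3  # e.g. AD / MCI / NC (hypothetical label set for this sketch)

# Transfer learning: start from ImageNet weights, then swap the classifier head.
backbone = models.mobilenet_v3_large(weights="IMAGENET1K_V1")
in_feats = backbone.classifier[0].in_features
backbone.classifier = nn.Sequential(
    nn.Linear(in_feats, 256),
    nn.Hardswish(),
    nn.Dropout(0.2),
    nn.Linear(256, num_classes),
)

# A dilated 3x3 convolution enlarges the receptive field without reducing spatial
# resolution; padding equal to the dilation keeps the feature-map size unchanged.
dilated_block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=2, dilation=2),
    nn.BatchNorm2d(16),
    nn.Hardswish(),
    nn.Conv2d(16, 3, kernel_size=1),  # project back to 3 channels for the backbone
)

x = torch.randn(2, 3, 224, 224)        # two dummy MRI slices
logits = backbone(dilated_block(x))    # shape: (2, num_classes)
print(logits.shape)
```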
26 pages, 21880 KiB  
Article
Explainable AI-Based Skin Cancer Detection Using CNN, Particle Swarm Optimization and Machine Learning
by Syed Adil Hussain Shah, Syed Taimoor Hussain Shah, Roa’a Khaled, Andrea Buccoliero, Syed Baqir Hussain Shah, Angelo Di Terlizzi, Giacomo Di Benedetto and Marco Agostino Deriu
J. Imaging 2024, 10(12), 332; https://doi.org/10.3390/jimaging10120332 - 22 Dec 2024
Viewed by 853
Abstract
Skin cancer is among the most prevalent cancers globally, emphasizing the need for early detection and accurate diagnosis to improve outcomes. Traditional diagnostic methods, based on visual examination, are subjective, time-intensive, and require specialized expertise. Current artificial intelligence (AI) approaches for skin cancer detection face challenges such as computational inefficiency, lack of interpretability, and reliance on standalone CNN architectures. To address these limitations, this study proposes a comprehensive pipeline combining transfer learning, feature selection, and machine-learning algorithms to improve detection accuracy. Multiple pretrained CNN models were evaluated, with Xception emerging as the optimal choice for its balance of computational efficiency and performance. An ablation study further validated the effectiveness of freezing task-specific layers within the Xception architecture. Feature dimensionality was optimized using Particle Swarm Optimization, reducing dimensions from 1024 to 508, significantly enhancing computational efficiency. Machine-learning classifiers, including Subspace KNN and Medium Gaussian SVM, further improved classification accuracy. Evaluated on the ISIC 2018 and HAM10000 datasets, the proposed pipeline achieved impressive accuracies of 98.5% and 86.1%, respectively. Moreover, Explainable-AI (XAI) techniques, such as Grad-CAM, LIME, and Occlusion Sensitivity, enhanced interpretability. This approach provides a robust, efficient, and interpretable solution for automated skin cancer diagnosis in clinical applications. Full article
(This article belongs to the Special Issue Deep Learning in Image Analysis: Progress and Challenges)
Figure 1. The complete pipeline of the proposed methodology.
Figure 2. Resulting images of the augmentation operation.
Figure 3. Training and validation accuracy (top) and loss (bottom) curves over iterations during the training process of the proposed model. The gray bars indicate specific epochs of interest, highlighting regions where the training and validation metrics stabilized or showed notable changes. Additional training details, such as elapsed time, learning rate, and hardware resources, are provided for context.
Figure 4. Confusion matrix of the improved Xception network: (A) confusion matrix on the validation dataset; (B) confusion matrix on the testing dataset.
Figure 5. Confusion matrix and ROC curve of Experiment 2 by the Medium Gaussian SVM classifier: (A) confusion matrix and ROC curve on the training dataset; (B) confusion matrix and ROC curve on the testing dataset. The dashed line in the ROC curve represents the reference line for random classification (AUC = 0.5).
Figure 6. Confusion matrix and ROC curve of Experiment 3 by the Ensemble Subspace KNN classifier: (A) confusion matrix and ROC curve on the training dataset; (B) confusion matrix and ROC curve on the testing dataset. The dashed line in the ROC curve represents the reference line for random classification (AUC = 0.5).
Figure 7. Confusion matrix and ROC curve for the Subspace KNN classifier on the HAM10000 dataset, showing classification performance with an AUC of 0.8785 for both benign and malignant classes. The confusion matrix highlights true positives, false positives, and misclassifications, while the ROC curve demonstrates the model's discriminative ability. The dashed line in the ROC curve represents the reference line for random classification (AUC = 0.5).
Figure 8. Visualization of the proposed Xception-based pipeline applied to the ISIC and HAM10000 datasets for skin cancer classification. Input images are classified as benign or malignant with confidence scores. Grad-CAM highlights critical regions, LIME provides pixel-level interpretations, and Occlusion Sensitivity validates predictions. The color legend bars indicate the intensity of contribution, with "min" and "max" representing low to high importance, enhancing the model's transparency and interpretability for clinical applications.
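Particle Swarm Optimization for feature selection, used in this paper to shrink the CNN feature vector, can be prototyped as a binary PSO whose fitness is a cross-validated classifier score. The sketch below is a generic illustration on synthetic data with made-up hyperparameters, not the paper's implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=64, n_informative=12, random_state=0)

def fitness(mask: np.ndarray) -> float:
    """Cross-validated accuracy of a KNN restricted to the selected feature subset."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(KNeighborsClassifier(), X[:, mask.astype(bool)], y, cv=3).mean()

n_particles, n_features, n_iters = 12, X.shape[1], 20
pos = rng.random((n_particles, n_features))          # continuous positions in [0, 1]
vel = np.zeros_like(pos)
masks = (pos > 0.5).astype(int)                      # thresholding gives a binary feature mask
pbest, pbest_fit = masks.copy(), np.array([fitness(m) for m in masks])
gbest = pbest[np.argmax(pbest_fit)].copy()

for _ in range(n_iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0.0, 1.0)
    masks = (pos > 0.5).astype(int)
    fit = np.array([fitness(m) for m in masks])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = masks[improved], fit[improved]
    gbest = pbest[np.argmax(pbest_fit)].copy()

print(f"selected {gbest.sum()}/{n_features} features, CV accuracy {pbest_fit.max():.3f}")
```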
13 pages, 6526 KiB  
Article
Towards Robust Supervised Pectoral Muscle Segmentation in Mammography Images
by Parvaneh Aliniya, Mircea Nicolescu, Monica Nicolescu and George Bebis
J. Imaging 2024, 10(12), 331; https://doi.org/10.3390/jimaging10120331 - 22 Dec 2024
Viewed by 510
Abstract
Mammography images are the most commonly used tool for breast cancer screening. The presence of pectoral muscle in images for the mediolateral oblique view makes designing a robust automated breast cancer detection system more challenging. Most of the current methods for removing the pectoral muscle are based on traditional machine learning approaches. This is partly due to the lack of segmentation masks of pectoral muscle in available datasets. In this paper, we provide the segmentation masks of the pectoral muscle for the INbreast, MIAS, and CBIS-DDSM datasets, which will enable the development of supervised methods and the utilization of deep learning. Training deep learning-based models using segmentation masks will also be a powerful tool for removing pectoral muscle for unseen data. To test the validity of this idea, we trained AU-Net separately on the INbreast and CBIS-DDSM for the segmentation of the pectoral muscle. We used cross-dataset testing to evaluate the performance of the models on an unseen dataset. In addition, the models were tested on all of the images in the MIAS dataset. The experimental results show that cross-dataset testing achieves a comparable performance to the same-dataset experiments. Full article
Figure 1. Examples of the performance of the proposed method for the same- and cross-dataset tests for the INbreast and CBIS-DDSM datasets. Each row (a–f) presents two examples from the INbreast and CBIS-DDSM datasets. The green and blue colors present boundaries for the ground truth and predicted segmentation.
Figure 2. Examples of the performance of the proposed method for cross-dataset tests with the MIAS dataset as the test set. Each row (a–d) presents results for one sample in the MIAS dataset. The green and blue colors present boundaries for the ground truth and predicted segmentation.
Figure 3. Examples of the performance of the proposed method compared to the method proposed in [7] for the MIAS dataset. The names of the samples in the dataset are mentioned in (a–c).
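Cross-dataset segmentation experiments of this kind are typically scored with overlap metrics such as the Dice coefficient and IoU. Below is a minimal sketch with toy masks and a hypothetical helper name; the paper's exact evaluation protocol may differ.

```python
import numpy as np

def dice_and_iou(pred: np.ndarray, truth: np.ndarray) -> tuple[float, float]:
    """Dice coefficient and IoU between two binary masks (hypothetical helper)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    dice = 2.0 * inter / (pred.sum() + truth.sum() + 1e-9)
    iou = inter / (np.logical_or(pred, truth).sum() + 1e-9)
    return float(dice), float(iou)

# Toy cross-dataset check: a model trained on dataset A scored on a mask from dataset B.
truth = np.zeros((256, 256), dtype=bool)
truth[:80, :128] = True          # ground-truth pectoral region
pred = np.zeros_like(truth)
pred[:90, :120] = True           # slightly offset prediction
print(dice_and_iou(pred, truth)) # roughly (0.91, 0.84)
```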
18 pages, 36094 KiB  
Article
Arbitrary Optics for Gaussian Splatting Using Space Warping
by Jakob Nazarenus, Simin Kou, Fang-Lue Zhang and Reinhard Koch
J. Imaging 2024, 10(12), 330; https://doi.org/10.3390/jimaging10120330 - 22 Dec 2024
Viewed by 557
Abstract
Due to recent advances in 3D reconstruction from RGB images, it is now possible to create photorealistic representations of real-world scenes that only require minutes to be reconstructed and can be rendered in real time. In particular, 3D Gaussian splatting shows promising results, outperforming preceding reconstruction methods while simultaneously reducing the overall computational requirements. The main success of 3D Gaussian splatting relies on the efficient use of a differentiable rasterizer to render the Gaussian scene representation. One major drawback of this method is its underlying pinhole camera model. In this paper, we propose an extension of the existing method that removes this constraint and enables scene reconstructions using arbitrary camera optics such as highly distorting fisheye lenses. Our method achieves this by applying a differentiable warping function to the Gaussian scene representation. Additionally, we reduce overfitting in outdoor scenes by utilizing a learnable skybox, reducing the presence of floating artifacts within the reconstructed scene. Based on synthetic and real-world image datasets, we show that our method is capable of creating an accurate scene reconstruction from highly distorted images and rendering photorealistic images from such reconstructions. Full article
(This article belongs to the Special Issue Geometry Reconstruction from Images (2nd Edition))
Figure 1. Pipeline of our proposed method. Before being forwarded to the pinhole Gaussian rasterizer, we apply a space-warping module to the position, rotation, and scale to emulate the distortion of a lens specified by the camera's intrinsics.
Figure 2. Distortion of scale and rotation. The four images show the steps in the distortion pipeline from left to right. An undistorted Gaussian (a) is non-linearly distorted (b). This distortion is linearly approximated using the Jacobian J_W (c), with a subsequent orthogonalization of the axes (d). For (c,d), the gray area shows the true distorted Gaussian to visualize the approximation error.
Figure 3. Reconstruction results for the synthetic Classroom scene.
Figure 4. Results of our proposed method on synthetic Blender scenes (Archiviz, Barbershop, and Classroom). Red rectangles indicate areas in which our method produced reconstruction artifacts. Zoom in for details.
Figure 5. Results of our proposed method on synthetic Blender scenes (Monk, Pabellon, and Sky). Red rectangles indicate areas in which our method produced reconstruction artifacts. Zoom in for details.
Figure 6. Results of our proposed method and Fisheye-GS on ScanNet++ scenes (Bedroom, Kitchen, and Office Day). Zoom in for details.
Figure 7. Results of our proposed method and Fisheye-GS on ScanNet++ scenes (Office Night, Tool Room, and Utility Room). Zoom in for details.
Figure 8. Renderings of a cube with Gaussians along the edges. The left rendering has the scale and rotation adjusted according to the Jacobian; for the right rendering, scale and rotation were left unmodified.
Figure 9. Evaluation metrics for the Utility Room scene for varying degrees of the polynomial polar distortion function.
Figure 10. Results for our proposed model trained on synthetic data with the learned skybox enabled (middle) and disabled (right).
Figure A1. Results for three validation views optimized on our synthetic orthographic dataset.
Figure A2. Results for the five additional real-world scenes from the ScanNet++ dataset.
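The key idea of the abstract, applying a differentiable warping function to the Gaussian scene representation and linearizing it with a Jacobian (cf. Figure 2), can be illustrated with a toy polar polynomial distortion in PyTorch. The coefficients and the specific warp below are assumptions for illustration, not the authors' model.

```python
import torch

# Hypothetical odd-polynomial fisheye model: theta_d = k1*theta + k2*theta^3 + k3*theta^5.
coeffs = torch.tensor([1.0, -0.15, 0.01])

def warp(p: torch.Tensor) -> torch.Tensor:
    """Warp one 3D Gaussian mean (camera coordinates) with a polar polynomial distortion."""
    x, y, z = p
    r = torch.sqrt(x * x + y * y) + 1e-9
    theta = torch.atan2(r, z)                     # angle to the optical axis
    theta_d = sum(c * theta ** (2 * i + 1) for i, c in enumerate(coeffs))
    rho = torch.sqrt(x * x + y * y + z * z)       # keep the distance to the camera unchanged
    dir_xy = torch.stack([x, y]) / r
    new_xy = dir_xy * rho * torch.sin(theta_d)    # bend the ray to the distorted angle
    new_z = rho * torch.cos(theta_d)
    return torch.cat([new_xy, new_z.unsqueeze(0)])

mean = torch.tensor([0.4, 0.1, 1.5])
warped = warp(mean)
# Local linearization of the warp; such a Jacobian is what distorts each Gaussian's
# scale and rotation (covariance) in addition to its position.
J = torch.autograd.functional.jacobian(warp, mean)
print(warped, J.shape)  # warped mean and a 3x3 Jacobian
```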
23 pages, 7813 KiB  
Article
The Use of Hybrid CNN-RNN Deep Learning Models to Discriminate Tumor Tissue in Dynamic Breast Thermography
by Andrés Munguía-Siu, Irene Vergara and Juan Horacio Espinoza-Rodríguez
J. Imaging 2024, 10(12), 329; https://doi.org/10.3390/jimaging10120329 - 21 Dec 2024
Viewed by 887
Abstract
Breast cancer is one of the leading causes of death for women worldwide, and early detection can help reduce the death rate. Infrared thermography has gained popularity as a non-invasive and rapid method for detecting this pathology and can be further enhanced by applying neural networks to extract spatial and even temporal data derived from breast thermographic images if they are acquired sequentially. In this study, we evaluated hybrid convolutional-recurrent neural network (CNN-RNN) models based on five state-of-the-art pre-trained CNN architectures coupled with three RNNs to discern tumor abnormalities in dynamic breast thermographic images. The hybrid architecture that achieved the best performance for detecting breast cancer was VGG16-LSTM, which showed accuracy (ACC), sensitivity (SENS), and specificity (SPEC) of 95.72%, 92.76%, and 98.68%, respectively, with a CPU runtime of 3.9 s. However, the hybrid architecture that showed the fastest CPU runtime was AlexNet-RNN with 0.61 s, although with lower performance (ACC: 80.59%, SENS: 68.52%, SPEC: 92.76%), but still superior to AlexNet (ACC: 69.41%, SENS: 52.63%, SPEC: 86.18%) with 0.44 s. Our findings show that hybrid CNN-RNN models outperform stand-alone CNN models, indicating that temporal data recovery from dynamic breast thermographs is possible without significantly compromising classifier runtime. Full article
Figure 1. Diagram of the proposed methodology for binary breast cancer classification using hybrid CNN-RNN-based deep learning models.
Figure 2. Sample grayscale thermograms from volunteers for a breast study: (a) The image is clear, so it is selected; (b) The image is blurry, so it is not selected; (c) The image contains material (bandaged breast) that covers the study region, so it is not selected.
Figure 3. U-Net architecture. The contracting path is on the left side of the U-shape, and the expanding path is on the right. The blue boxes represent multi-channel feature maps. The number of channels is indicated on top of each box, and the x-y size is shown at its bottom left edge. An orange arrow indicates each operation.
Figure 4. Example of a grayscale thermogram of the volunteer with ID 28: (a) image selected by data cleansing; (b) thermal image segmented using U-Net; (c) data augmentation using the transformations of horizontal flip, rotation 15°, rotation 30°, and zoom 15%.
Figure 5. CNN model for the binary classification of breast tissue heterogeneity (normal or abnormal) in thermographic images.
Figure 6. Performance evaluation (accuracy, sensitivity, and specificity) of the different hybrid CNN-RNN architectures to classify the presence or absence of a tumor in breast thermographic images: (a) the independent CNN model; (b) the hybrid CNN-RNN model; (c) the hybrid CNN-LSTM model; (d) the hybrid CNN-GRU model. Inception-V3, VGG16, ResNet101, GoogLeNet, and AlexNet are the five CNN models coupled to the three sequential networks (RNN, LSTM, and GRU).
Figure 7. CPU execution time of the different coupled CNN-RNN deep learning architectures for breast cancer classification in images acquired using the DIT acquisition protocol.
Figure A1. Validation accuracy of InceptionV3 using (a) single CNN; (b) coupled RNN; (c) coupled LSTM; and (d) coupled GRU.
Figure A2. Validation accuracy of VGG16 using (a) single CNN; (b) coupled RNN; (c) coupled LSTM; and (d) coupled GRU.
Figure A3. Validation accuracy of ResNet101 using (a) single CNN; (b) coupled RNN; (c) coupled LSTM; and (d) coupled GRU.
Figure A4. Validation accuracy of AlexNet using (a) single CNN; (b) coupled RNN; (c) coupled LSTM; and (d) coupled GRU.
Figure A5. Validation accuracy of GoogLeNet using (a) single CNN; (b) coupled RNN; (c) coupled LSTM; and (d) coupled GRU.
Figure A6. Confusion matrix of InceptionV3 using (a) single CNN; (b) coupled RNN; (c) coupled LSTM; and (d) coupled GRU.
Figure A7. Confusion matrix of VGG16 using (a) single CNN; (b) coupled RNN; (c) coupled LSTM; and (d) coupled GRU.
Figure A8. Confusion matrix of ResNet101 using (a) single CNN; (b) coupled RNN; (c) coupled LSTM; and (d) coupled GRU.
Figure A9. Confusion matrix of GoogLeNet using (a) single CNN; (b) coupled RNN; (c) coupled LSTM; and (d) coupled GRU.
Figure A10. Confusion matrix of AlexNet using (a) single CNN; (b) coupled RNN; (c) coupled LSTM; and (d) coupled GRU.
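A hybrid CNN-RNN of the kind compared here, such as the best-performing VGG16-LSTM, can be sketched as per-frame CNN features fed to an LSTM over the thermogram sequence. The following PyTorch snippet is a minimal sketch under assumptions (binary output, 512-dimensional pooled VGG16 features, an arbitrary hidden size), not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class VGG16LSTM(nn.Module):
    """Minimal CNN-RNN sketch: per-frame VGG16 features fed to an LSTM."""

    def __init__(self, num_classes: int = 2, hidden: int = 128):
        super().__init__()
        vgg = models.vgg16(weights="IMAGENET1K_V1")
        # Global-average-pool the convolutional features to a 512-d vector per frame.
        self.cnn = nn.Sequential(vgg.features, nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rnn = nn.LSTM(input_size=512, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, H, W), a short dynamic thermography sequence
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        _, (h_n, _) = self.rnn(feats)
        return self.head(h_n[-1])                 # classify from the last hidden state

model = VGG16LSTM()
logits = model(torch.randn(2, 5, 3, 224, 224))    # two sequences of five thermograms
print(logits.shape)                               # torch.Size([2, 2])
```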
15 pages, 11038 KiB  
Article
X-Ray Image-Based Real-Time COVID-19 Diagnosis Using Deep Neural Networks (CXR-DNNs)
by Ali Yousuf Khan, Miguel-Angel Luque-Nieto, Muhammad Imran Saleem and Enrique Nava-Baro
J. Imaging 2024, 10(12), 328; https://doi.org/10.3390/jimaging10120328 - 19 Dec 2024
Viewed by 752
Abstract
On 11 February 2020, the prevalent outbreak of COVID-19, a coronavirus illness, was declared a global pandemic. Since then, nearly seven million people have died and over 765 million confirmed cases of COVID-19 have been reported. The goal of this study is to develop a diagnostic tool for detecting COVID-19 infections more efficiently. Currently, the most widely used method is Reverse Transcription Polymerase Chain Reaction (RT-PCR), a clinical technique for infection identification. However, RT-PCR is expensive, has limited sensitivity, and requires specialized medical expertise. One of the major challenges in the rapid diagnosis of COVID-19 is the need for reliable imaging, particularly X-ray imaging. This work takes advantage of artificial intelligence (AI) techniques to enhance diagnostic accuracy by automating the detection of COVID-19 infections from chest X-ray (CXR) images. We obtained and analyzed CXR images from the Kaggle public database (4035 images in total), including cases of COVID-19, viral pneumonia, pulmonary opacity, and healthy controls. By integrating advanced techniques with transfer learning from pre-trained convolutional neural networks (CNNs), specifically InceptionV3, ResNet50, and Xception, we achieved an accuracy of 95%, significantly higher than the 85.5% achieved with ResNet50 alone. Additionally, our proposed method, CXR-DNNs, can accurately distinguish between three different types of chest X-ray images for the first time. This computer-assisted diagnostic tool has the potential to significantly enhance the speed and accuracy of COVID-19 diagnoses. Full article
(This article belongs to the Section Medical Imaging)
Figure 1. Block diagram of the CXR-DNN used for screening COVID-19.
Figure 2. Proposed EfficientNetB7 architecture.
Figure 3. CXR images of lungs in patients: (a) healthy, (b) COVID-19, and (c) pneumonia.
Figure 4. 3 × 3 confusion matrices for the (a) Training, (b) Validation, and (c) Testing datasets, representing the model's performance in true positives, false positives, true negatives, and false negatives for each class (COVID-19, Normal, and Pneumonia).
Figure 5. Accuracy and loss per epoch for each dataset, illustrating the model's performance and learning progression: (a) Training dataset, (b) Validation dataset, (c) Testing dataset. In the accuracy graphs, the blue curve represents accuracy and the orange curve represents loss; in the loss graphs, the blue curve represents loss and the orange curve represents accuracy, showing their variation over the epochs.
Figure 6. Convergence of precision, recall, and F1-score for every dataset used: (a) training, (b) validation, (c) testing. Classes: 1—COVID-19, 2—normal, 3—pneumonia.
Figure 7. Sample COVID-19 CXR image with vision transformer attention map (layer 1).
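Combining several pretrained CNNs through transfer learning, as the abstract describes, can be approximated by swapping each backbone's ImageNet head for a three-class head and averaging the softmax outputs. The sketch below uses ResNet50 and InceptionV3 from torchvision (Xception is omitted because torchvision does not ship it); the fusion rule and class set are assumptions, not the paper's CXR-DNN design.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 3  # COVID-19 / normal / pneumonia

def with_new_head(model: nn.Module) -> nn.Module:
    """Swap the ImageNet classifier for a 3-class head (transfer-learning sketch)."""
    model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
    return model.eval()

resnet = with_new_head(models.resnet50(weights="IMAGENET1K_V2"))
inception = with_new_head(models.inception_v3(weights="IMAGENET1K_V1"))

@torch.no_grad()
def ensemble_probs(x299: torch.Tensor) -> torch.Tensor:
    # Both backbones accept 299x299 inputs; averaging softmax outputs is one simple
    # way to combine several pretrained CNNs (the paper's exact fusion may differ).
    p1 = torch.softmax(resnet(x299), dim=1)
    p2 = torch.softmax(inception(x299), dim=1)
    return (p1 + p2) / 2

probs = ensemble_probs(torch.randn(4, 3, 299, 299))  # four dummy CXR images
print(probs.argmax(dim=1))                            # predicted class per image
```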
18 pages, 2563 KiB  
Article
Optimization of Cocoa Pods Maturity Classification Using Stacking and Voting with Ensemble Learning Methods in RGB and LAB Spaces
by Kacoutchy Jean Ayikpa, Abou Bakary Ballo, Diarra Mamadou and Pierre Gouton
J. Imaging 2024, 10(12), 327; https://doi.org/10.3390/jimaging10120327 - 18 Dec 2024
Viewed by 685
Abstract
Determining the maturity of cocoa pods early is not just about guaranteeing harvest quality and optimizing yield. It is also about efficient resource management. Rapid identification of the stage of maturity helps avoid losses linked to a premature or late harvest, improving productivity. Early determination of cocoa pod maturity ensures both the quality and quantity of the harvest, as immature or overripe pods cannot produce premium cocoa beans. Our innovative research harnesses artificial intelligence and computer vision technologies to revolutionize the cocoa industry, offering precise and advanced tools for accurately assessing cocoa pod maturity. Providing an objective and rapid assessment enables farmers to make informed decisions about the optimal time to harvest, helping to maximize the yield of their plantations. Furthermore, by automating this process, these technologies reduce the margins for human error and improve the management of agricultural resources. With this in mind, our study proposes to exploit a computer vision method based on the GLCM (gray level co-occurrence matrix) algorithm to extract the characteristics of images in the RGB (red, green, blue) and LAB (luminance, axis between red and green, axis between yellow and blue) color spaces. This approach allows for in-depth image analysis, which is essential for capturing the nuances of cocoa pod maturity. Next, we apply classification algorithms to identify the best performers. These algorithms are then combined via stacking and voting techniques, allowing our model to be optimized by taking advantage of the strengths of each method, thus guaranteeing more robust and precise results. The results demonstrated that the combination of algorithms produced superior performance, especially in the LAB color space, where voting scored 98.49% and stacking 98.71%. In comparison, in the RGB color space, voting scored 96.59% and stacking 97.06%. These results surpass those generally reported in the literature, showing the increased effectiveness of combined approaches in improving the accuracy of classification models. This highlights the importance of exploring ensemble techniques to maximize performance in complex contexts such as cocoa pod maturity classification. Full article
(This article belongs to the Special Issue Imaging Applications in Agriculture)
Figure 1. Diagram representing the voting process.
Figure 2. Illustration of the stacking process of the algorithms in our study.
Figure 3. The overall architecture of our method.
Figure 4. Histogram of model performance comparison (accuracy) in the RGB space.
Figure 5. Confusion matrix of the best-performing models in the RGB color space.
Figure 6. Histogram of model performance comparison (accuracy) in the LAB space.
Figure 7. Confusion matrix of the best-performing models in the LAB color space.
Figure 8. Histogram of algorithm performance in the RGB and LAB color spaces.
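The pipeline described here, GLCM texture features followed by voting and stacking ensembles, maps directly onto scikit-image and scikit-learn primitives. The sketch below runs on random patches with hypothetical maturity labels purely to show the wiring; the feature set, base learners, and hyperparameters are assumptions.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def glcm_features(gray: np.ndarray) -> np.ndarray:
    """Contrast/homogeneity/energy/correlation from the GLCM of an 8-bit grayscale patch."""
    glcm = graycomatrix(gray, distances=[1], angles=[0, np.pi / 2], levels=256,
                        symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])

# Toy data standing in for cocoa-pod patches (hypothetical labels: 0 = immature, 1 = mature).
rng = np.random.default_rng(1)
X = np.array([glcm_features(rng.integers(0, 256, (64, 64), dtype=np.uint8)) for _ in range(80)])
y = rng.integers(0, 2, 80)

base = [("svm", SVC(probability=True)), ("rf", RandomForestClassifier(n_estimators=50))]
voting = VotingClassifier(estimators=base, voting="soft").fit(X, y)
stacking = StackingClassifier(estimators=base, final_estimator=LogisticRegression()).fit(X, y)
print(voting.predict(X[:3]), stacking.predict(X[:3]))
```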
38 pages, 3841 KiB  
Review
Computer Vision-Based Gait Recognition on the Edge: A Survey on Feature Representations, Models, and Architectures
by Edwin Salcedo
J. Imaging 2024, 10(12), 326; https://doi.org/10.3390/jimaging10120326 - 18 Dec 2024
Viewed by 1616
Abstract
Computer vision-based gait recognition (CVGR) is a technology that has gained considerable attention in recent years due to its non-invasive, unobtrusive, and difficult-to-conceal nature. Beyond its applications in biometrics, CVGR holds significant potential for healthcare and human–computer interaction. Current CVGR systems often transmit collected data to a cloud server for machine learning-based gait pattern recognition. While effective, this cloud-centric approach can result in increased system response times. Alternatively, the emerging paradigm of edge computing, which involves moving computational processes to local devices, offers the potential to reduce latency, enable real-time surveillance, and eliminate reliance on internet connectivity. Furthermore, recent advancements in low-cost, compact microcomputers capable of handling complex inference tasks (e.g., Jetson Nano Orin, Jetson Xavier NX, and Khadas VIM4) have created exciting opportunities for deploying CVGR systems at the edge. This paper reports the state of the art in gait data acquisition modalities, feature representations, models, and architectures for CVGR systems suitable for edge computing. Additionally, this paper addresses the general limitations and highlights new avenues for future research in the promising intersection of CVGR and edge computing. Full article
(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)
Figure 1. A comparative analysis illustrating the growing preference for DL architectures. The illustrations summarise the findings from the papers reviewed in this survey.
Figure 2. General structure of this survey paper and our proposed taxonomy of the existing technologies that facilitate on-device deployment of CVGR systems for real-time recognition.
Figure 3. Broad perspective on gait feature representations.
Figure 4. CVGR systems based on handcrafted representations typically employ one of two approaches: the systems extract silhouettes from 2D images (model-free) or rely on human body models (model-based). The video sample shown in the figure comes from the CASIA-B dataset [23].
Figure 5. DL-based end-to-end gait recognition scheme for CVGR systems. The sample with the walking subjects shown in the figure comes from the Penn–Fudan dataset [94].
Figure 6. Graphical depictions of various edge-oriented inference architectures.
Figure 7. A large-scale scalable framework to support gait recognition computations in a distributed manner. This framework would incorporate multiple nodes and an edge server to handle data acquisition, detection, segmentation, and classification, enabling more feasible real-time computation. The video sample shown in the figure comes from the CASIA-A dataset [24].
14 pages, 1855 KiB  
Article
Point-Cloud Instance Segmentation for Spinning Laser Sensors
by Alvaro Casado-Coscolla, Carlos Sanchez-Belenguer, Erik Wolfart and Vitor Sequeira
J. Imaging 2024, 10(12), 325; https://doi.org/10.3390/jimaging10120325 - 17 Dec 2024
Viewed by 602
Abstract
In this paper, we face the point-cloud segmentation problem for spinning laser sensors from a deep-learning (DL) perspective. Since the sensors natively provide their measurements in a 2D grid, we directly use state-of-the-art models designed for visual information for the segmentation task and then exploit the range information to ensure 3D accuracy. This allows us to effectively address the main challenges of applying DL techniques to point clouds, i.e., lack of structure and increased dimensionality. To the best of our knowledge, this is the first work that faces the 3D segmentation problem from a 2D perspective without explicitly re-projecting 3D point clouds. Moreover, our approach exploits multiple channels available in modern sensors, i.e., range, reflectivity, and ambient illumination. We also introduce a novel data-mining pipeline that enables the annotation of 3D scans without human intervention. Together with this paper, we present a new public dataset with all the data collected for training and evaluating our approach, where point clouds preserve their native sensor structure and where every single measurement contains range, reflectivity, and ambient information, together with its associated 3D point. As experimental results show, our approach achieves state-of-the-art results both in terms of performance and inference time. Additionally, we provide a novel ablation test that analyses the individual and combined contributions of the different channels provided by modern laser sensors. Full article
Figure 1. Ouster data used in this paper. (Left) Structured view in the projective space (1024 × 128 pixels); from top to bottom: range, reflectivity, and ambient channels. (Right) Partial view of the associated point cloud in the 3D Cartesian space.
Figure 2. Background removal and foreground clustering. (a) Partial slice of a scan (range channel). (b) Segmentation mask after voxel filtering. (c) Resulting mask after the shrink operation (seeds for the flooding algorithm). (d) Masks of the resulting clusters, with their associated labels. (e) 3D view of the clusters with their 3D bounding boxes.
Figure 3. Data remapping from the projective space to the CNN space. All three channels (range, reflectivity, and ambient) are split into four overlapping segments and stacked together to comply with the expected input tensor size (640 × 640 × 3). Segmentation masks and bounding boxes are split accordingly. Only the reflectivity channel is represented in this figure.
Figure 4. Segmentation mask fusion with range information. (a) Raw mask of the yellow person in (b), as inferred by the model. (b) Direct projection of the raw masks to the 3D data; notice the artifacts around the edges. (c) Largest cluster after the flooding algorithm over (a), i.e., the final segmentation mask, with its associated raw mask on the back. (d) Results after projecting the final segmentation masks into the 3D data.
Figure 5. Ablation test results. CNN performance without post-processing for each CNN size (nano, small, medium, large, extra-large) and each possible combination of input channels (ambient, depth, reflectivity, A+D, A+R, D+R, A+D+R).
Figure 6. Full pipeline performance, using the Medium (M) CNN with all input channels (A+D+R). PR curves and comparison with other techniques when predicting 3D bounding boxes with IoU thresholds of 50% (first plot) and 75% (second plot). PR curves for different IoU thresholds when predicting 3D bounding boxes (third plot) and segmentation masks (fourth plot).
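The central trick of this paper, segmenting on the sensor's native 2D grid and then using the range channel to recover 3D geometry, can be illustrated by lifting a 2D instance mask to a 3D bounding box under an idealized spherical projection. The beam angles and scan size below are assumptions; a real sensor needs its calibrated intrinsics.

```python
import numpy as np

H, W = 128, 1024                                          # structured scan: beams x azimuth steps
fov_up, fov_down = np.deg2rad(22.5), np.deg2rad(-22.5)    # hypothetical vertical field of view

def mask_to_3d_bbox(mask: np.ndarray, rng_m: np.ndarray) -> np.ndarray:
    """Lift a 2D instance mask to 3D using per-pixel range and return an axis-aligned box.

    Assumes an idealized spherical projection; real sensors expose calibrated beam angles.
    """
    rows, cols = np.nonzero(mask)
    azimuth = 2 * np.pi * cols / W - np.pi
    elevation = fov_up + (fov_down - fov_up) * rows / (H - 1)
    r = rng_m[rows, cols]
    x = r * np.cos(elevation) * np.cos(azimuth)
    y = r * np.cos(elevation) * np.sin(azimuth)
    z = r * np.sin(elevation)
    pts = np.stack([x, y, z], axis=1)
    return np.stack([pts.min(axis=0), pts.max(axis=0)])   # (2, 3): xyz_min, xyz_max

rng_img = np.full((H, W), 10.0)          # dummy range image (metres)
mask = np.zeros((H, W), dtype=bool)
mask[60:70, 500:520] = True              # a detected instance in image space
print(mask_to_3d_bbox(mask, rng_img))
```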
12 pages, 2922 KiB  
Article
Exploiting 2D Neural Network Frameworks for 3D Segmentation Through Depth Map Analytics of Harvested Wild Blueberries (Vaccinium angustifolium Ait.)
by Connor C. Mullins, Travis J. Esau, Qamar U. Zaman, Ahmad A. Al-Mallahi and Aitazaz A. Farooque
J. Imaging 2024, 10(12), 324; https://doi.org/10.3390/jimaging10120324 - 15 Dec 2024
Viewed by 837
Abstract
This study introduced a novel approach to 3D image segmentation utilizing a neural network framework applied to 2D depth map imagery, with Z axis values visualized through color gradation. This research involved comprehensive data collection from mechanically harvested wild blueberries to populate 3D and red–green–blue (RGB) images of filled totes through time-of-flight and RGB cameras, respectively. Advanced neural network models from the YOLOv8 and Detectron2 frameworks were assessed for their segmentation capabilities. Notably, the YOLOv8 models, particularly YOLOv8n-seg, demonstrated superior processing efficiency, with an average time of 18.10 ms, significantly faster than the Detectron2 models, which exceeded 57 ms, while maintaining high performance with a mean intersection over union (IoU) of 0.944 and a Matthew’s correlation coefficient (MCC) of 0.957. A qualitative comparison of segmentation masks indicated that the YOLO models produced smoother and more accurate object boundaries, whereas Detectron2 showed jagged edges and under-segmentation. Statistical analyses, including ANOVA and Tukey’s HSD test (α = 0.05), confirmed the superior segmentation performance of models on depth maps over RGB images (p < 0.001). This study concludes by recommending the YOLOv8n-seg model for real-time 3D segmentation in precision agriculture, providing insights that can enhance volume estimation, yield prediction, and resource management practices. Full article
(This article belongs to the Section Image and Video Processing)
Figure 1. Example of wild blueberries (Vaccinium angustifolium Ait.) at time of harvest, illustrating the irregular clustering.
Figure 2. Visual demonstration of conversion from point cloud to depth map using the jet colormap as the Z axis representation in mm, where the background color of the depth map was set to blue.
Figure 3. Dual camera mount setup for data collection with Basler Blaze-101 (67° by 51° in the X and Y axes, respectively) and Lucid Vision Labs Triton (60° by 46° in the X and Y axes, respectively).
Figure 4. Visualization of segmentation mask correctness of YOLO masks for the ToF 3D camera and 2D RGB camera, with true positive as green, true negative as blue, false positive as red, and false negative as orange.
Figure 5. Visualization of segmentation mask correctness of Detectron2 masks for the ToF 3D and 2D RGB cameras, with true positive as green, true negative as blue, false positive as red, and false negative as orange.
Figure 6. Sample confusion matrices of Detectron2 R50 with FPN and YOLOv8n-seg on the testing split of the depth image dataset.
Figure 7. Sample confusion matrices of Detectron2 R50 with FPN and YOLOv8n-seg on the testing split of the RGB image dataset.
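The depth-map representation shown in Figure 2, Z values rendered through the jet colormap over a blue background, can be reproduced in a few lines of NumPy and Matplotlib. The rasterization below assumes points already aligned to the image grid and made-up dimensions; it illustrates the encoding, not the authors' camera pipeline.

```python
import numpy as np
import matplotlib.pyplot as plt

def depth_map_from_points(points: np.ndarray, grid: tuple[int, int] = (480, 640)) -> np.ndarray:
    """Rasterize (row, col, Z-in-mm) points into an RGB depth map via the jet colormap.

    Simplified stand-in for a time-of-flight pipeline: empty pixels end up blue,
    as in the background described for the paper's Figure 2.
    """
    h, w = grid
    depth = np.full((h, w), np.nan)
    rows = np.clip(points[:, 0].astype(int), 0, h - 1)
    cols = np.clip(points[:, 1].astype(int), 0, w - 1)
    depth[rows, cols] = points[:, 2]
    z_min, z_max = np.nanmin(depth), np.nanmax(depth)
    norm = (depth - z_min) / max(z_max - z_min, 1e-9)
    rgb = plt.get_cmap("jet")(np.nan_to_num(norm, nan=0.0))[..., :3]  # jet: 0 is blue, 1 is red
    return (rgb * 255).astype(np.uint8)

rng = np.random.default_rng(0)
pts = np.column_stack([rng.integers(0, 480, 5000),       # row
                       rng.integers(0, 640, 5000),       # col
                       rng.uniform(50.0, 150.0, 5000)])  # Z (mm)
print(depth_map_from_points(pts).shape)                  # (480, 640, 3)
```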
13 pages, 3641 KiB  
Review
Current Role of CT Pulmonary Angiography in Pulmonary Embolism: A State-of-the-Art Review
by Ignacio Diaz-Lorenzo, Alberto Alonso-Burgos, Alfonsa Friera Reyes, Ruben Eduardo Pacios Blanco, Maria del Carmen de Benavides Bernaldo de Quiros and Guillermo Gallardo Madueño
J. Imaging 2024, 10(12), 323; https://doi.org/10.3390/jimaging10120323 - 15 Dec 2024
Cited by 1 | Viewed by 1031
Abstract
The purpose of this study is to conduct a literature review on the current role of computed tomography pulmonary angiography (CTPA) in the diagnosis and prognosis of pulmonary embolism (PE). It addresses key topics such as the quantification of the thrombotic burden, its role as a predictor of mortality, new diagnostic techniques that are available, the possibility of analyzing the thrombus composition to differentiate its evolutionary stage, and the applicability of artificial intelligence (AI) in PE through CTPA. The only finding from CTPA that has been validated as a prognostic factor so far is the right ventricle/left ventricle (RV/LV) diameter ratio being >1, which is associated with a 2.5-fold higher risk of all-cause mortality or adverse events, and a 5-fold higher risk of PE-related mortality. The increasing use of techniques such as dual-energy computed tomography allows for the more accurate diagnosis of perfusion defects, which may go undetected in conventional computed tomography, identifying up to 92% of these defects compared to 78% being detected by CTPA. Additionally, it is essential to explore the latest advances in the application of AI to CTPA, which are currently expanding and have demonstrated a 23% improvement in the detection of subsegmental emboli compared to manual interpretation. With deep image analysis, up to a 95% accuracy has been achieved in predicting PE severity based on the thrombus volume and perfusion deficits. These advancements over the past 10 years significantly contribute to early intervention strategies and, therefore, to the improvement of morbidity and mortality outcomes for these patients. Full article
(This article belongs to the Special Issue Tools and Techniques for Improving Radiological Imaging Applications)
Figure 1. Fifty-six-year-old woman diagnosed with acute pulmonary thromboembolism by axial CT angiography. (A) Axial RV/LV diameter ratio > 1 measured at the base of both ventricles (black arrows). (B) Filling defects in both main pulmonary arteries (*), with a saddle thrombus.
Figure 2. Eighty-nine-year-old woman diagnosed with chronic pulmonary thromboembolism. (A) Axial CT angiography (maximum intensity projection—MIP—reconstruction) showing severe narrowing in the superior segmental artery of the left lower lobe (white arrow) as a sequela of PE. (B) Fusion image of CT angiography and color-coded iodine density showing wedge-shaped perfusion defects (*) in the middle lobe, lingula, and left lower lobe, with the latter corresponding to the findings in image (A). (C) SPECT-CT fusion image showing wedge-shaped perfusion defects (*) similar to those obtained with dual-energy CT (B).
22 pages, 838 KiB  
Article
MediScan: A Framework of U-Health and Prognostic AI Assessment on Medical Imaging
by Sibtain Syed, Rehan Ahmed, Arshad Iqbal, Naveed Ahmad and Mohammed Ali Alshara
J. Imaging 2024, 10(12), 322; https://doi.org/10.3390/jimaging10120322 - 13 Dec 2024
Viewed by 1265
Abstract
With technological advancements, remarkable progress has been made with the convergence of health sciences and Artificial Intelligence (AI). Modern health systems are proposed to ease patient diagnostics. However, the challenge is to provide AI-based precautions to patients and doctors for more accurate risk assessment. The proposed healthcare system aims to integrate patients, doctors, laboratories, pharmacies, and administrative personnel use cases and their primary functions onto a single platform. The proposed framework can also process microscopic images, CT scans, X-rays, and MRI to classify malignancy and give doctors a set of AI precautions for patient risk assessment. The proposed framework incorporates various DCNN models for identifying different forms of tumors and fractures in the human body, i.e., brain, bones, lungs, kidneys, and skin, and generates precautions with the help of a fine-tuned Large Language Model (LLM), i.e., Generative Pretrained Transformer 4 (GPT-4). With enough training data, DCNNs can learn highly representative, data-driven, hierarchical image features. The GPT-4 model is selected for generating precautions due to its explanation, reasoning, memory, and accuracy on prior medical assessments and research studies. Classification models are evaluated by classification reports (i.e., recall, precision, F1 score, support, accuracy, and macro and weighted averages) and confusion matrices, and they show robust performance compared to conventional schemes. Full article
Figure 1. Graphical scheme of the system architecture.
Figure 2. Graphical scheme of use cases in the proposed framework.
Figure 3. Graphical illustration of the proposed AI bone fracture detection model.
Figure 4. Graphical illustration of the proposed AI lung cancer detection model.
Figure 5. Graphical illustration of the proposed AI brain tumor detection model.
Figure 6. Graphical illustration of the proposed AI skin cancer detection model.
Figure 7. Graphical illustration of the proposed AI kidney malignancy detection model.
Figure 8. Graphical illustration of the proposed GPT-4 model system integration.
Figure 9. Graphical illustration of the confusion matrices for the Bone Fracture recognition, Lung Tumor recognition, Brain Tumor detection, Skin Lesion identification, and Renal Malignancy recognition AI models.
Figure 10. Graphical illustration of the accuracy graphs for the Bone Fracture recognition, Lung Tumor recognition, Brain Tumor detection, Skin Lesion identification, and Renal Malignancy recognition AI models.
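The evaluation described in the abstract, a per-class classification report plus a confusion matrix, corresponds to scikit-learn's standard reporting utilities. The snippet below uses synthetic labels purely to show the reporting step and makes no claim about the paper's actual results.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical evaluation of one detector (e.g. brain tumor vs. no tumor);
# y_true/y_pred are made up purely to demonstrate the reporting step.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 200)
y_pred = np.where(rng.random(200) < 0.9, y_true, 1 - y_true)  # ~90% agreement

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred,
                            target_names=["no tumor", "tumor"], digits=3))
# The report lists precision, recall, F1 score, and support per class, plus
# accuracy and macro/weighted averages, i.e., the quantities named in the abstract.
```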
16 pages, 1289 KiB  
Article
DAT: Deep Learning-Based Acceleration-Aware Trajectory Forecasting
by Ali Asghar Sharifi, Ali Zoljodi and Masoud Daneshtalab
J. Imaging 2024, 10(12), 321; https://doi.org/10.3390/jimaging10120321 - 13 Dec 2024
Viewed by 616
Abstract
As the demand for autonomous driving (AD) systems has increased, the enhancement of their safety has become critically important. A fundamental capability of AD systems is object detection and trajectory forecasting of vehicles and pedestrians around the ego-vehicle, which is essential for preventing potential collisions. This study introduces the Deep learning-based Acceleration-aware Trajectory forecasting (DAT) model, a deep learning-based approach for object detection and trajectory forecasting, utilizing raw sensor measurements. DAT is an end-to-end model that processes sequential sensor data to detect objects and forecasts their future trajectories at each time step. The core innovation of DAT lies in its novel forecasting module, which leverages acceleration data to enhance trajectory forecasting, leading to the consideration of a variety of agent motion models. We propose a robust and innovative method for estimating ground-truth acceleration for objects, along with an object detector that predicts acceleration attributes for each detected object and a novel method for trajectory forecasting. DAT is trained and evaluated on the NuScenes dataset, demonstrating its empirical effectiveness through extensive experiments. The results indicate that DAT significantly surpasses state-of-the-art methods, particularly in enhancing forecasting accuracy for objects exhibiting both linear and nonlinear motion patterns, achieving up to a 2× improvement. This advancement highlights the critical role of incorporating acceleration data into predictive models, representing a substantial step forward in the development of safer autonomous driving systems. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)
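To make the role of acceleration concrete, the following minimal sketch (an assumption about the underlying kinematics, not the DAT network itself) extrapolates future positions from a detected object's position, velocity, and estimated acceleration under a constant-acceleration motion model:

import numpy as np

def rollout_constant_acceleration(p0, v0, a0, horizon, dt=0.5):
    """Predict future 2D positions for `horizon` steps of length `dt` seconds."""
    p0, v0, a0 = map(np.asarray, (p0, v0, a0))
    t = dt * np.arange(1, horizon + 1)[:, None]      # (horizon, 1) time offsets
    return p0 + v0 * t + 0.5 * a0 * t**2             # (horizon, 2) future positions

# A braking vehicle: moving along +x at 8 m/s while decelerating at 2 m/s^2.
future = rollout_constant_acceleration(p0=[0.0, 0.0], v0=[8.0, 0.0],
                                       a0=[-2.0, 0.0], horizon=6)
print(np.round(future, 2))

Without the acceleration term, the same rollout degenerates to a constant-velocity model, which is what acceleration-aware forecasting is meant to improve on for nonlinear motion.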
Figures:
Figure 1. (Top row) Cascade methods, which handle detection, tracking, and forecasting in a sequential pipeline, are vulnerable to error propagation: each stage assumes error-free input from the previous one, which is often unrealistic in real-world applications, so errors can accumulate and negatively impact the final predictions. In the diagram, the arrows indicate the direction of processing for the LiDAR data, moving from raw input to the final output; the input data, gathered from past observations, is represented in blue, while the future output is shown in orange. (Bottom row) End-to-end methods, on the other hand, directly predict future trajectories from raw data. This unified approach allows for the joint optimization of detection, tracking, and forecasting, leading to more accurate and reliable results.
Figure 2. Acceleration error comparison across different methods.
Figure 3. Initial velocity error comparison across different methods.
Figure 4. DAT: based on a LiDAR sequence, DAT detects objects in both the present frame (t) and future frames (up to t + T). These future detections are projected back to the current frame, allowing for alignment with detections in the present moment.
Figure 5. Qualitative evaluation of trajectory forecasts using DAT. In the first row, ground-truth trajectories are depicted in green, the highest-confidence forecast in blue, and other potential future trajectories in cyan. The second row compares the highest-confidence forecasts of DAT (blue) with those of TrajectoryNAS (magenta), alongside the ground-truth trajectories (green). The results illustrate that DAT predictions are closer to the ground truth.
16 pages, 5125 KiB  
Article
Multi-Level Feature Fusion in CNN-Based Human Action Recognition: A Case Study on EfficientNet-B7
by Pitiwat Lueangwitchajaroen, Sitapa Watcharapinchai, Worawit Tepsan and Sorn Sooksatra
J. Imaging 2024, 10(12), 320; https://doi.org/10.3390/jimaging10120320 - 12 Dec 2024
Viewed by 735
Abstract
Accurate human action recognition is becoming increasingly important across various fields, including healthcare and self-driving cars. A simple approach to enhance model performance is incorporating additional data modalities, such as depth frames, point clouds, and skeleton information. While previous studies have predominantly used late fusion techniques to combine these modalities, our research introduces a multi-level fusion approach that combines information at the early, intermediate, and late stages. Furthermore, recognizing the challenges of collecting multiple data types in real-world applications, our approach exploits multimodal techniques while relying solely on RGB frames as the single data source. In our work, we used RGB frames from the NTU RGB+D dataset as the sole data source. From these frames, we extracted 2D skeleton coordinates and optical flow frames using pre-trained models. We evaluated our multi-level fusion approach with EfficientNet-B7 as a case study, and our methods demonstrated a significant improvement, achieving 91.5% accuracy on the NTU RGB+D 60 dataset compared to single-modality and single-view models. Despite their simplicity, our methods are also comparable to other state-of-the-art approaches. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
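A minimal PyTorch sketch of the three fusion levels described in the abstract is shown below; the tiny convolutional backbones stand in for EfficientNet-B7 and are purely illustrative:

import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    def __init__(self, num_classes=60):
        super().__init__()
        self.spatial = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                     nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Early fusion: RGB (3) + optical-flow (2) channels form one input tensor.
        self.temporal = nn.Sequential(nn.Conv2d(5, 16, 3, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.spatial_head = nn.Linear(16, num_classes)
        # Intermediate fusion: the temporal head also sees the spatial features.
        self.temporal_head = nn.Linear(16 + 16, num_classes)

    def forward(self, rgb, flow):
        f_s = self.spatial(rgb)
        f_t = self.temporal(torch.cat([rgb, flow], dim=1))           # early fusion
        logits_s = self.spatial_head(f_s)
        logits_t = self.temporal_head(torch.cat([f_t, f_s], dim=1))  # intermediate fusion
        return 0.5 * (logits_s.softmax(-1) + logits_t.softmax(-1))   # late fusion

scores = TwoStreamFusion()(torch.randn(2, 3, 64, 64), torch.randn(2, 2, 64, 64))
print(scores.shape)  # torch.Size([2, 60])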
Figures:
Figure 1. The proposed architecture integrates multi-level fusion through early, intermediate, and late fusion techniques. Early fusion is applied in the temporal stream, enriching the ROI-based OF with additional information from the ROI-based RGB frames. In addition, the spatial stream uses only ROI-based RGB frames as input. Intermediate fusion is used to merge extracted features from the spatial stream into the temporal stream, while late fusion is used to combine the softmax scores from both streams.
Figure 2. The top row presents five examples of OF frames extracted from pairs of selected video frames. The second row illustrates the corresponding ROI RGB frames, ROI OF frames, and the result of early fusion combining the ROI RGB and ROI OF for the same five frames. This demonstrates a soft shading effect, highlighting the motion of people while also showing non-moving parts in the images.
Figure 3. The confusion matrix for the proposed architecture on the NTU RGB+D 60 dataset, displaying actual classes on the vertical axis and predicted classes on the horizontal axis for both the cross-subject (CS) and the cross-view (CV) protocols, respectively.
14 pages, 2304 KiB  
Article
Improved Generalizability in Medical Computer Vision: Hyperbolic Deep Learning in Multi-Modality Neuroimaging
by Cyrus Ayubcha, Sulaiman Sajed, Chady Omara, Anna B. Veldman, Shashi B. Singh, Yashas Ullas Lokesha, Alex Liu, Mohammad Ali Aziz-Sultan, Timothy R. Smith and Andrew Beam
J. Imaging 2024, 10(12), 319; https://doi.org/10.3390/jimaging10120319 - 12 Dec 2024
Viewed by 841
Abstract
Deep learning has shown significant value in automating radiological diagnostics but can be limited by a lack of generalizability to external datasets. Leveraging the geometric principles of non-Euclidean space, certain geometric deep learning approaches may offer an alternative means of improving model generalizability. This study investigates the potential advantages of hyperbolic convolutional neural networks (HCNNs) over traditional convolutional neural networks (CNNs) in neuroimaging tasks. We conducted a comparative analysis of HCNNs and CNNs across various medical imaging modalities and diseases, with a focus on a compiled multi-modality neuroimaging dataset. The models were assessed for their performance parity, robustness to adversarial attacks, semantic organization of embedding spaces, and generalizability. Zero-shot evaluations were also performed with ischemic stroke non-contrast CT images. HCNNs matched CNNs’ performance in less complex settings and demonstrated superior semantic organization and robustness to adversarial attacks. While HCNNs equaled CNNs in out-of-sample datasets identifying Alzheimer’s disease, in zero-shot evaluations, HCNNs outperformed CNNs and radiologists. HCNNs deliver enhanced robustness and organization in neuroimaging data. This likely underlies why, while HCNNs perform similarly to CNNs with respect to in-sample tasks, they confer improved generalizability. Nevertheless, HCNNs encounter efficiency and performance challenges with larger, complex datasets. These limitations underline the need for further optimization of HCNN architectures. HCNNs present promising improvements in generalizability and resilience for medical imaging applications, particularly in neuroimaging. Despite facing challenges with larger datasets, HCNNs enhance performance under adversarial conditions and offer better semantic organization, suggesting valuable potential in generalizable deep learning models in medical imaging and neuroimaging diagnostics. Full article
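One common building block behind such hyperbolic models (assumed here for illustration; the paper's exact layers may differ) is lifting Euclidean feature vectors onto the Lorentz (hyperboloid) model via the exponential map at the origin:

import numpy as np

def lorentz_expmap0(v, eps=1e-9):
    """Map Euclidean vectors v of shape (n, d) onto the hyperboloid
    <x, x>_L = -1 in R^(d+1) (curvature -1) via the exponential map
    at the origin (1, 0, ..., 0)."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True).clip(min=eps)
    x_time = np.cosh(norm)                       # time-like coordinate
    x_space = np.sinh(norm) * v / norm           # space-like coordinates
    return np.concatenate([x_time, x_space], axis=-1)

x = lorentz_expmap0(np.random.randn(4, 8))
# Lifted points satisfy the hyperboloid constraint -x0^2 + |x_space|^2 = -1.
print(np.allclose(-x[:, 0]**2 + np.sum(x[:, 1:]**2, axis=1), -1.0))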
Figures:
Figure 1. Relative model performance across datasets. The bar plot shows the Top-1 accuracy metrics with 95% confidence intervals for the Euclidean ResNet 18 and the Euclidean–Lorentz ResNet 18 across the three datasets, increasing in size from left to right (i.e., Miniature Multi-Disease (MDD) Dataset, Multi-Modality Neuroimaging (MMN) Dataset, and Multi-Disease (MD) Dataset).
Figure 2. Euclidean and hyperbolic model T-SNE in the Neuroimaging Dataset. This figure shows the low-dimensional T-SNE representation of the average class embedding space from the Euclidean ResNet 18 (A) and the Euclidean–Lorentz ResNet 18 (B) for the Multi-Modality Neuroimaging (MMN) Dataset. The colors denote the broader category per class.
Figure 3. Euclidean and hyperbolic model dendrograms for the Neuroimaging Dataset. This figure illustrates the hierarchical clustering dendrogram of the average class embedding space of the Euclidean ResNet 18 (A) and the Euclidean–Lorentz ResNet 18 (B) for the Multi-Modality Neuroimaging (MMN) Dataset.
Figure 4. Zero-shot identification of stroke patients. The diagram shows how many of the zero-shot stroke patients were identified across the Euclidean and Euclidean–Lorentz models, as well as by human radiologists with emergent non-contrast brain CT imaging. We also note that 26 patients were not identified using any of the three approaches.
22 pages, 15973 KiB  
Article
Three-Dimensional Bone-Image Synthesis with Generative Adversarial Networks
by Christoph Angermann, Johannes Bereiter-Payr, Kerstin Stock, Gerald Degenhart and Markus Haltmeier
J. Imaging 2024, 10(12), 318; https://doi.org/10.3390/jimaging10120318 - 11 Dec 2024
Viewed by 639
Abstract
Medical image processing has been highlighted as an area where deep-learning-based models have the greatest potential. However, in the medical field, in particular, problems of data availability and privacy are hampering research progress and, thus, rapid implementation in clinical routine. The generation of synthetic data not only ensures privacy but also allows the drawing of new patients with specific characteristics, enabling the development of data-driven models on a much larger scale. This work demonstrates that three-dimensional generative adversarial networks (GANs) can be efficiently trained to generate high-resolution medical volumes with finely detailed voxel-based architectures. In addition, GAN inversion is successfully implemented for the three-dimensional setting and used for extensive research on model interpretability and applications such as image morphing, attribute editing, and style mixing. The results are comprehensively validated on a database of three-dimensional HR-pQCT instances representing the bone micro-architecture of the distal radius. Full article
(This article belongs to the Special Issue Advances in Medical Imaging and Machine Learning)
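As a generic illustration of what "three-dimensional" means for the generator (a toy block, not the paper's 3D-ProGAN or 3D-StyleGAN layers), each synthesis stage can upsample a latent feature volume with 3D convolutions in all three spatial dimensions:

import torch
import torch.nn as nn

class UpBlock3D(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.block = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False),
            nn.Conv3d(c_in, c_out, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2),
        )

    def forward(self, x):
        return self.block(x)

# Latent feature volume of shape (batch, channels, depth, height, width).
z = torch.randn(1, 64, 4, 18, 14)
g = nn.Sequential(UpBlock3D(64, 32), UpBlock3D(32, 16), nn.Conv3d(16, 1, 1))
print(g(z).shape)  # torch.Size([1, 1, 16, 72, 56])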
Figures:
Figure 1. HR-pQCT bone samples of real patients with isotropic voxel size 60.7 μm. Volumes are cropped to a region of interest (ROI) with varying numbers of voxels for each scan.
Figure 2. Preprocessing. From left to right: the sample is cropped or padded to a constant size of 168 × 576 × 448 voxels. The mirrored volume is used as padding. The samples are considered regarding the discrete cosine basis. Clipping the basis coefficients to the range [−0.001, 0.001] yields the noise volume. The padded regions are replaced by the corresponding noise volume.
Figure 3. Exemplary visualization of the progressive growing strategy for the synthesis of 3D bone HR-pQCT data.
Figure 4. Ten HR-pQCT volumes sampled from the proposed 3D-ProGAN (first row) and 3D-StyleGAN (second row). Synthesized volumes have a spatial size of 32 × 288 × 224.
Figure 5. First row: samples with weak trabecular bone mineralization (Tb.BMD). Second row: samples with weak cortical bone mineralization (Ct.BMD). From left to right: x_1, x_{1,2}^{0.25}, x_{1,2}^{0.5}, x_{1,2}^{0.75}, x_2. The areas marked in red allow the reader to better recognize the low Tb.BMD and the weak Ct.BMD of the examined radii, respectively.
Figure 6. An illustration of the style combination based on the 3D-StyleGAN approach. For both examples, the first row denotes the source image (real patient data). The second row contains the target image at the leftmost position and style-mix results where the style of the source is fed to the generator in the first three convolutional layers (x_{s,t}^3), in the first seven layers (x_{s,t}^7), and in the first twelve layers (x_{s,t}^{12}), from left to right.
Figure 7. 3D-ProGAN results for attribute editing. For each volumetric sample, the center axial slice is visualized. Left: existing patient x. Middle: generated samples G_1(z_opt(x) + α n_k), k = 1, 2, 3, 4. Right: difference G_1(z_opt(x) + α n_k) − x, where red and blue voxels denote positive and negative residuals, respectively.
Figure 8. Comparison between computer-based realism scores and the subjective rating by Expert 1 (first row) and Expert 2 (second row) on HR-pQCT images. The horizontal axes denote the expert rating 1–5, while the vertical axes show the calculated realism scores. From left to right: r_inc, r_res, r_vgs.
Figure A1. Synthetic HR-pQCT volumes sampled from the proposed 3D-ProGAN approach with varying parameters for the truncated normal distribution. From left to right column: truncation parameter equal to {2.6, 1.8, 1, 0.2}.
Figure A2. Synthetic HR-pQCT volumes sampled from the proposed 3D-StyleGAN approach with varying truncation levels. From left to right column: ψ = {1, 0.7, 0.4, 0.1}.
22 pages, 3640 KiB  
Article
Evaluation of Color Difference Models for Wide Color Gamut and High Dynamic Range
by Olga Basova, Sergey Gladilin, Vladislav Kokhan, Mikhalina Kharkevich, Anastasia Sarycheva, Ivan Konovalenko, Mikhail Chobanu and Ilya Nikolaev
J. Imaging 2024, 10(12), 317; https://doi.org/10.3390/jimaging10120317 - 10 Dec 2024
Viewed by 643
Abstract
Color difference models (CDMs) are essential for accurate color reproduction in image processing. While CDMs aim to reflect perceived color differences (CDs) from psychophysical data, they remain largely untested in wide color gamut (WCG) and high dynamic range (HDR) contexts, which are underrepresented in current datasets. This gap highlights the need to validate CDMs across WCG and HDR. Moreover, the non-geodesic structure of perceptual color space necessitates datasets covering CDs of various magnitudes, while most existing datasets emphasize only small and threshold CDs. To address this, we collected a new dataset encompassing a broad range of CDs in WCG and HDR contexts and developed a novel CDM fitted to these data. Benchmarking various CDMs using STRESS and significant error fractions on both new and established datasets reveals that CAM16-UCS with power correction is the most versatile model, delivering strong average performance across WCG colors up to 1611 cd/m2. However, even the best CDM fails to achieve the desired accuracy limits and yields significant errors. CAM16-UCS, though promising, requires further refinement, particularly in its power correction component to better capture the non-geodesic structure of perceptual color space. Full article
(This article belongs to the Special Issue Color in Image Processing and Computer Vision)
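For readers unfamiliar with the STRESS index used for benchmarking, a minimal sketch of its standard formulation (assumed here to match the paper's usage) is given below; lower values indicate better agreement between computed and perceived color differences:

import numpy as np

def stress(dE, dV):
    """STRESS = 100 * sqrt(sum((dE - F*dV)^2) / sum((F*dV)^2)),
    with the scaling factor F = sum(dE^2) / sum(dE * dV)."""
    dE, dV = np.asarray(dE, float), np.asarray(dV, float)
    F = np.sum(dE**2) / np.sum(dE * dV)
    return 100.0 * np.sqrt(np.sum((dE - F * dV)**2) / np.sum((F * dV)**2))

# Computed differences that are perfectly proportional to the visual data give 0.
print(stress([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))   # 0.0
print(stress([1.0, 2.0, 3.0], [2.5, 3.0, 6.5]))   # > 0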
Figures:
Figure 1. The VCD gamut (blue), BT.2020 (red), human visual gamut (gray), and sRGB gamut (colored) shown on the proLab chromaticity diagram.
Figure 2. Front and side views of the experiment setup.
Figure 3. Spectral power distributions of the luminance inside the booth compared to CIE D50, D55, and D65 illuminants.
Figure 4. Stimuli setup for SCD experiments. Quadrants: Q1, upper right; Q2, upper left; Q3, lower right; and Q4, lower left.
Figure 5. Interior of the light booth for TCD measurements.
Figure 6. Spectral power distributions of Q1 primaries R, G, B, and C.
Figure 7. Luminance vs. channel value for Q1 primaries (left to right, top to bottom: R, G, B, and C). Dots illustrate measurements; dashed lines represent ideal linearity; and solid lines, parabolic approximation.
Figure 8. Arrangement of 18 measurement directions around the color center in xyY space. Blue points indicate chromaticity-only shifts (ΔY = 0), yellow points show luminance shifts (Δx = 0, Δy = 0), and red points represent combined chromaticity and luminance shifts. (a) 3D view; (b) section showing chromaticity-only shifts; (c) section with x-coordinate fixed at 0; (d) section with y-coordinate fixed.
Figure 9. Collected data shown on the proLab chromaticity diagram (left) and corresponding luminance Y in cd/m² (right). The centers of the blue segment intersections indicate the centers of the measured ellipses, while the segments themselves represent the measured color differences.
Figure 10. Histogram of errors for the current state-of-the-art CDMs (CIEDE2000 and CAM16-UCS-PC) on the COMBVD dataset. The x-axis represents the magnitude of the CDM error, while the y-axis indicates the number of these errors.
11 pages, 1525 KiB  
Article
Toward Closing the Loop in Image-to-Image Conversion in Radiotherapy: A Quality Control Tool to Predict Synthetic Computed Tomography Hounsfield Unit Accuracy
by Paolo Zaffino, Ciro Benito Raggio, Adrian Thummerer, Gabriel Guterres Marmitt, Johannes Albertus Langendijk, Anna Procopio, Carlo Cosentino, Joao Seco, Antje Christin Knopf, Stefan Both and Maria Francesca Spadea
J. Imaging 2024, 10(12), 316; https://doi.org/10.3390/jimaging10120316 - 10 Dec 2024
Viewed by 808
Abstract
In recent years, synthetic Computed Tomography (CT) images generated from Magnetic Resonance (MR) or Cone Beam Computed Tomography (CBCT) acquisitions have been shown to be comparable to real CT images in terms of dose computation for radiotherapy simulation. However, until now, there has been no independent strategy to assess the quality of each synthetic image in the absence of ground truth. In this work, we propose a Deep Learning (DL)-based framework to predict the accuracy of synthetic CT in terms of Mean Absolute Error (MAE) without the need for a ground truth (GT). The proposed algorithm generates a volumetric map as an output, informing clinicians of the predicted MAE slice-by-slice. A cascading multi-model architecture was used to deal with the complexity of the MAE prediction task. The workflow was trained and tested on two cohorts of head and neck cancer patients with different imaging modalities: 27 MR scans and 33 CBCT. The algorithm evaluation revealed an accurate HU prediction (a median absolute prediction deviation equal to 4 HU for CBCT-based synthetic CTs and 6 HU for MR-based synthetic CTs), with discrepancies that do not affect the clinical decisions made on the basis of the proposed estimation. The workflow exhibited no systematic error in MAE prediction. This work represents a proof of concept about the feasibility of synthetic CT evaluation in daily clinical practice, and it paves the way for future patient-specific quality assessment strategies. Full article
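The cascading idea can be summarized in a few lines of Python (hypothetical stand-in models and thresholds, not the trained pipeline): a classifier first assigns a slice to a coarse MAE interval, and an interval-specific regressor then refines the estimate:

from typing import Callable, Sequence

def predict_slice_mae(slice_2d,
                      interval_classifier: Callable,      # slice -> interval index
                      regressors: Sequence[Callable]):    # one regressor per interval
    """Two-step MAE prediction for a single axial sCT slice, without a ground-truth CT."""
    interval = interval_classifier(slice_2d)
    return regressors[interval](slice_2d)

# Toy stand-ins: classify by mean intensity, then apply an interval-specific model.
classifier = lambda s: 0 if sum(map(sum, s)) / (len(s) * len(s[0])) < 50 else 1
regressors = [lambda s: 35.0,   # low-MAE interval model
              lambda s: 90.0]   # high-MAE interval model
print(predict_slice_mae([[10, 20], [30, 40]], classifier, regressors))  # 35.0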
Figures:
Figure 1. Representation of the general MAE prediction pipeline. An axial sCT slice is given as input, and the associated MAE scalar for the image slice is predicted by using a DL pipeline.
Figure 2. A more detailed graphical representation of the MAE prediction pipeline. The final MAE prediction is obtained as a result of two DL steps: first, a raw MAE interval classification is performed, followed by a more precise MAE estimation based on a regression algorithm.
Figure 3. Exemplary sCT_CBCT overlaid with its pMAE_volume. In addition to the 2D views (axial, sagittal, and coronal planes), the 3D representation is also shown.
Figure 4. Detailed workflow of MAE prediction. A single sCT axial slice is first fed into a DL model that classifies it as belonging to a specific MAE class. According to this prediction, the 2D image is then provided as input to a connected DL regression model, specifically trained to operate on a restricted range of MAE values. As a result, the MAE of a single sCT slice can be forecasted. In order to train the different models with a GT MAE, the ground truth CT is needed (dashed lines are needed only to train the models).
Figure 5. PD distributions for modality-specific and mixed pipelines. Results for sCT_CBCT and sCT_MR are reported in the left and right panels, respectively.
Figure 6. APD distributions for modality-specific and mixed pipelines. Results for sCT_CBCT and sCT_MR are reported in the left and right panels, respectively.
12 pages, 599 KiB  
Article
PAS or Not PAS? The Sonographic Assessment of Placenta Accreta Spectrum Disorders and the Clinical Validation of a New Diagnostic and Prognostic Scoring System
by Antonella Vimercati, Arianna Galante, Margherita Fanelli, Francesca Cirignaco, Amerigo Vitagliano, Pierpaolo Nicolì, Andrea Tinelli, Antonio Malvasi, Miriam Dellino, Gianluca Raffaello Damiani, Barbara Crescenza, Giorgio Maria Baldini, Ettore Cicinelli and Marco Cerbone
J. Imaging 2024, 10(12), 315; https://doi.org/10.3390/jimaging10120315 - 10 Dec 2024
Viewed by 681
Abstract
This study aimed to evaluate our center’s experience in diagnosing and managing placenta accreta spectrum (PAS) in a high-risk population, focusing on prenatal ultrasound features associated with PAS severity and maternal outcomes. We conducted a retrospective analysis of 102 high-risk patients with confirmed placenta previa who delivered at our center between 2018 and 2023. Patients underwent transabdominal and transvaginal ultrasound scans, assessing typical sonographic features. Binary and multivariate logistic regression analyses were performed to identify sonographic markers predictive of PAS and relative complications. Key ultrasound features—retroplacental myometrial thinning (<1 mm), vascular lacunae, and retroplacental vascularization—were significantly associated with PAS and a higher risk of surgical complications. An exceedingly rare sign, the “riddled cervix” sign, was observed in only three patients with extensive cervical or parametrial involvement. Those patients had the worst surgical outcomes. This study highlights the utility of specific ultrasound features in stratifying PAS risk and guiding clinical and surgical management in high-risk pregnancies. The findings support integrating these markers into prenatal diagnostic protocols to improve patient outcomes and inform surgical planning. Full article
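As an illustration of the kind of binary logistic regression analysis mentioned above (synthetic data, not the study cohort), binary sonographic markers can be related to a PAS outcome and summarized as odds ratios:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 3))                  # three binary sonographic markers
logits = -2.0 + 1.8 * X[:, 0] + 1.2 * X[:, 1] + 1.5 * X[:, 2]
y = rng.random(200) < 1 / (1 + np.exp(-logits))        # simulated PAS outcome

model = LogisticRegression().fit(X, y)
print(np.round(np.exp(model.coef_), 2))                # approximate odds ratios per marker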
Figures:
Figure 1. Sonographic findings in a case of placenta percreta. (A) "Riddled cervix" sign at 28 weeks of gestation. Color Doppler transvaginal scan of a highly vascularized (Color Score 3–4) cervix with multiple vascular lakes. Normal cervical length: 35 mm. (B) Same patient at 33 weeks of gestation: evidence of multiple white line interruptions (yellow arrows). (C) Vascular lacunae sign at 32 weeks. Para-sagittal right sonographic scan with evidence of a riddled cervix sign (*) and a right placental cotyledon (yellow triangle) surrounded by large peripheral vascular lacunae with bulging. Absent myometrial thickness. (D) Anatomical specimen of the uterus after cesarean section and subsequent hysterectomy at 34 weeks of gestation. Evidence of the longitudinal incision on the fundus (yellow arrows). Placenta previa percreta on the right isthmic side (yellow triangle). In this case, there was a riddled cervix sign, correlated with parametrial invasion.
19 pages, 9164 KiB  
Article
A Regularization Method for Landslide Thickness Estimation
by Lisa Borgatti, Davide Donati, Liwei Hu, Germana Landi and Fabiana Zama
J. Imaging 2024, 10(12), 314; https://doi.org/10.3390/jimaging10120314 - 10 Dec 2024
Viewed by 607
Abstract
Accurate estimation of landslide depth is essential for practical hazard assessment and risk mitigation. This work addresses the problem of determining landslide depth from satellite-derived elevation data. Using the principle of mass conservation, this problem can be formulated as a linear inverse problem. To solve the inverse problem, we present a regularization approach that computes approximate solutions and regularization parameters using the Balancing Principle. Synthetic data were carefully designed and generated to evaluate the method under controlled conditions, allowing for precise validation of its performance. Through comprehensive testing with this synthetic dataset, we demonstrate the method’s robustness across varying noise levels. When applied to real-world data from the Fels landslide in Alaska, the proposed method proved its practical value in reconstructing landslide thickness patterns. These reconstructions showed good agreement with existing geological interpretations, validating the method’s effectiveness in real-world scenarios. Full article
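A minimal sketch of the underlying idea (generic Tikhonov regularization with a fixed weight, rather than the paper's solver with the Balancing Principle) is shown below: mass conservation yields a linear system A h = b linking the unknown thickness h to the observed elevation change b, which is then solved in regularized least-squares form:

import numpy as np

def tikhonov_solve(A, b, L, lam):
    """Solve min ||A h - b||^2 + lam * ||L h||^2 via the regularized
    normal equations (A^T A + lam L^T L) h = A^T b."""
    lhs = A.T @ A + lam * (L.T @ L)
    return np.linalg.solve(lhs, A.T @ b)

rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((80, n))                       # stand-in for the discretized forward operator
L = np.eye(n, k=0) - np.eye(n, k=1)                    # first-difference smoothing operator
h_true = np.sin(np.linspace(0, np.pi, n))              # synthetic thickness profile
b = A @ h_true + 0.01 * rng.standard_normal(80)        # noisy elevation-change data
h_est = tikhonov_solve(A, b, L, lam=0.1)
print(round(float(np.linalg.norm(h_est - h_true) / np.linalg.norm(h_true)), 3))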
Figures:
Figure 1. Overview of the synthetic landslide model: (a) three-dimensional geometry created in Rhinoceros; and (b) synthetic landslide model geometry in 3DEC, showing the different materials that form the model slope.
Figure 2. Overview of the synthetic landslide model: (a) displacement magnitude distribution at the end of the simulation (t1), where the white square outlines the area depicted in (b); and (b) detail of the landslide area, with the displacement vectors displayed as black arrows.
Figure 3. Data of a synthetic landslide: (a) x-displacement component; (b) y-displacement component.
Figure 4. Data of a synthetic landslide: (a) elevation change; (b) landslide thickness.
Figure 5. Elevation right-hand side b. (a) Noiseless r.h.s.; (b) noisy r.h.s. with a noise level σ = 0.01; and (c) noisy r.h.s. with a noise level σ = 0.05.
Figure 6. Noise level σ = 0.01: relative error (left) and squared residual norm (right) for each iteration k.
Figure 7. Noise level σ = 0.05: relative error (left) and squared residual norm (right) for each iteration k.
Figure 8. Noise level σ = 0.01. Left: computed values of the regularization parameter λ^(k). Right: the number of internal iterations of the GP method at each iteration k.
Figure 9. Noise level σ = 0.05. Left: computed values of the regularization parameter λ^(k). Right: the number of internal iterations of the GP method at each iteration k.
Figure 10. (a) Ground truth thickness. (b) Computed thickness with a noise level σ = 0.01. (c) Computed thickness with a noise level σ = 0.05.
Figure 11. Error maps. (a) Noise level σ = 0.01; (b) noise level σ = 0.05.
Figure 12. Overview of the Fels landslide and its displacement: (a) view of the Fels landslide from the opposite slope; (b) displacement magnitude map derived from the SAR analysis described in [31]; and (c) the elevation change that occurred between 2014 and 2016, as derived from repeated airborne LiDAR data [15].
Figure 13. Overview of the computed thickness map: (a) the computed thickness map (shades of red) and location of the profiles within the landslide area. The basemap is a hill-shaded relief map derived from the 2016 LiDAR dataset. Section lines are indicated in solid black. Landslide boundaries are marked by the dashed line, while the dotted black line shows the boundary of the fast-moving toe. (b–d) Profiles 1–3 show the morphology of the basal surface inferred with our method and the VIM method. The horizontal axis represents the distance from the upper part of the slide. The vertical axis holds the value of the surface elevation (black), the elevation of the basal surface inferred with VIM (yellow), and our proposed method (red), respectively.
15 pages, 627 KiB  
Review
Real-Time Emotion Recognition for Improving the Teaching–Learning Process: A Scoping Review
by Cèlia Llurba and Ramon Palau
J. Imaging 2024, 10(12), 313; https://doi.org/10.3390/jimaging10120313 - 9 Dec 2024
Viewed by 768
Abstract
Emotion recognition (ER) is gaining popularity in various fields, including education. The benefits of ER in the classroom for educational purposes, such as improving students' academic performance, are gradually becoming known. Thus, real-time ER is proving to be a valuable tool for teachers as well as for students. However, its feasibility in educational settings requires further exploration. This review surveys learning experiences based on real-time ER with students to explore its potential for learning and for improving academic achievement. The purpose is to present evidence of good implementations and suggestions for their successful application. The content analysis finds that most of the practices lead to significant improvements with respect to their educational purposes. Nevertheless, the analysis identifies problems that might block the implementation of these practices in the classroom and in education; among the obstacles identified are the lack of student privacy and students' support needs. We conclude that artificial intelligence (AI) and ER are potential tools to address the needs of ordinary classrooms, although reliable automatic recognition in real time remains a challenge for researchers, given the high variability of the input data. Full article
(This article belongs to the Special Issue Deep Learning in Image Analysis: Progress and Challenges)
Figures:
Figure 1. Study selection process flow diagram.
Figure 2. Number of publications by year of the 22 articles included in the scoping review.
7 pages, 1102 KiB  
Communication
Quantitative MRI Assessment of Post-Surgical Spinal Cord Injury Through Radiomic Analysis
by Azadeh Sharafi, Andrew P. Klein and Kevin M. Koch
J. Imaging 2024, 10(12), 312; https://doi.org/10.3390/jimaging10120312 - 8 Dec 2024
Viewed by 616
Abstract
This study investigates radiomic efficacy in post-surgical traumatic spinal cord injury (SCI), overcoming MRI limitations from metal artifacts to enhance diagnosis, severity assessment, and lesion characterization for prognosis and therapy guidance. Traumatic SCI causes severe neurological deficits. While MRI allows qualitative injury evaluation, standard imaging alone has limitations for precise SCI diagnosis, severity stratification, and pathology characterization, which are needed to guide prognosis and therapy. Radiomics enables quantitative tissue phenotyping by extracting a high-dimensional set of descriptive texture features from medical images. However, the efficacy of postoperative radiomic quantification in the presence of metal-induced MRI artifacts from spinal instrumentation has yet to be fully explored. A total of 50 healthy controls and 12 SCI patients post-stabilization surgery underwent 3D multi-spectral MRI. Automated spinal cord segmentation was followed by radiomic feature extraction. Supervised machine learning categorized SCI versus controls, injury severity, and lesion location relative to instrumentation. Radiomics differentiated SCI patients (Matthews correlation coefficient (MCC) 0.97; accuracy 1.0), categorized injury severity (MCC: 0.95; ACC: 0.98), and localized lesions (MCC: 0.85; ACC: 0.90). Combined T1 and T2 features outperformed individual modalities across tasks, with gradient boosting models showing the highest efficacy. The radiomic framework achieved excellent performance, differentiating SCI from controls and accurately categorizing injury severity. The ability to reliably quantify SCI severity and localization could potentially inform diagnosis and prognosis and guide therapy. Further research is warranted to validate radiomic SCI biomarkers and explore clinical integration. Full article
(This article belongs to the Section Medical Imaging)
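The evaluation setup can be mimicked on synthetic features (this sketch does not use the actual radiomic data): a gradient-boosting classifier separates SCI from control cases and is scored with the Matthews correlation coefficient and accuracy:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import matthews_corrcoef, accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for radiomic feature vectors (~50 controls, ~12 SCI cases).
X, y = make_classification(n_samples=62, n_features=40, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("MCC:", round(matthews_corrcoef(y_te, pred), 2),
      "ACC:", round(accuracy_score(y_te, pred), 2))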
Figures:
Figure 1. The flowchart depicts the pipeline for segmenting the spinal cord as suggested in [19].
Figure 2. Sagittal (a) T2-weighted and (b) T1-weighted 3D-MSI MRI images of an instrumented damaged spinal cord. Axial sections, reformatted at the level of the dashed green line from (a,b), are shown in (c,d), respectively. The spinal cord is outlined in red in all images.
Figure 3. Comparison of accuracy, F1 score, area under the curve (AUC-ROC), and mean per-class error across radiomic classification tasks using T1, T2, and combined T1/T2 feature sets. The tasks include categorizing cohorts into healthy or spinal cord injury (SCI) groups, determining injury severity levels, and distinguishing between cord zones relative to the injury site.
59 pages, 3270 KiB  
Review
State-of-the-Art Deep Learning Methods for Microscopic Image Segmentation: Applications to Cells, Nuclei, and Tissues
by Fatma Krikid, Hugo Rositi and Antoine Vacavant
J. Imaging 2024, 10(12), 311; https://doi.org/10.3390/jimaging10120311 - 6 Dec 2024
Cited by 1 | Viewed by 1547
Abstract
Microscopic image segmentation (MIS) is a fundamental task in medical imaging and biological research, essential for precise analysis of cellular structures and tissues. Despite its importance, the segmentation process encounters significant challenges, including variability in imaging conditions, complex biological structures, and artefacts (e.g., noise), which can compromise the accuracy of traditional methods. The emergence of deep learning (DL) has catalyzed substantial advancements in addressing these issues. This systematic literature review (SLR) provides a comprehensive overview of state-of-the-art DL methods developed over the past six years for the segmentation of microscopic images. We critically analyze key contributions, emphasizing how these methods specifically tackle challenges in cell, nucleus, and tissue segmentation. Additionally, we evaluate the datasets and performance metrics employed in these studies. By synthesizing current advancements and identifying gaps in existing approaches, this review not only highlights the transformative potential of DL in enhancing diagnostic accuracy and research efficiency but also suggests directions for future research. The findings of this study have significant implications for improving methodologies in medical and biological applications, ultimately fostering better patient outcomes and advancing scientific understanding. Full article
Figures:
Figure 1. The structure of U-Net [9]. This figure depicts the U-Net structure, highlighting the contracting and expansive paths, with emphasis on the skip connections that facilitate efficient feature integration and detailed pixel-wise segmentation. The model processes an input of 572 × 572 pixels and generates an output of 388 × 388 pixels. Image from [9].
Figure 2. The structure of R-CNNs [10]. This figure outlines the R-CNN architecture, illustrating the process of region proposal generation and CNN-based feature extraction for object detection and classification. Image from [10].
Figure 3. The structure of GANs [11]. This diagram illustrates the structure of a GAN, comprising a generator and a discriminator. The generator, using a random input, produces synthetic images that mimic real images, such as the brain image shown.
Figure 4. Main idea of YOLO [12]. This figure illustrates the YOLO architecture, showcasing its single-shot detection approach that predicts bounding boxes and class probabilities directly from full images for real-time object detection and segmentation. Image from [12].
Figure 5. Architecture of the ViT from [13]. This figure illustrates the process of dividing an input image into non-overlapping patches, transforming these patches by adding learnable embeddings, and feeding them through multiple layers of multi-head self-attention and feed-forward networks. Image from [13].
Figure 6. PRISMA flow diagram of the literature selection process. This diagram illustrates the systematic review process undertaken to select relevant articles for the study. It details the total number, the filtering criteria applied, and the final count of articles included in the SLR.
Figure 7. Microscopic image segmentation levels. This figure illustrates the segmentation process across three levels: cell, nucleus, and tissue.
Figure 8. Number of studies published by year for cell segmentation.
Figure 9. Number of studies published by year for nucleus segmentation.
Figure 10. Number of studies published by year for tissue segmentation.
17 pages, 10713 KiB  
Article
UV Hyperspectral Imaging with Xenon and Deuterium Light Sources: Integrating PCA and Neural Networks for Analysis of Different Raw Cotton Types
by Mohammad Al Ktash, Mona Knoblich, Max Eberle, Frank Wackenhut and Marc Brecht
J. Imaging 2024, 10(12), 310; https://doi.org/10.3390/jimaging10120310 - 5 Dec 2024
Viewed by 745
Abstract
Ultraviolet (UV) hyperspectral imaging shows significant promise for the classification and quality assessment of raw cotton, a key material in the textile industry. This study evaluates the efficacy of UV hyperspectral imaging (225–408 nm) using two different light sources: xenon arc (XBO) and deuterium lamps, in comparison to NIR hyperspectral imaging. The aim is to determine which light source provides better differentiation between cotton types in UV hyperspectral imaging, as each interacts differently with the materials, potentially affecting imaging quality and classification accuracy. Principal component analysis (PCA) and Quadratic Discriminant Analysis (QDA) were employed to differentiate between various cotton types and hemp plant. PCA for the XBO illumination revealed that the first three principal components (PCs) accounted for 94.8% of the total variance: PC1 (78.4%) and PC2 (11.6%) clustered the samples into four main groups—hemp (HP), recycled cotton (RcC), and organic cotton (OC) from the other cotton samples—while PC3 (6%) further separated RcC. When using the deuterium light source, the first three PCs explained 89.4% of the variance, effectively distinguishing sample types such as HP, RcC, and OC from the remaining samples, with PC3 clearly separating RcC. When combining the PCA scores with QDA, the classification accuracy reached 76.1% for the XBO light source and 85.1% for the deuterium light source. Furthermore, a deep learning technique called a fully connected neural network for classification was applied. The classification accuracy for the XBO and deuterium light sources reached 83.6% and 90.1%, respectively. The results highlight the ability of this method to differentiate conventional and organic cotton, as well as hemp, and to identify distinct types of recycled cotton, suggesting varying recycling processes and possible common origins with raw cotton. These findings underscore the potential of UV hyperspectral imaging, coupled with chemometric models, as a powerful tool for enhancing cotton classification accuracy in the textile industry. Full article
(This article belongs to the Section Color, Multi-spectral, and Hyperspectral Imaging)
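A minimal sketch of the PCA + QDA chain is shown below (random spectra stand in for the measured hyperspectral cubes): spectra are compressed to a few principal components, and Quadratic Discriminant Analysis classifies the resulting scores:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_per_class, n_bands = 60, 120                     # pixels per class, spectral bands
means = rng.standard_normal((3, n_bands))          # three cotton/hemp classes
X = np.vstack([m + 0.3 * rng.standard_normal((n_per_class, n_bands)) for m in means])
y = np.repeat([0, 1, 2], n_per_class)

model = make_pipeline(PCA(n_components=3), QuadraticDiscriminantAnalysis())
model.fit(X, y)
print("training accuracy:", round(model.score(X, y), 3))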
Figures:
Figure 1. Visual representation of the pressed disc-shaped samples. The samples were mechanically cleaned cotton (MCC), raw hemp plant (HP), recycled organic bright cotton (RcBC), recycled cotton (RcC), organic raw material cotton (OC), and standard raw material cotton (StC).
Figure 2. Averaged spectra recorded by UV hyperspectral imaging of raw cotton samples and hemp with (a) XBO as a light source and (b) deuterium as a light source. The samples were mechanically cleaned cotton (MCC), raw hemp plant (HP), recycled organic bright cotton (RcBC), recycled cotton (RcC), organic raw material cotton (OC), and standard raw material cotton (StC).
Figure 3. PCA model, calculated using two light sources, XBO and deuterium, with the following scores: (a) PC1 (78.4%) vs. PC2 (11.9%) vs. PC3 (6%) using XBO. (b) PC1 (73%) vs. PC2 (10%) vs. PC3 (6%) using deuterium. (c,d) The corresponding loading plots for XBO and deuterium, respectively.
Figure 4. Prediction of mixed cotton and hemp samples using PCA-QDA with XBO (column 1) and deuterium lamps (column 2). First row: mixture of OC (orange) and StC (brown). Second row: mixture of RcBC (light green) and RcC (yellow). Third row: mixture of MCC (dark blue) and HP (light blue).
Figure 5. ROC curve and corresponding AUC values for the developed neural networks to classify different types of cotton using (a) XBO and (b) deuterium light sources. The calculation was performed using the one-vs-rest method for multiclass classification problems.
Figure 6. Normalized intensity of (a) XBO and (b) deuterium lamps [16].
Figure A1. Two-dimensional scores of the PCA model with (a) PC1 vs. PC2 and (b) PC1 vs. PC3. (c,d) Corresponding loadings.
Figure A2. PCA model with MCC, RcBC, and StC; (a) scores of PC1 vs. PC2 vs. PC3 and (b) corresponding loadings.
Figure A3. Two-dimensional scores of the PCA model with (a) PC1 vs. PC2 and (b) PC1 vs. PC3. (c,d) Corresponding loadings.
Figure A4. PCA model with MCC, RcBC, and StC; (a) scores of PC1 vs. PC2 vs. PC3 and (b) corresponding loadings.
Figure A5. (a) The learning curve of the neural network model for cotton type classification using XBO lamp data, illustrating the model's performance as evaluated on both training and validation datasets across each epoch of the training process. (b) The learning curve of the neural network model for cotton type classification using deuterium lamp data, illustrating the model's performance as evaluated on both training and validation datasets across each epoch of the training process. Model accuracies achieved after the final training epoch for the training, validation, and test datasets using (c) deuterium lamp and (d) XBO lamp.
17 pages, 3796 KiB  
Article
FastQAFPN-YOLOv8s-Based Method for Rapid and Lightweight Detection of Walnut Unseparated Material
by Junqiu Li, Jiayi Wang, Dexiao Kong, Qinghui Zhang and Zhenping Qiang
J. Imaging 2024, 10(12), 309; https://doi.org/10.3390/jimaging10120309 - 2 Dec 2024
Cited by 1 | Viewed by 783
Abstract
Walnuts possess significant nutritional and economic value. Fast and accurate sorting of shells and kernels will enhance the efficiency of automated production. Therefore, we propose a FastQAFPN-YOLOv8s object detection network to achieve rapid and precise detection of unsorted materials. The method uses lightweight Pconv (Partial Convolution) operators to build the FasterNextBlock structure, which serves as the backbone feature extractor for the Fasternet feature extraction network. The ECIoU loss function, combining EIoU (Efficient-IoU) and CIoU (Complete-IoU), speeds up the adjustment of the predicted bounding boxes and accelerates network regression. In the Neck section of the network, the QAFPN feature fusion extraction network is proposed to replace the PAN-FPN (Path Aggregation Network-Feature Pyramid Network) in YOLOv8s with a Rep-PAN structure based on the QARepNext reparameterization framework, striking a balance between network performance and inference speed. To validate the method, we built a three-axis mobile sorting device and created a dataset of 3000 images of walnuts after shell removal for experiments. The results show that the improved network contains 6,071,008 parameters, requires a training time of 2.49 h, has a model size of 12.3 MB, and achieves an mAP (Mean Average Precision) of 94.5% at a frame rate of 52.1 FPS. Compared with the original model, the number of parameters decreased by 45.5%, training time was reduced by 32.7%, the model size shrank by 45.3%, and the frame rate improved by 40.8%. However, some accuracy is sacrificed due to the lightweight design, resulting in a 1.2% decrease in mAP. The network reduces the model size by 59.7 MB and 23.9 MB compared to YOLOv7 and YOLOv6, respectively, and improves the frame rate by 15.67 fps and 22.55 fps, respectively. The average confidence and mAP show minimal changes compared to YOLOv7 and improved by 4.2% and 2.4% compared to YOLOv6, respectively. The FastQAFPN-YOLOv8s detection method effectively reduces model size while maintaining recognition accuracy. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
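To make the lightweight backbone idea concrete, the sketch below shows a partial convolution in the spirit of PConv: only a fraction of the channels is convolved, and the rest pass through untouched. This is an illustrative PyTorch sketch, not the authors' FasterNextBlock; the 0.25 channel ratio and the 3×3 kernel are assumptions.

```python
import torch
import torch.nn as nn

class PartialConv(nn.Module):
    """Partial convolution: apply a 3x3 conv to only a fraction of the
    channels and pass the remaining channels through untouched, which
    cuts FLOPs and memory access compared with a full convolution."""
    def __init__(self, channels: int, partial_ratio: float = 0.25):
        super().__init__()
        self.conv_channels = max(1, int(channels * partial_ratio))
        self.conv = nn.Conv2d(self.conv_channels, self.conv_channels,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split along the channel axis and convolve only the first slice.
        x1, x2 = torch.split(
            x, [self.conv_channels, x.size(1) - self.conv_channels], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)

if __name__ == "__main__":
    block = PartialConv(channels=64)
    out = block(torch.randn(1, 64, 80, 80))
    print(out.shape)  # torch.Size([1, 64, 80, 80])
```

Because only `channels * partial_ratio` channels enter the convolution, parameter count and memory traffic drop roughly in proportion, which is the property FasterNet-style backbones exploit.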
Figures
Figure 1: Sample images of the dataset.
Figure 2: Experimental platform preparation. (a) Experimental platform; (b) field of view of the camera.
Figure 3: YOLOv8s improvement process diagram.
Figure 4: YOLOv8s specific improvement layer.
Figure 5: FasterNext structure construction.
Figure 6: QAFPN structure construction.
Figure 7: Comparison of different loss functions.
Figure 8: Graph of recognition results of different models.
12 pages, 1486 KiB  
Article
Elucidating Early Radiation-Induced Cardiotoxicity Markers in Preclinical Genetic Models Through Advanced Machine Learning and Cardiac MRI
by Dayeong An and El-Sayed Ibrahim
J. Imaging 2024, 10(12), 308; https://doi.org/10.3390/jimaging10120308 - 1 Dec 2024
Viewed by 788
Abstract
Radiation therapy (RT) is widely used to treat thoracic cancers but carries a risk of radiation-induced heart disease (RIHD). This study aimed to detect early markers of RIHD using machine learning (ML) techniques and cardiac MRI in a rat model. SS.BN3 consomic rats, which have a more subtle RIHD phenotype than Dahl salt-sensitive (SS) rats, were treated with localized cardiac RT or sham at 10 weeks of age. Cardiac MRI was performed 8 and 10 weeks post-treatment to assess global and regional cardiac function. ML algorithms were applied to differentiate sham-treated and irradiated rats based on early changes in myocardial function. Despite normal global left ventricular ejection fraction in both groups, strain analysis showed significant reductions in the anteroseptal and anterolateral segments of irradiated rats. Gradient boosting achieved an F1 score of 0.94 and an ROC AUC of 0.95, while random forest showed an accuracy of 88%. These findings suggest that ML, combined with cardiac MRI, can effectively detect early preclinical changes in RIHD, particularly alterations in regional myocardial contractility, highlighting the potential of these techniques for early detection and monitoring of radiation-induced cardiac dysfunction. Full article
(This article belongs to the Special Issue Progress and Challenges in Biomedical Image Analysis)
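For readers who want to reproduce the classification step in outline, a minimal scikit-learn sketch follows. The feature matrix here is synthetic and merely stands in for the strain-derived regional features described in the abstract; the paper's feature selection and validation scheme are not reproduced.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix: one row per rat, columns are regional strain
# metrics (e.g., peak circumferential/radial strain per myocardial segment).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 12))          # 40 animals, 12 strain-derived features
y = rng.integers(0, 2, size=40)        # 0 = sham, 1 = irradiated

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

gb = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("GB  F1:", f1_score(y_te, gb.predict(X_te)),
      "ROC AUC:", roc_auc_score(y_te, gb.predict_proba(X_te)[:, 1]))
print("RF  accuracy:", rf.score(X_te, y_te))
```

With real strain features in place of the random matrix, this pipeline yields the F1, ROC AUC, and accuracy figures of the kind reported in the abstract.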
Figures
Figure 1: Ventricular remodeling post-RT to preserve global cardiac function. Mid-ventricular short-axis cine images showing end-diastolic and end-systolic frames in sham, 8 weeks post-RT, and 10 weeks post-RT SS.BN3 rats. The images show preserved cardiac function post-RT, along with concentric hypertrophy (arrow).
Figure 2: Contractility pattern changes post-RT. Segmental (a) circumferential, (b) radial, and (c) longitudinal strain curves throughout the whole cardiac cycle (20 timeframes starting after the R-wave of the ECG signal) in SS.BN3 sham, 8 weeks post-RT, and 10 weeks post-RT rats. The myocardial segmental color code is shown in the left panel of each row (short-axis slices for circumferential and radial strain; long-axis slice for longitudinal strain), where Ant, Inf, Sept, and Lat denote the anterior, inferior, septal, and lateral segments, respectively. Note the reduced peak strain post-RT and the greater heterogeneity (mechanical dyssynchrony) between strain curves from different heart segments at 10 weeks post-RT.
Figure 3: Bar plots of the differences across four key metrics: (a) circumferential strain, (b) radial strain, (c) rotation angle, and (d) short-axis (SAX) motion. Data are shown for the six myocardial sectors: anterior, anteroseptal, inferoseptal, inferior, inferolateral, and anterolateral. Error bars show SEM. An asterisk (*) indicates a statistically significant (p < 0.05) difference between sham and post-RT.
Figure 4: Bar plots of the same four metrics and six myocardial sectors as in Figure 3. Error bars show SEM. An asterisk (*) indicates a statistically significant (p < 0.05) difference between sham and 8 weeks post-RT or 10 weeks post-RT; a hash (#) indicates a statistically significant difference between 8 weeks post-RT and 10 weeks post-RT.
Figure 5: Comparison of performance metrics across different classifiers and feature sets for differentiating sham vs. irradiated SS.BN3 rats. Bars represent accuracy, F1 score, specificity, sensitivity, and ROC AUC for feature sets (a) Lasso, (b) 7 selected features, (c) 19 selected features, and (d) all features.
Figure 6: Comparison of performance metrics across different classifiers and feature sets for differentiating sham vs. 8 weeks post-RT vs. 10 weeks post-RT. Bars represent accuracy, F1 score, specificity, sensitivity, and ROC AUC for feature sets (a) Lasso, (b) 7 selected features, (c) 21 selected features, and (d) all features.
17 pages, 648 KiB  
Article
Temporal Gap-Aware Attention Model for Temporal Action Proposal Generation
by Sorn Sooksatra and Sitapa Watcharapinchai
J. Imaging 2024, 10(12), 307; https://doi.org/10.3390/jimaging10120307 - 29 Nov 2024
Viewed by 667
Abstract
Temporal action proposal generation is a method for extracting temporal action instances or proposals from untrimmed videos. Existing methods often struggle to segment contiguous action proposals, which are groups of action boundaries with small temporal gaps. To address this limitation, we propose incorporating an attention mechanism to weigh the importance of each proposal within a contiguous group. This mechanism leverages the gap displacement between proposals to calculate attention scores, enabling a more accurate localization of action boundaries. We evaluate our method against a state-of-the-art boundary-based baseline on the ActivityNet v1.3 and THUMOS 2014 datasets. The experimental results demonstrate that our approach significantly improves the performance of short-duration and contiguous action proposals, achieving an average recall of 78.22%. Full article
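The abstract's key idea is that the gap displacement between neighbouring proposals can drive attention weights over a contiguous group. As a rough, self-contained illustration (not the paper's formulation; the nearest-neighbour gap rule and the temperature `tau` are assumptions), one could weight a group of proposals like this:

```python
import numpy as np

def gap_aware_weights(starts, ends, tau=2.0):
    """Attention weights for a temporally sorted group of action proposals.
    Each proposal is scored by the gap (in seconds) to its nearest neighbour:
    proposals sitting in a contiguous run get boosted, isolated proposals are
    down-weighted. `tau` is an assumed temperature parameter."""
    starts, ends = np.asarray(starts, float), np.asarray(ends, float)
    inner = np.maximum(starts[1:] - ends[:-1], 0.0)   # gaps between neighbours
    gap_prev = np.r_[np.inf, inner]                   # no neighbour before 1st
    gap_next = np.r_[inner, np.inf]                   # no neighbour after last
    score = -np.minimum(gap_prev, gap_next) / tau     # small gap -> high score
    w = np.exp(score - score.max())
    return w / w.sum()                                # softmax over the group

# Two nearly contiguous proposals followed by an isolated one.
print(gap_aware_weights(starts=[10.0, 14.2, 40.0], ends=[14.0, 18.0, 45.0]))
# -> roughly [0.5, 0.5, ~0]
```

Proposals embedded in a contiguous run receive comparable weight, while an isolated proposal is suppressed, mirroring the intended emphasis on contiguous action boundaries.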
Figures
Figure 1: (a) An example of a video frame sequence with contiguous action proposals (within the green box) in the predicted results; (b) the BMN candidate action proposal boundaries at different intensities with confidence scores, with actual proposals shown as gray shaded regions; and (c) the confidence scores on proposals by temporal length (best viewed in color).
Figure 2: Examples of attention masks from videos with various gap displacements in the range of (a) 0–10 s, (b) 20–30 s, and (c) more than 100 s, where green dots represent the action proposal positions (best viewed in color).
Figure 3: The proposed TAPG network architecture (G-MCBD) with score fusion and SNMS in the inference phase. Green and red circles represent the starting and ending times of each action proposal, respectively (best viewed in color).
Figure 4: Examples of successful cases in merging contiguous proposals (top) and emphasizing small action proposals (bottom), with predicted proposals from MCBD (blue lines), G-MCBD (green lines), and their ground truths (red lines). The predicted starting and ending times of each proposal are indicated by the beginning and end of each line, respectively (best viewed in color).
Figure 5: Examples of failure cases in overlapping proposals (top) and proposals within temporal gaps (bottom), with predicted proposals from MCBD (blue lines), G-MCBD (green lines), and their ground truths (red lines). The predicted starting and ending times of each proposal are indicated by the beginning and end of each line, respectively (best viewed in color).
39 pages, 3120 KiB  
Article
A Comparative Review of the SWEET Simulator: Theoretical Verification Against Other Simulators
by Amine Ben-Daoued, Frédéric Bernardin and Pierre Duthon
J. Imaging 2024, 10(12), 306; https://doi.org/10.3390/jimaging10120306 - 27 Nov 2024
Viewed by 626
Abstract
Accurate luminance-based image generation is critical in physically based simulations, as even minor inaccuracies in radiative transfer calculations can introduce noise or artifacts, adversely affecting image quality. The radiative transfer simulator, SWEET, uses a backward Monte Carlo approach, and its performance is analyzed alongside other simulators to assess how Monte Carlo-induced biases vary with parameters like optical thickness and medium anisotropy. This work details the advancements made to SWEET since the previous publication, with a specific focus on a more comprehensive comparison with other simulators such as Mitsuba. The core objective is to evaluate the precision of SWEET by comparing radiometric quantities like luminance, which serves as a method for validating the simulator. This analysis is particularly important in contexts such as automotive camera imaging, where accurate scene representation is crucial to reducing noise and ensuring the reliability of image-based systems in autonomous driving. By focusing on detailed radiometric comparisons, this study underscores SWEET’s ability to minimize noise, thus providing high-quality imaging for advanced applications. Full article
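One of the theoretical checks listed among the figures below (Figures 16–19) is the mean-path-length invariance property (IP): for a non-absorbing convex volume under uniform external illumination, the mean path length equals 4V/S regardless of the scattering details. The following Monte Carlo sketch verifies that prediction for a sphere; it is independent of the SWEET code and uses only the chord-length geometry of the unscattered case.

```python
import numpy as np

def mean_chord_sphere_mc(radius: float, n: int = 1_000_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean straight-line path length through a
    sphere under uniform (flux-weighted, i.e. cosine-distributed) external
    illumination. The invariance property predicts 4V/S = 4R/3 for any
    convex body, independent of where the rays enter."""
    rng = np.random.default_rng(seed)
    # Cosine-weighted angle between the entering ray and the inward normal:
    # p(cos θ) = 2 cos θ  =>  cos θ = sqrt(u) with u ~ U(0, 1).
    cos_theta = np.sqrt(rng.random(n))
    chords = 2.0 * radius * cos_theta        # chord length through the sphere
    return chords.mean()

R = 1.0
print("MC estimate :", mean_chord_sphere_mc(R))
print("4V/S = 4R/3 :", 4.0 * R / 3.0)
```

For a sphere of radius R = 1 m, 4V/S = 4R/3 ≈ 1.333 m, so the Monte Carlo mean should converge to that value, which is the same benchmark used for the SWEET estimates in Figures 16–19.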
Figures
Figure 1: Schematic overview of the most important surface scattering models in SWEET.
Figure 2: Comparison of SWEET with Steven's Monte Carlo code for a simple fluence-evaluation case. Fluence is computed as the integral of the radiance over the solid angles of 3D space, ∫_Ω L(r, u) dΩ, where L is the radiance at position r in direction u. Asymmetry phase parameter g = 0.0; optical parameters σ = 0.5 m⁻¹ and κ = 0.5 m⁻¹. The 95% confidence intervals for SWEET are shown in grey.
Figure 3: Comparison of SWEET with Mitsuba for a simple radiance evaluation at a distance of 1 mm. Asymmetry phase parameter g = 0.5; optical parameters σ = 0.9 m⁻¹ and κ = 0.1 m⁻¹.
Figure 4: Relative discrepancies in fluence between SWEET and Steven's Monte Carlo code for a set of distances, optical coefficients (κ and σ), and anisotropy phase parameters (g). Albedo = σ/(σ + κ), where β = σ + κ is set to 1.0; g = 0 on the left and g = 0.9 on the right.
Figure 5: Relative 95% confidence intervals of SWEET for fluence for a set of distances, optical coefficients (κ and σ), and anisotropy phase parameters (g). Albedo = σ/(σ + κ); g = 0 on the left and g = 0.9 on the right.
Figures 6–15: Luminance comparisons and confidence intervals for a set of distances and optical coefficients (κ and σ), with albedo = σ/(σ + κ) and β = σ + κ set to 1.0. In each figure, the panels correspond to the distances [0.001, 0.5; 1.0, 5.0; 10.0, 20.0] m.
Figure 6: SWEET vs. Mitsuba, luminance, g = 0.0, point light source.
Figure 7: SWEET vs. Mitsuba, luminance, g = 0.9, point light source.
Figure 8: SWEET relative 95% confidence interval, luminance, g = 0.0, point light source.
Figure 9: SWEET relative 95% confidence interval, luminance, g = 0.9, point light source.
Figure 10: SWEET vs. Mitsuba, luminance, g = 0.0, rectangular light source.
Figure 11: SWEET vs. Mitsuba, luminance, g = 0.9, rectangular light source.
Figure 12: SWEET relative 95% confidence interval, luminance, g = 0.0, rectangular light source.
Figure 13: SWEET relative 95% confidence interval, luminance, g = 0.9, rectangular light source.
Figure 14: SWEET vs. Mitsuba, luminance, g = 0.0, two point lights.
Figure 15: SWEET relative 95% confidence interval, luminance, g = 0.0, two point lights.
Figure 16: Mean path length estimated with SWEET and theoretically (using the invariance property, IP) for cubes with side lengths ranging from 10 cm to 5 m. Asymmetry phase parameter g = 0.0; optical parameters σ = 1.0 m⁻¹ and κ = 0.0 m⁻¹.
Figure 17: Mean path length estimated with SWEET and theoretically (using IP) for spheres with radii ranging from 10 cm to 5 m. Asymmetry phase parameter g = 0.0; optical parameters σ = 1.0 m⁻¹ and κ = 0.0 m⁻¹.
Figure 18: Relative errors of the invariance property (IP) estimated with SWEET versus theory for cubes with side lengths ranging from 10 cm to 4 m, for varying asymmetry phase parameter g and optical parameters (σ and κ); the panels correspond to the side lengths [0.1, 0.5; 1.0, 2.0; 3.0, 4.0] m.
Figure 19: Relative errors of the invariance property (IP) estimated with SWEET versus theory for spheres with radii ranging from 10 cm to 4 m, for varying asymmetry phase parameter g and optical parameters (σ and κ); the panels correspond to the radii [0.1, 0.5; 1.0, 2.0; 3.0, 4.0] m.
Figure 20: Execution time for SWEET and MS, using 10⁶ photons for luminance computation. Asymmetry phase parameter g = 0.9; optical parameters σ = 0.99 m⁻¹ and κ = 0.01 m⁻¹.
Figure 21: Execution time as a function of photon count for SWEET and MS for a single luminance computation (L(θ = 0°)). Asymmetry phase parameter g = 0.9; optical parameters σ = 0.99 m⁻¹ and κ = 0.01 m⁻¹.
Figure 22: Execution time of SWEET for luminance computation with 10⁶ photons for varying optical parameters, g = 0.9; the panels correspond to the distances [0.001, 0.5; 1.0, 5.0; 10.0, 20.0] m.
Figure 23: Execution time of MS for luminance computation with 10⁶ photons for varying optical parameters, g = 0.9; the panels correspond to the distances [0.001, 0.5; 1.0, 5.0; 10.0, 20.0] m.
Figure 24: Execution-time ratio of MS to SWEET for luminance computation with 10⁶ photons for varying optical parameters, g = 0.9; the panels correspond to the distances [0.001, 0.5; 1.0, 5.0; 10.0, 20.0] m.
16 pages, 20362 KiB  
Article
IngredSAM: Open-World Food Ingredient Segmentation via a Single Image Prompt
by Leyi Chen, Bowen Wang and Jiaxin Zhang
J. Imaging 2024, 10(12), 305; https://doi.org/10.3390/jimaging10120305 - 26 Nov 2024
Viewed by 776
Abstract
Food semantic segmentation is of great significance in the field of computer vision and artificial intelligence, especially in food image analysis. Due to the complexity and variety of food, it is difficult to handle this task effectively with supervised methods. Thus, we introduce IngredSAM, a novel approach for open-world food ingredient semantic segmentation, extending the capabilities of the Segment Anything Model (SAM). Utilizing visual foundation models (VFMs) and prompt engineering, IngredSAM leverages discriminative and matchable semantic features between a single clean image prompt of specific ingredients and open-world images to guide the generation of accurate segmentation masks in real-world scenarios. This method addresses the challenges that traditional supervised models face with the diverse appearances and class imbalances of food ingredients. Our framework demonstrates significant advancements in the segmentation of food ingredients without any training process, achieving 2.85% and 6.01% better performance than previous state-of-the-art methods on the FoodSeg103 and UECFoodPix datasets, respectively. IngredSAM exemplifies a successful application of one-shot, open-world segmentation, paving the way for downstream applications such as nutritional analysis and consumer dietary trend monitoring. Full article
(This article belongs to the Section AI in Imaging)
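To illustrate the prompt-generation stage, here is a minimal sketch under the assumption that dense VFM features from the prompt image and the open-world image are matched by cosine similarity, with the best-matching locations turned into SAM point prompts. The function name, the mean-pooled query, and the top-k rule are illustrative simplifications, not the paper's exact feature aggregation and processing.

```python
import numpy as np

def point_prompts_from_matching(prompt_feats, target_feats, top_k=5):
    """Pick point prompts for SAM on an open-world image by matching dense
    visual-foundation-model features against a single prompt image.
    prompt_feats: (Hp, Wp, D) features of the clean ingredient prompt image.
    target_feats: (Ht, Wt, D) features of the open-world image.
    Returns up to `top_k` (row, col) positions in the target feature grid."""
    # One descriptor for the prompted ingredient: the mean prompt feature.
    query = prompt_feats.reshape(-1, prompt_feats.shape[-1]).mean(axis=0)
    query /= np.linalg.norm(query) + 1e-8

    t = target_feats.reshape(-1, target_feats.shape[-1])
    t_norm = t / (np.linalg.norm(t, axis=1, keepdims=True) + 1e-8)
    sim = t_norm @ query                       # cosine similarity per location

    idx = np.argsort(sim)[::-1][:top_k]        # best-matching locations
    rows, cols = np.unravel_index(idx, target_feats.shape[:2])
    return list(zip(rows.tolist(), cols.tolist()))

# Toy features (e.g., a DINOv2-style 16x16 grid with 384-dim descriptors).
rng = np.random.default_rng(0)
print(point_prompts_from_matching(rng.normal(size=(16, 16, 384)),
                                  rng.normal(size=(16, 16, 384))))
```

The returned grid positions would then be scaled to pixel coordinates and passed to SAM as positive point prompts for mask generation.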
Figures
Figure 1: The prompt image provides the segmentation target, while the visual foundation models generate point prompts for the open-world image.
Figure 2: Pipeline of SAM. Mask, points, box, and text are the four types of prompts.
Figure 3: IngredSAM architecture. The model is divided into three stages: feature aggregation, feature processing, and prompt generation. The final stage outputs the point prompts used to prompt SAM to generate reasonable masks for the open-world image.
Figure 4: Samples from the UECFoodPix Complete and FoodSeg103 datasets. The UECFoodPix Complete dataset does not provide detailed annotations for food ingredients, whereas FoodSeg103 includes detailed annotated masks for all ingredients.
Figure 5: IngredSAM segmentation visualization results. The food ingredients represented by the prompt image are completely segmented in the open-world image.
Figure 6: Visualization of the effectiveness of using a background filtering algorithm.