J. Imaging, Volume 10, Issue 12 (December 2024) – 39 articles

Cover Story (view full-size image): This article presents a novel computational framework that addresses a fundamental challenge in geohazard assessment: determining landslide thickness from satellite-derived elevation data. By combining mass conservation principles with sophisticated regularization techniques, our method converts surface measurements into meaningful subsurface information. The accompanying images show its application to the Fels landslide in Alaska, demonstrating how modern imaging techniques and mathematical algorithms can work together to solve complex geological problems. This development represents a significant step forward in the field of quantitative image analysis for natural hazard assessment, offering new possibilities for understanding landslide mechanics through non-invasive observations. View this paper
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view the papers in PDF format, click on the "PDF Full-text" link and use the free Adobe Reader to open them.
13 pages, 2632 KiB  
Article
Volumetric Humeral Canal Fill Ratio Effects Primary Stability and Cortical Bone Loading in Short and Standard Stem Reverse Shoulder Arthroplasty: A Biomechanical and Computational Study
by Daniel Ritter, Patric Raiss, Patrick J. Denard, Brian C. Werner, Peter E. Müller, Matthias Woiczinski, Coen A. Wijdicks and Samuel Bachmaier
J. Imaging 2024, 10(12), 334; https://doi.org/10.3390/jimaging10120334 - 23 Dec 2024
Viewed by 881
Abstract
Objective: This study evaluated the effect of three-dimensional (3D) volumetric humeral canal fill ratios (VFR) of reverse shoulder arthroplasty (RSA) short and standard stems on biomechanical stability and bone deformations in the proximal humerus. Methods: Forty cadaveric shoulder specimens were analyzed in a clinical computed tomography (CT) scanner allowing for segmentation of the humeral canal to calculate volumetric measures which were verified postoperatively with plain radiographs. Virtual implant positioning allowed for group assignment (VFR < 0.72): Standard stem with low (n = 10) and high (n = 10) filling ratios, a short stem with low (n = 10) and high filling ratios (n = 10). Biomechanical testing included cyclic loading of the native bone and the implanted humeral component. Optical recording allowed for spatial implant tracking and the quantification of cortical bone deformations in the proximal humerus. Results: Planned filling ratios based on 3D volumetric measures had a good-to-excellent correlation (ICC = 0.835; p < 0.001) with implanted filling ratios. Lower canal fill ratios resulted in significantly higher variability between short and standard stems regarding implant tilt (820 N: p = 0.030) and subsidence (220 N: p = 0.046, 520 N: p = 0.007 and 820 N: p = 0.005). Higher filling ratios resulted in significantly lower bone deformations in the medial calcar area compared to the native bone, while the bone deformations in lower filling ratios did not differ significantly (p > 0.177). Conclusions: Lower canal filling ratios maintain dynamic bone loading in the medial calcar of the humerus similar to the native situation in this biomechanical loading setup. Short stems implanted with a low filling ratio have an increased risk for implant tilt and subsidence compared to high filling ratios or standard stems. Full article
Figure 1. Methodological framework: virtual planning and development of a volumetric measure of the humeral canal, which was used in this study for group assignment and planning of low and high filling ratios. Canal fill ratios were controlled using postoperative X-rays after implantation and before biomechanical testing of the implanted humeral component.
Figure 2. Measurement and calculation of the filling ratios by dividing the red-marked measure by the respective blue one. The three-dimensional rendered and segmented CT data on the left side allowed for volumetric calculation of the canal fill ratio (3D VFR). Calculation of the canal fill ratios based on two-dimensional plane radiographs (2D Metaphysis FR and 2D Diaphysis FR) is shown on the right side, following current clinical practice [14,16].
Figure 3. The 2D-to-3D registration allowed the accuracy of preoperative canal fill measurements to be validated against the actual postoperative implant seating: (A) preoperative planning of the humeral implant (purple) and segmentation of the humeral canal (orange); (B) registration of postoperative X-rays; (C) correction of the implant position according to the postoperative position (blue); and (D) calculation of the true postoperative canal fill ratio for comparison with the preoperative ratio.
Figure 4. (A) Testing protocol showing the loading cycles, including the points of data analysis (a–g). (B) Experimental cyclic loading setups and the optical tracking points (green) for data analysis. (C) Evaluated tracking points during cyclic loading force (F) to analyze implant subsidence and tilt between analysis points a and b, d, or f, respectively (s_implant and α_implant: Δab, Δad, and Δaf) at the end of each loading block. Bone micromotion (s_BoneHW: Δbc, Δde, and Δfg) was evaluated as bone displacement within each final load cycle (hysteresis width, HW). Total bone deformation caused by compressive load transmission was measured at the end of each loading block (s_BoneTot: Δab, Δad, and Δaf).
Figure 5. Boxplots of implant subsidence (A) and tilt (B) at the end of each cyclic loading block (220 N, 520 N, and 820 N), comparing short and standard stem implants implanted with high and low filling ratios.
Figure 6. Boxplots of total bone deformation (A) and bone micromotion (B) for each cyclic loading block (220 N, 520 N, and 820 N), comparing low and high filling ratios to the biomechanical behavior of the native bone.
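The volumetric canal fill ratio at the core of this study is, conceptually, the share of the canal volume occupied by the stem. The sketch below computes such a ratio from two hypothetical binary voxel masks segmented from CT data; it is a rough illustration only, not the authors' pipeline, and all names and values are made up.

```python
import numpy as np

def volumetric_fill_ratio(implant_mask: np.ndarray, canal_mask: np.ndarray,
                          voxel_volume_mm3: float = 1.0) -> float:
    """Volume of the implant inside the canal divided by the canal volume.

    Both inputs are boolean voxel masks on the same CT grid; this mirrors the
    idea of a 3D volumetric fill ratio (VFR), not the authors' exact method.
    """
    implant_in_canal = np.logical_and(implant_mask, canal_mask).sum() * voxel_volume_mm3
    canal_volume = canal_mask.sum() * voxel_volume_mm3
    return float(implant_in_canal / canal_volume)

# Toy example: a stem occupying roughly 70% of a box-shaped canal region.
canal = np.zeros((50, 50, 100), dtype=bool)
canal[10:40, 10:40, :] = True
implant = np.zeros_like(canal)
implant[13:38, 13:38, :] = True
print(round(volumetric_fill_ratio(implant, canal), 2))  # ~0.69
```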
20 pages, 4678 KiB  
Article
Deep Learning-Based Diagnosis Algorithm for Alzheimer’s Disease
by Zhenhao Jin, Junjie Gong, Minghui Deng, Piaoyi Zheng and Guiping Li
J. Imaging 2024, 10(12), 333; https://doi.org/10.3390/jimaging10120333 - 23 Dec 2024
Viewed by 571
Abstract
Alzheimer’s disease (AD), a degenerative condition affecting the central nervous system, has witnessed a notable rise in prevalence along with the increasing aging population. In recent years, the integration of cutting-edge medical imaging technologies with forefront theories in artificial intelligence has dramatically enhanced the efficiency of identifying and diagnosing brain diseases such as AD. This paper presents an innovative two-stage automatic auxiliary diagnosis algorithm for AD, based on an improved 3D DenseNet segmentation model and an improved MobileNetV3 classification model applied to brain MR images. In the segmentation network, the backbone network was simplified, the activation function and loss function were replaced, and the 3D GAM attention mechanism was introduced. In the classification network, firstly, the CA attention mechanism was added to enhance the model’s ability to capture positional information of disease features; secondly, dilated convolutions were introduced to extract richer features from the input feature maps; and finally, the fully connected layer of MobileNetV3 was modified and the idea of transfer learning was adopted to improve the model’s feature extraction capability. The results of the study showed that the proposed approach achieved classification accuracies of 97.85% for AD/NC, 95.31% for MCI/NC, 93.96% for AD/MCI, and 92.63% for AD/MCI/NC, respectively, which were 3.1, 2.8, 2.6, and 2.8 percentage points higher than before the improvement. Comparative and ablation experiments validated the classification performance of the proposed method, demonstrating its capability to facilitate accurate and efficient automated auxiliary diagnosis of AD and offering a deep learning-based solution for this task. Full article
Figure 1. Brain MR image preprocessing.
Figure 2. DAGAN structure.
Figure 3. The workflow flowchart of 3D GAM.
Figure 4. Improved 3D DenseNet model structure.
Figure 5. Improved MobileNetV3 structure.
Figure 6. AD automatic auxiliary diagnosis algorithm based on improved MobileNetV3.
Figure 7. Local magnification and comparison of segmentation slices.
Figure 8. Comparison of segmentation results on the ADNI dataset with different methods.
Figure 9. AD/NC confusion matrices before and after model improvement.
Figure 10. AD/NC ROC curve before and after model improvement.
Figure 11. MCI/NC confusion matrices before and after model improvement.
Figure 12. MCI/NC ROC curve before and after model improvement.
Figure 13. AD/MCI confusion matrices before and after model improvement.
Figure 14. AD/MCI ROC curve before and after model improvement.
Figure 15. AD/MCI/NC confusion matrices before and after model improvement.
Figure 16. AD/MCI/NC ROC curve before and after model improvement.
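Two generic ingredients named in the abstract, dilated convolutions and a replaced fully connected head on a pretrained MobileNetV3, can be sketched in a few lines of PyTorch. This is a minimal illustration under assumed settings (three classes, 2D inputs, a hypothetical front-end block), not the authors' architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 3  # e.g. AD / MCI / NC (hypothetical label set for this sketch)

# Transfer learning: start from ImageNet weights, then swap the classifier head.
backbone = models.mobilenet_v3_large(weights="IMAGENET1K_V1")
in_feats = backbone.classifier[0].in_features
backbone.classifier = nn.Sequential(
    nn.Linear(in_feats, 256),
    nn.Hardswish(),
    nn.Dropout(0.2),
    nn.Linear(256, num_classes),
)

# A dilated 3x3 convolution enlarges the receptive field without reducing spatial
# resolution; padding equal to the dilation keeps the feature-map size unchanged.
dilated_block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=2, dilation=2),
    nn.BatchNorm2d(16),
    nn.Hardswish(),
    nn.Conv2d(16, 3, kernel_size=1),  # project back to 3 channels for the backbone
)

x = torch.randn(2, 3, 224, 224)        # two dummy MRI slices
logits = backbone(dilated_block(x))    # shape: (2, num_classes)
print(logits.shape)
```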
26 pages, 21880 KiB  
Article
Explainable AI-Based Skin Cancer Detection Using CNN, Particle Swarm Optimization and Machine Learning
by Syed Adil Hussain Shah, Syed Taimoor Hussain Shah, Roa’a Khaled, Andrea Buccoliero, Syed Baqir Hussain Shah, Angelo Di Terlizzi, Giacomo Di Benedetto and Marco Agostino Deriu
J. Imaging 2024, 10(12), 332; https://doi.org/10.3390/jimaging10120332 - 22 Dec 2024
Viewed by 853
Abstract
Skin cancer is among the most prevalent cancers globally, emphasizing the need for early detection and accurate diagnosis to improve outcomes. Traditional diagnostic methods, based on visual examination, are subjective, time-intensive, and require specialized expertise. Current artificial intelligence (AI) approaches for skin cancer detection face challenges such as computational inefficiency, lack of interpretability, and reliance on standalone CNN architectures. To address these limitations, this study proposes a comprehensive pipeline combining transfer learning, feature selection, and machine-learning algorithms to improve detection accuracy. Multiple pretrained CNN models were evaluated, with Xception emerging as the optimal choice for its balance of computational efficiency and performance. An ablation study further validated the effectiveness of freezing task-specific layers within the Xception architecture. Feature dimensionality was optimized using Particle Swarm Optimization, reducing dimensions from 1024 to 508, significantly enhancing computational efficiency. Machine-learning classifiers, including Subspace KNN and Medium Gaussian SVM, further improved classification accuracy. Evaluated on the ISIC 2018 and HAM10000 datasets, the proposed pipeline achieved impressive accuracies of 98.5% and 86.1%, respectively. Moreover, Explainable-AI (XAI) techniques, such as Grad-CAM, LIME, and Occlusion Sensitivity, enhanced interpretability. This approach provides a robust, efficient, and interpretable solution for automated skin cancer diagnosis in clinical applications. Full article
(This article belongs to the Special Issue Deep Learning in Image Analysis: Progress and Challenges)
Figure 1. The complete pipeline of the proposed methodology.
Figure 2. Resulting images of the augmentation operation.
Figure 3. Training and validation accuracy (top) and loss (bottom) curves over iterations during the training process of the proposed model. The gray bars indicate specific epochs of interest, highlighting regions where the training and validation metrics stabilized or showed notable changes. Additional training details, such as elapsed time, learning rate, and hardware resources, are provided for context.
Figure 4. Confusion matrix of the improved Xception network: (A) confusion matrix on the validation dataset; (B) confusion matrix on the testing dataset.
Figure 5. Confusion matrix and ROC curve of Experiment 2 by the Medium Gaussian SVM classifier: (A) confusion matrix and ROC curve on the training dataset; (B) confusion matrix and ROC curve on the testing dataset. The dashed line in the ROC curve represents the reference line for random classification (AUC = 0.5).
Figure 6. Confusion matrix and ROC curve of Experiment 3 by the Ensemble Subspace KNN classifier: (A) confusion matrix and ROC curve on the training dataset; (B) confusion matrix and ROC curve on the testing dataset. The dashed line in the ROC curve represents the reference line for random classification (AUC = 0.5).
Figure 7. Confusion matrix and ROC curve for the Subspace KNN classifier on the HAM10000 dataset, showing classification performance with an AUC of 0.8785 for both benign and malignant classes. The confusion matrix highlights true positives, false positives, and misclassifications, while the ROC curve demonstrates the model's discriminative ability. The dashed line in the ROC curve represents the reference line for random classification (AUC = 0.5).
Figure 8. Visualization of the proposed Xception-based pipeline applied to the ISIC and HAM10000 datasets for skin cancer classification. Input images are classified as benign or malignant with confidence scores. Grad-CAM highlights critical regions, LIME provides pixel-level interpretations, and Occlusion Sensitivity validates predictions. The color legend bars indicate the intensity of contribution, with "min" and "max" representing low to high importance, enhancing the model's transparency and interpretability for clinical applications.
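Particle Swarm Optimization for feature selection, used in this paper to shrink the CNN feature vector, can be prototyped as a binary PSO whose fitness is a cross-validated classifier score. The sketch below is a generic illustration on synthetic data with made-up hyperparameters, not the paper's implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=64, n_informative=12, random_state=0)

def fitness(mask: np.ndarray) -> float:
    """Cross-validated accuracy of a KNN restricted to the selected feature subset."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(KNeighborsClassifier(), X[:, mask.astype(bool)], y, cv=3).mean()

n_particles, n_features, n_iters = 12, X.shape[1], 20
pos = rng.random((n_particles, n_features))          # continuous positions in [0, 1]
vel = np.zeros_like(pos)
masks = (pos > 0.5).astype(int)                      # thresholding gives a binary feature mask
pbest, pbest_fit = masks.copy(), np.array([fitness(m) for m in masks])
gbest = pbest[np.argmax(pbest_fit)].copy()

for _ in range(n_iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0.0, 1.0)
    masks = (pos > 0.5).astype(int)
    fit = np.array([fitness(m) for m in masks])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = masks[improved], fit[improved]
    gbest = pbest[np.argmax(pbest_fit)].copy()

print(f"selected {gbest.sum()}/{n_features} features, CV accuracy {pbest_fit.max():.3f}")
```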
13 pages, 6526 KiB  
Article
Towards Robust Supervised Pectoral Muscle Segmentation in Mammography Images
by Parvaneh Aliniya, Mircea Nicolescu, Monica Nicolescu and George Bebis
J. Imaging 2024, 10(12), 331; https://doi.org/10.3390/jimaging10120331 - 22 Dec 2024
Viewed by 510
Abstract
Mammography images are the most commonly used tool for breast cancer screening. The presence of pectoral muscle in images for the mediolateral oblique view makes designing a robust automated breast cancer detection system more challenging. Most of the current methods for removing the pectoral muscle are based on traditional machine learning approaches. This is partly due to the lack of segmentation masks of pectoral muscle in available datasets. In this paper, we provide the segmentation masks of the pectoral muscle for the INbreast, MIAS, and CBIS-DDSM datasets, which will enable the development of supervised methods and the utilization of deep learning. Training deep learning-based models using segmentation masks will also be a powerful tool for removing pectoral muscle for unseen data. To test the validity of this idea, we trained AU-Net separately on the INbreast and CBIS-DDSM for the segmentation of the pectoral muscle. We used cross-dataset testing to evaluate the performance of the models on an unseen dataset. In addition, the models were tested on all of the images in the MIAS dataset. The experimental results show that cross-dataset testing achieves a comparable performance to the same-dataset experiments. Full article
Figure 1. Examples of the performance of the proposed method for the same- and cross-dataset tests for the INbreast and CBIS-DDSM datasets. Each row (a–f) presents two examples from the INbreast and CBIS-DDSM datasets. The green and blue colors present boundaries for the ground truth and predicted segmentation.
Figure 2. Examples of the performance of the proposed method for cross-dataset tests with the MIAS dataset as the test set. Each row (a–d) presents results for one sample in the MIAS dataset. The green and blue colors present boundaries for the ground truth and predicted segmentation.
Figure 3. Examples of the performance of the proposed method compared to the method proposed in [7] for the MIAS dataset. The names of the samples in the dataset are mentioned in (a–c).
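Cross-dataset segmentation experiments of this kind are typically scored with overlap metrics such as the Dice coefficient and IoU. Below is a minimal sketch with toy masks and a hypothetical helper name; the paper's exact evaluation protocol may differ.

```python
import numpy as np

def dice_and_iou(pred: np.ndarray, truth: np.ndarray) -> tuple[float, float]:
    """Dice coefficient and IoU between two binary masks (hypothetical helper)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    dice = 2.0 * inter / (pred.sum() + truth.sum() + 1e-9)
    iou = inter / (np.logical_or(pred, truth).sum() + 1e-9)
    return float(dice), float(iou)

# Toy cross-dataset check: a model trained on dataset A scored on a mask from dataset B.
truth = np.zeros((256, 256), dtype=bool)
truth[:80, :128] = True          # ground-truth pectoral region
pred = np.zeros_like(truth)
pred[:90, :120] = True           # slightly offset prediction
print(dice_and_iou(pred, truth)) # roughly (0.91, 0.84)
```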
18 pages, 36094 KiB  
Article
Arbitrary Optics for Gaussian Splatting Using Space Warping
by Jakob Nazarenus, Simin Kou, Fang-Lue Zhang and Reinhard Koch
J. Imaging 2024, 10(12), 330; https://doi.org/10.3390/jimaging10120330 - 22 Dec 2024
Viewed by 557
Abstract
Due to recent advances in 3D reconstruction from RGB images, it is now possible to create photorealistic representations of real-world scenes that only require minutes to be reconstructed and can be rendered in real time. In particular, 3D Gaussian splatting shows promising results, outperforming preceding reconstruction methods while simultaneously reducing the overall computational requirements. The main success of 3D Gaussian splatting relies on the efficient use of a differentiable rasterizer to render the Gaussian scene representation. One major drawback of this method is its underlying pinhole camera model. In this paper, we propose an extension of the existing method that removes this constraint and enables scene reconstructions using arbitrary camera optics such as highly distorting fisheye lenses. Our method achieves this by applying a differentiable warping function to the Gaussian scene representation. Additionally, we reduce overfitting in outdoor scenes by utilizing a learnable skybox, reducing the presence of floating artifacts within the reconstructed scene. Based on synthetic and real-world image datasets, we show that our method is capable of creating an accurate scene reconstruction from highly distorted images and rendering photorealistic images from such reconstructions. Full article
(This article belongs to the Special Issue Geometry Reconstruction from Images (2nd Edition))
Figure 1. Pipeline of our proposed method. Before being forwarded to the pinhole Gaussian rasterizer, we apply a space-warping module to the position, rotation, and scale to emulate the distortion of a lens specified by the camera's intrinsics.
Figure 2. Distortion of scale and rotation. The four images show the steps in the distortion pipeline from left to right. An undistorted Gaussian (a) is non-linearly distorted (b). This distortion is linearly approximated using the Jacobian J_W (c), with a subsequent orthogonalization of the axes (d). For (c,d), the gray area shows the true distorted Gaussian to visualize the approximation error.
Figure 3. Reconstruction results for the synthetic Classroom scene.
Figure 4. Results of our proposed method on synthetic Blender scenes (Archiviz, Barbershop, and Classroom). Red rectangles indicate areas in which our method produced reconstruction artifacts. Zoom in for details.
Figure 5. Results of our proposed method on synthetic Blender scenes (Monk, Pabellon, and Sky). Red rectangles indicate areas in which our method produced reconstruction artifacts. Zoom in for details.
Figure 6. Results of our proposed method and Fisheye-GS on ScanNet++ scenes (Bedroom, Kitchen, and Office Day). Zoom in for details.
Figure 7. Results of our proposed method and Fisheye-GS on ScanNet++ scenes (Office Night, Tool Room, and Utility Room). Zoom in for details.
Figure 8. Renderings of a cube with Gaussians along the edges. The left rendering has the scale and rotation adjusted according to the Jacobian; for the right rendering, scale and rotation were left unmodified.
Figure 9. Evaluation metrics for the Utility Room scene for varying degrees of the polynomial polar distortion function.
Figure 10. Results for our proposed model trained on synthetic data with the learned skybox enabled (middle) and disabled (right).
Figure A1. Results for three validation views optimized on our synthetic orthographic dataset.
Figure A2. Results for the five additional real-world scenes from the ScanNet++ dataset.
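The key idea of the abstract, applying a differentiable warping function to the Gaussian scene representation and linearizing it with a Jacobian (cf. Figure 2), can be illustrated with a toy polar polynomial distortion in PyTorch. The coefficients and the specific warp below are assumptions for illustration, not the authors' model.

```python
import torch

# Hypothetical odd-polynomial fisheye model: theta_d = k1*theta + k2*theta^3 + k3*theta^5.
coeffs = torch.tensor([1.0, -0.15, 0.01])

def warp(p: torch.Tensor) -> torch.Tensor:
    """Warp one 3D Gaussian mean (camera coordinates) with a polar polynomial distortion."""
    x, y, z = p
    r = torch.sqrt(x * x + y * y) + 1e-9
    theta = torch.atan2(r, z)                     # angle to the optical axis
    theta_d = sum(c * theta ** (2 * i + 1) for i, c in enumerate(coeffs))
    rho = torch.sqrt(x * x + y * y + z * z)       # keep the distance to the camera unchanged
    dir_xy = torch.stack([x, y]) / r
    new_xy = dir_xy * rho * torch.sin(theta_d)    # bend the ray to the distorted angle
    new_z = rho * torch.cos(theta_d)
    return torch.cat([new_xy, new_z.unsqueeze(0)])

mean = torch.tensor([0.4, 0.1, 1.5])
warped = warp(mean)
# Local linearization of the warp; such a Jacobian is what distorts each Gaussian's
# scale and rotation (covariance) in addition to its position.
J = torch.autograd.functional.jacobian(warp, mean)
print(warped, J.shape)  # warped mean and a 3x3 Jacobian
```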
23 pages, 7813 KiB  
Article
The Use of Hybrid CNN-RNN Deep Learning Models to Discriminate Tumor Tissue in Dynamic Breast Thermography
by Andrés Munguía-Siu, Irene Vergara and Juan Horacio Espinoza-Rodríguez
J. Imaging 2024, 10(12), 329; https://doi.org/10.3390/jimaging10120329 - 21 Dec 2024
Viewed by 887
Abstract
Breast cancer is one of the leading causes of death for women worldwide, and early detection can help reduce the death rate. Infrared thermography has gained popularity as a non-invasive and rapid method for detecting this pathology and can be further enhanced by applying neural networks to extract spatial and even temporal data derived from breast thermographic images if they are acquired sequentially. In this study, we evaluated hybrid convolutional-recurrent neural network (CNN-RNN) models based on five state-of-the-art pre-trained CNN architectures coupled with three RNNs to discern tumor abnormalities in dynamic breast thermographic images. The hybrid architecture that achieved the best performance for detecting breast cancer was VGG16-LSTM, which showed accuracy (ACC), sensitivity (SENS), and specificity (SPEC) of 95.72%, 92.76%, and 98.68%, respectively, with a CPU runtime of 3.9 s. However, the hybrid architecture that showed the fastest CPU runtime was AlexNet-RNN with 0.61 s, although with lower performance (ACC: 80.59%, SENS: 68.52%, SPEC: 92.76%), but still superior to AlexNet (ACC: 69.41%, SENS: 52.63%, SPEC: 86.18%) with 0.44 s. Our findings show that hybrid CNN-RNN models outperform stand-alone CNN models, indicating that temporal data recovery from dynamic breast thermographs is possible without significantly compromising classifier runtime. Full article
Figure 1. Diagram of the proposed methodology for binary breast cancer classification using hybrid CNN-RNN-based deep learning models.
Figure 2. Sample grayscale thermograms from volunteers for a breast study: (a) The image is clear, so it is selected; (b) The image is blurry, so it is not selected; (c) The image contains material (bandaged breast) that covers the study region, so it is not selected.
Figure 3. U-Net architecture. The contracting path is on the left side of the U-shape, and the expanding path is on the right. The blue boxes represent multi-channel feature maps. The number of channels is indicated on top of each box, and the x-y size is shown at its bottom left edge. An orange arrow indicates each operation.
Figure 4. Example of a grayscale thermogram of the volunteer with ID 28: (a) image selected by data cleansing; (b) thermal image segmented using U-Net; (c) data augmentation using the transformations of horizontal flip, rotation 15°, rotation 30°, and zoom 15%.
Figure 5. CNN model for the binary classification of breast tissue heterogeneity (normal or abnormal) in thermographic images.
Figure 6. Performance evaluation (accuracy, sensitivity, and specificity) of the different hybrid CNN-RNN architectures to classify the presence or absence of a tumor in breast thermographic images: (a) the independent CNN model; (b) the hybrid CNN-RNN model; (c) the hybrid CNN-LSTM model; (d) the hybrid CNN-GRU model. Inception-V3, VGG16, ResNet101, GoogLeNet, and AlexNet are the five CNN models coupled to the three sequential networks (RNN, LSTM, and GRU).
Figure 7. CPU execution time of the different coupled CNN-RNN deep learning architectures for breast cancer classification in images acquired using the DIT acquisition protocol.
Figure A1. Validation accuracy of InceptionV3 using (a) single CNN; (b) coupled RNN; (c) coupled LSTM; and (d) coupled GRU.
Figure A2. Validation accuracy of VGG16 using (a) single CNN; (b) coupled RNN; (c) coupled LSTM; and (d) coupled GRU.
Figure A3. Validation accuracy of ResNet101 using (a) single CNN; (b) coupled RNN; (c) coupled LSTM; and (d) coupled GRU.
Figure A4. Validation accuracy of AlexNet using (a) single CNN; (b) coupled RNN; (c) coupled LSTM; and (d) coupled GRU.
Figure A5. Validation accuracy of GoogLeNet using (a) single CNN; (b) coupled RNN; (c) coupled LSTM; and (d) coupled GRU.
Figure A6. Confusion matrix of InceptionV3 using (a) single CNN; (b) coupled RNN; (c) coupled LSTM; and (d) coupled GRU.
Figure A7. Confusion matrix of VGG16 using (a) single CNN; (b) coupled RNN; (c) coupled LSTM; and (d) coupled GRU.
Figure A8. Confusion matrix of ResNet101 using (a) single CNN; (b) coupled RNN; (c) coupled LSTM; and (d) coupled GRU.
Figure A9. Confusion matrix of GoogLeNet using (a) single CNN; (b) coupled RNN; (c) coupled LSTM; and (d) coupled GRU.
Figure A10. Confusion matrix of AlexNet using (a) single CNN; (b) coupled RNN; (c) coupled LSTM; and (d) coupled GRU.
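A hybrid CNN-RNN of the kind compared here, such as the best-performing VGG16-LSTM, can be sketched as per-frame CNN features fed to an LSTM over the thermogram sequence. The following PyTorch snippet is a minimal sketch under assumptions (binary output, 512-dimensional pooled VGG16 features, an arbitrary hidden size), not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class VGG16LSTM(nn.Module):
    """Minimal CNN-RNN sketch: per-frame VGG16 features fed to an LSTM."""

    def __init__(self, num_classes: int = 2, hidden: int = 128):
        super().__init__()
        vgg = models.vgg16(weights="IMAGENET1K_V1")
        # Global-average-pool the convolutional features to a 512-d vector per frame.
        self.cnn = nn.Sequential(vgg.features, nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rnn = nn.LSTM(input_size=512, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, H, W), a short dynamic thermography sequence
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        _, (h_n, _) = self.rnn(feats)
        return self.head(h_n[-1])                 # classify from the last hidden state

model = VGG16LSTM()
logits = model(torch.randn(2, 5, 3, 224, 224))    # two sequences of five thermograms
print(logits.shape)                               # torch.Size([2, 2])
```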
15 pages, 11038 KiB  
Article
X-Ray Image-Based Real-Time COVID-19 Diagnosis Using Deep Neural Networks (CXR-DNNs)
by Ali Yousuf Khan, Miguel-Angel Luque-Nieto, Muhammad Imran Saleem and Enrique Nava-Baro
J. Imaging 2024, 10(12), 328; https://doi.org/10.3390/jimaging10120328 - 19 Dec 2024
Viewed by 752
Abstract
On 11 February 2020, the prevalent outbreak of COVID-19, a coronavirus illness, was declared a global pandemic. Since then, nearly seven million people have died and over 765 million confirmed cases of COVID-19 have been reported. The goal of this study is to develop a diagnostic tool for detecting COVID-19 infections more efficiently. Currently, the most widely used method is Reverse Transcription Polymerase Chain Reaction (RT-PCR), a clinical technique for infection identification. However, RT-PCR is expensive, has limited sensitivity, and requires specialized medical expertise. One of the major challenges in the rapid diagnosis of COVID-19 is the need for reliable imaging, particularly X-ray imaging. This work takes advantage of artificial intelligence (AI) techniques to enhance diagnostic accuracy by automating the detection of COVID-19 infections from chest X-ray (CXR) images. We obtained and analyzed CXR images from the Kaggle public database (4035 images in total), including cases of COVID-19, viral pneumonia, pulmonary opacity, and healthy controls. By integrating advanced techniques with transfer learning from pre-trained convolutional neural networks (CNNs), specifically InceptionV3, ResNet50, and Xception, we achieved an accuracy of 95%, significantly higher than the 85.5% achieved with ResNet50 alone. Additionally, our proposed method, CXR-DNNs, can accurately distinguish between three different types of chest X-ray images for the first time. This computer-assisted diagnostic tool has the potential to significantly enhance the speed and accuracy of COVID-19 diagnoses. Full article
(This article belongs to the Section Medical Imaging)
Figure 1. Block diagram of the CXR-DNN used for screening COVID-19.
Figure 2. Proposed EfficientNetB7 architecture.
Figure 3. CXR images of lungs in patients: (a) healthy, (b) COVID-19, and (c) pneumonia.
Figure 4. 3 × 3 confusion matrices for the (a) Training, (b) Validation, and (c) Testing datasets, representing the model's performance in true positives, false positives, true negatives, and false negatives for each class (COVID-19, Normal, and Pneumonia).
Figure 5. Accuracy and loss per epoch for each dataset, illustrating the model's performance and learning progression: (a) Training dataset, (b) Validation dataset, (c) Testing dataset. In the accuracy graphs, the blue curve represents accuracy and the orange curve represents loss; in the loss graphs, the blue curve represents loss and the orange curve represents accuracy, showing their variation over the epochs.
Figure 6. Convergence of precision, recall, and F1-score for every dataset used: (a) training, (b) validation, (c) testing. Classes: 1—COVID-19, 2—normal, 3—pneumonia.
Figure 7. Sample COVID-19 CXR image with vision transformer attention map (layer 1).
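Combining several pretrained CNNs through transfer learning, as the abstract describes, can be approximated by swapping each backbone's ImageNet head for a three-class head and averaging the softmax outputs. The sketch below uses ResNet50 and InceptionV3 from torchvision (Xception is omitted because torchvision does not ship it); the fusion rule and class set are assumptions, not the paper's CXR-DNN design.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 3  # COVID-19 / normal / pneumonia

def with_new_head(model: nn.Module) -> nn.Module:
    """Swap the ImageNet classifier for a 3-class head (transfer-learning sketch)."""
    model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
    return model.eval()

resnet = with_new_head(models.resnet50(weights="IMAGENET1K_V2"))
inception = with_new_head(models.inception_v3(weights="IMAGENET1K_V1"))

@torch.no_grad()
def ensemble_probs(x299: torch.Tensor) -> torch.Tensor:
    # Both backbones accept 299x299 inputs; averaging softmax outputs is one simple
    # way to combine several pretrained CNNs (the paper's exact fusion may differ).
    p1 = torch.softmax(resnet(x299), dim=1)
    p2 = torch.softmax(inception(x299), dim=1)
    return (p1 + p2) / 2

probs = ensemble_probs(torch.randn(4, 3, 299, 299))  # four dummy CXR images
print(probs.argmax(dim=1))                            # predicted class per image
```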
18 pages, 2563 KiB  
Article
Optimization of Cocoa Pods Maturity Classification Using Stacking and Voting with Ensemble Learning Methods in RGB and LAB Spaces
by Kacoutchy Jean Ayikpa, Abou Bakary Ballo, Diarra Mamadou and Pierre Gouton
J. Imaging 2024, 10(12), 327; https://doi.org/10.3390/jimaging10120327 - 18 Dec 2024
Viewed by 685
Abstract
Determining the maturity of cocoa pods early is not just about guaranteeing harvest quality and optimizing yield. It is also about efficient resource management. Rapid identification of the stage of maturity helps avoid losses linked to a premature or late harvest, improving productivity. Early determination of cocoa pod maturity ensures both the quality and quantity of the harvest, as immature or overripe pods cannot produce premium cocoa beans. Our innovative research harnesses artificial intelligence and computer vision technologies to revolutionize the cocoa industry, offering precise and advanced tools for accurately assessing cocoa pod maturity. Providing an objective and rapid assessment enables farmers to make informed decisions about the optimal time to harvest, helping to maximize the yield of their plantations. Furthermore, by automating this process, these technologies reduce the margins for human error and improve the management of agricultural resources. With this in mind, our study proposes to exploit a computer vision method based on the GLCM (gray level co-occurrence matrix) algorithm to extract the characteristics of images in the RGB (red, green, blue) and LAB (luminance, axis between red and green, axis between yellow and blue) color spaces. This approach allows for in-depth image analysis, which is essential for capturing the nuances of cocoa pod maturity. Next, we apply classification algorithms to identify the best performers. These algorithms are then combined via stacking and voting techniques, allowing our model to be optimized by taking advantage of the strengths of each method, thus guaranteeing more robust and precise results. The results demonstrated that the combination of algorithms produced superior performance, especially in the LAB color space, where voting scored 98.49% and stacking 98.71%. In comparison, in the RGB color space, voting scored 96.59% and stacking 97.06%. These results surpass those generally reported in the literature, showing the increased effectiveness of combined approaches in improving the accuracy of classification models. This highlights the importance of exploring ensemble techniques to maximize performance in complex contexts such as cocoa pod maturity classification. Full article
(This article belongs to the Special Issue Imaging Applications in Agriculture)
Figure 1. Diagram representing the voting process.
Figure 2. Illustration of the stacking process of the algorithms in our study.
Figure 3. The overall architecture of our method.
Figure 4. Histogram of model performance comparison (accuracy) in the RGB space.
Figure 5. Confusion matrix of the best-performing models in the RGB color space.
Figure 6. Histogram of model performance comparison (accuracy) in the LAB space.
Figure 7. Confusion matrix of the best-performing models in the LAB color space.
Figure 8. Histogram of algorithm performance in the RGB and LAB color spaces.
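The pipeline described here, GLCM texture features followed by voting and stacking ensembles, maps directly onto scikit-image and scikit-learn primitives. The sketch below runs on random patches with hypothetical maturity labels purely to show the wiring; the feature set, base learners, and hyperparameters are assumptions.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def glcm_features(gray: np.ndarray) -> np.ndarray:
    """Contrast/homogeneity/energy/correlation from the GLCM of an 8-bit grayscale patch."""
    glcm = graycomatrix(gray, distances=[1], angles=[0, np.pi / 2], levels=256,
                        symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])

# Toy data standing in for cocoa-pod patches (hypothetical labels: 0 = immature, 1 = mature).
rng = np.random.default_rng(1)
X = np.array([glcm_features(rng.integers(0, 256, (64, 64), dtype=np.uint8)) for _ in range(80)])
y = rng.integers(0, 2, 80)

base = [("svm", SVC(probability=True)), ("rf", RandomForestClassifier(n_estimators=50))]
voting = VotingClassifier(estimators=base, voting="soft").fit(X, y)
stacking = StackingClassifier(estimators=base, final_estimator=LogisticRegression()).fit(X, y)
print(voting.predict(X[:3]), stacking.predict(X[:3]))
```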
38 pages, 3841 KiB  
Review
Computer Vision-Based Gait Recognition on the Edge: A Survey on Feature Representations, Models, and Architectures
by Edwin Salcedo
J. Imaging 2024, 10(12), 326; https://doi.org/10.3390/jimaging10120326 - 18 Dec 2024
Viewed by 1616
Abstract
Computer vision-based gait recognition (CVGR) is a technology that has gained considerable attention in recent years due to its non-invasive, unobtrusive, and difficult-to-conceal nature. Beyond its applications in biometrics, CVGR holds significant potential for healthcare and human–computer interaction. Current CVGR systems often transmit collected data to a cloud server for machine learning-based gait pattern recognition. While effective, this cloud-centric approach can result in increased system response times. Alternatively, the emerging paradigm of edge computing, which involves moving computational processes to local devices, offers the potential to reduce latency, enable real-time surveillance, and eliminate reliance on internet connectivity. Furthermore, recent advancements in low-cost, compact microcomputers capable of handling complex inference tasks (e.g., Jetson Nano Orin, Jetson Xavier NX, and Khadas VIM4) have created exciting opportunities for deploying CVGR systems at the edge. This paper reports the state of the art in gait data acquisition modalities, feature representations, models, and architectures for CVGR systems suitable for edge computing. Additionally, this paper addresses the general limitations and highlights new avenues for future research in the promising intersection of CVGR and edge computing. Full article
(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)
Figure 1. A comparative analysis illustrating the growing preference for DL architectures. The illustrations summarise the findings from the papers reviewed in this survey.
Figure 2. General structure of this survey paper and our proposed taxonomy of the existing technologies that facilitate on-device deployment of CVGR systems for real-time recognition.
Figure 3. Broad perspective on gait feature representations.
Figure 4. CVGR systems based on handcrafted representations typically employ one of two approaches: the systems extract silhouettes from 2D images (model-free) or rely on human body models (model-based). The video sample shown in the figure comes from the CASIA-B dataset [23].
Figure 5. DL-based end-to-end gait recognition scheme for CVGR systems. The sample with the walking subjects shown in the figure comes from the Penn–Fudan dataset [94].
Figure 6. Graphical depictions of various edge-oriented inference architectures.
Figure 7. A large-scale scalable framework to support gait recognition computations in a distributed manner. This framework would incorporate multiple nodes and an edge server to handle data acquisition, detection, segmentation, and classification, enabling more feasible real-time computation. The video sample shown in the figure comes from the CASIA-A dataset [24].
14 pages, 1855 KiB  
Article
Point-Cloud Instance Segmentation for Spinning Laser Sensors
by Alvaro Casado-Coscolla, Carlos Sanchez-Belenguer, Erik Wolfart and Vitor Sequeira
J. Imaging 2024, 10(12), 325; https://doi.org/10.3390/jimaging10120325 - 17 Dec 2024
Viewed by 602
Abstract
In this paper, we face the point-cloud segmentation problem for spinning laser sensors from a deep-learning (DL) perspective. Since the sensors natively provide their measurements in a 2D grid, we directly use state-of-the-art models designed for visual information for the segmentation task and then exploit the range information to ensure 3D accuracy. This allows us to effectively address the main challenges of applying DL techniques to point clouds, i.e., lack of structure and increased dimensionality. To the best of our knowledge, this is the first work that faces the 3D segmentation problem from a 2D perspective without explicitly re-projecting 3D point clouds. Moreover, our approach exploits multiple channels available in modern sensors, i.e., range, reflectivity, and ambient illumination. We also introduce a novel data-mining pipeline that enables the annotation of 3D scans without human intervention. Together with this paper, we present a new public dataset with all the data collected for training and evaluating our approach, where point clouds preserve their native sensor structure and where every single measurement contains range, reflectivity, and ambient information, together with its associated 3D point. As experimental results show, our approach achieves state-of-the-art results both in terms of performance and inference time. Additionally, we provide a novel ablation test that analyses the individual and combined contributions of the different channels provided by modern laser sensors. Full article
Figure 1. Ouster data used in this paper. (Left) Structured view in the projective space (1024 × 128 pixels); from top to bottom: range, reflectivity, and ambient channels. (Right) Partial view of the associated point cloud in the 3D Cartesian space.
Figure 2. Background removal and foreground clustering. (a) Partial slice of a scan (range channel). (b) Segmentation mask after voxel filtering. (c) Resulting mask after the shrink operation (seeds for the flooding algorithm). (d) Masks of the resulting clusters, with their associated labels. (e) 3D view of the clusters with their 3D bounding boxes.
Figure 3. Data remapping from the projective space to the CNN space. All three channels (range, reflectivity, and ambient) are split into four overlapping segments and stacked together to comply with the expected input tensor size (640 × 640 × 3). Segmentation masks and bounding boxes are split accordingly. Only the reflectivity channel is represented in this figure.
Figure 4. Segmentation mask fusion with range information. (a) Raw mask of the yellow person in (b), as inferred by the model. (b) Direct projection of the raw masks to the 3D data; notice the artifacts around the edges. (c) Largest cluster after the flooding algorithm over (a), i.e., the final segmentation mask, with its associated raw mask on the back. (d) Results after projecting the final segmentation masks into the 3D data.
Figure 5. Ablation test results. CNN performance without post-processing for each CNN size (nano, small, medium, large, extra-large) and each possible combination of input channels (ambient, depth, reflectivity, A+D, A+R, D+R, A+D+R).
Figure 6. Full pipeline performance, using the Medium (M) CNN with all input channels (A+D+R). PR curves and comparison with other techniques when predicting 3D bounding boxes with IoU thresholds of 50% (first plot) and 75% (second plot). PR curves for different IoU thresholds when predicting 3D bounding boxes (third plot) and segmentation masks (fourth plot).
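The central trick of this paper, segmenting on the sensor's native 2D grid and then using the range channel to recover 3D geometry, can be illustrated by lifting a 2D instance mask to a 3D bounding box under an idealized spherical projection. The beam angles and scan size below are assumptions; a real sensor needs its calibrated intrinsics.

```python
import numpy as np

H, W = 128, 1024                                          # structured scan: beams x azimuth steps
fov_up, fov_down = np.deg2rad(22.5), np.deg2rad(-22.5)    # hypothetical vertical field of view

def mask_to_3d_bbox(mask: np.ndarray, rng_m: np.ndarray) -> np.ndarray:
    """Lift a 2D instance mask to 3D using per-pixel range and return an axis-aligned box.

    Assumes an idealized spherical projection; real sensors expose calibrated beam angles.
    """
    rows, cols = np.nonzero(mask)
    azimuth = 2 * np.pi * cols / W - np.pi
    elevation = fov_up + (fov_down - fov_up) * rows / (H - 1)
    r = rng_m[rows, cols]
    x = r * np.cos(elevation) * np.cos(azimuth)
    y = r * np.cos(elevation) * np.sin(azimuth)
    z = r * np.sin(elevation)
    pts = np.stack([x, y, z], axis=1)
    return np.stack([pts.min(axis=0), pts.max(axis=0)])   # (2, 3): xyz_min, xyz_max

rng_img = np.full((H, W), 10.0)          # dummy range image (metres)
mask = np.zeros((H, W), dtype=bool)
mask[60:70, 500:520] = True              # a detected instance in image space
print(mask_to_3d_bbox(mask, rng_img))
```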
12 pages, 2922 KiB  
Article
Exploiting 2D Neural Network Frameworks for 3D Segmentation Through Depth Map Analytics of Harvested Wild Blueberries (Vaccinium angustifolium Ait.)
by Connor C. Mullins, Travis J. Esau, Qamar U. Zaman, Ahmad A. Al-Mallahi and Aitazaz A. Farooque
J. Imaging 2024, 10(12), 324; https://doi.org/10.3390/jimaging10120324 - 15 Dec 2024
Viewed by 837
Abstract
This study introduced a novel approach to 3D image segmentation utilizing a neural network framework applied to 2D depth map imagery, with Z axis values visualized through color gradation. This research involved comprehensive data collection from mechanically harvested wild blueberries to populate 3D and red–green–blue (RGB) images of filled totes through time-of-flight and RGB cameras, respectively. Advanced neural network models from the YOLOv8 and Detectron2 frameworks were assessed for their segmentation capabilities. Notably, the YOLOv8 models, particularly YOLOv8n-seg, demonstrated superior processing efficiency, with an average time of 18.10 ms, significantly faster than the Detectron2 models, which exceeded 57 ms, while maintaining high performance with a mean intersection over union (IoU) of 0.944 and a Matthew’s correlation coefficient (MCC) of 0.957. A qualitative comparison of segmentation masks indicated that the YOLO models produced smoother and more accurate object boundaries, whereas Detectron2 showed jagged edges and under-segmentation. Statistical analyses, including ANOVA and Tukey’s HSD test (α = 0.05), confirmed the superior segmentation performance of models on depth maps over RGB images (p < 0.001). This study concludes by recommending the YOLOv8n-seg model for real-time 3D segmentation in precision agriculture, providing insights that can enhance volume estimation, yield prediction, and resource management practices. Full article
(This article belongs to the Section Image and Video Processing)
Figure 1. Example of wild blueberries (Vaccinium angustifolium Ait.) at time of harvest, illustrating the irregular clustering.
Figure 2. Visual demonstration of conversion from point cloud to depth map using the jet colormap as the Z axis representation in mm, where the background color of the depth map was set to blue.
Figure 3. Dual camera mount setup for data collection with Basler Blaze-101 (67° by 51° in the X and Y axes, respectively) and Lucid Vision Labs Triton (60° by 46° in the X and Y axes, respectively).
Figure 4. Visualization of segmentation mask correctness of YOLO masks for the ToF 3D camera and 2D RGB camera, with true positive as green, true negative as blue, false positive as red, and false negative as orange.
Figure 5. Visualization of segmentation mask correctness of Detectron2 masks for the ToF 3D and 2D RGB cameras, with true positive as green, true negative as blue, false positive as red, and false negative as orange.
Figure 6. Sample confusion matrices of Detectron2 R50 with FPN and YOLOv8n-seg on the testing split of the depth image dataset.
Figure 7. Sample confusion matrices of Detectron2 R50 with FPN and YOLOv8n-seg on the testing split of the RGB image dataset.
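The depth-map representation shown in Figure 2, Z values rendered through the jet colormap over a blue background, can be reproduced in a few lines of NumPy and Matplotlib. The rasterization below assumes points already aligned to the image grid and made-up dimensions; it illustrates the encoding, not the authors' camera pipeline.

```python
import numpy as np
import matplotlib.pyplot as plt

def depth_map_from_points(points: np.ndarray, grid: tuple[int, int] = (480, 640)) -> np.ndarray:
    """Rasterize (row, col, Z-in-mm) points into an RGB depth map via the jet colormap.

    Simplified stand-in for a time-of-flight pipeline: empty pixels end up blue,
    as in the background described for the paper's Figure 2.
    """
    h, w = grid
    depth = np.full((h, w), np.nan)
    rows = np.clip(points[:, 0].astype(int), 0, h - 1)
    cols = np.clip(points[:, 1].astype(int), 0, w - 1)
    depth[rows, cols] = points[:, 2]
    z_min, z_max = np.nanmin(depth), np.nanmax(depth)
    norm = (depth - z_min) / max(z_max - z_min, 1e-9)
    rgb = plt.get_cmap("jet")(np.nan_to_num(norm, nan=0.0))[..., :3]  # jet: 0 is blue, 1 is red
    return (rgb * 255).astype(np.uint8)

rng = np.random.default_rng(0)
pts = np.column_stack([rng.integers(0, 480, 5000),       # row
                       rng.integers(0, 640, 5000),       # col
                       rng.uniform(50.0, 150.0, 5000)])  # Z (mm)
print(depth_map_from_points(pts).shape)                  # (480, 640, 3)
```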
13 pages, 3641 KiB  
Review
Current Role of CT Pulmonary Angiography in Pulmonary Embolism: A State-of-the-Art Review
by Ignacio Diaz-Lorenzo, Alberto Alonso-Burgos, Alfonsa Friera Reyes, Ruben Eduardo Pacios Blanco, Maria del Carmen de Benavides Bernaldo de Quiros and Guillermo Gallardo Madueño
J. Imaging 2024, 10(12), 323; https://doi.org/10.3390/jimaging10120323 - 15 Dec 2024
Cited by 1 | Viewed by 1031
Abstract
The purpose of this study is to conduct a literature review on the current role of computed tomography pulmonary angiography (CTPA) in the diagnosis and prognosis of pulmonary embolism (PE). It addresses key topics such as the quantification of the thrombotic burden, its role as a predictor of mortality, new diagnostic techniques that are available, the possibility of analyzing the thrombus composition to differentiate its evolutionary stage, and the applicability of artificial intelligence (AI) in PE through CTPA. The only finding from CTPA that has been validated as a prognostic factor so far is the right ventricle/left ventricle (RV/LV) diameter ratio being >1, which is associated with a 2.5-fold higher risk of all-cause mortality or adverse events, and a 5-fold higher risk of PE-related mortality. The increasing use of techniques such as dual-energy computed tomography allows for the more accurate diagnosis of perfusion defects, which may go undetected in conventional computed tomography, identifying up to 92% of these defects compared to 78% being detected by CTPA. Additionally, it is essential to explore the latest advances in the application of AI to CTPA, which are currently expanding and have demonstrated a 23% improvement in the detection of subsegmental emboli compared to manual interpretation. With deep image analysis, up to a 95% accuracy has been achieved in predicting PE severity based on the thrombus volume and perfusion deficits. These advancements over the past 10 years significantly contribute to early intervention strategies and, therefore, to the improvement of morbidity and mortality outcomes for these patients. Full article
(This article belongs to the Special Issue Tools and Techniques for Improving Radiological Imaging Applications)
Figure 1. Fifty-six-year-old woman diagnosed with acute pulmonary thromboembolism by axial CT angiography. (A) Axial RV/LV diameter ratio > 1 measured at the base of both ventricles (black arrows). (B) Filling defects in both main pulmonary arteries (*), with a saddle thrombus.
Figure 2. Eighty-nine-year-old woman diagnosed with chronic pulmonary thromboembolism. (A) Axial CT angiography (maximum intensity projection—MIP—reconstruction) showing severe narrowing in the superior segmental artery of the left lower lobe (white arrow) as a sequela of PE. (B) Fusion image of CT angiography and color-coded iodine density showing wedge-shaped perfusion defects (*) in the middle lobe, lingula, and left lower lobe, with the latter corresponding to the findings in image (A). (C) SPECT-CT fusion image showing wedge-shaped perfusion defects (*) similar to those obtained with dual-energy CT (B).
22 pages, 838 KiB  
Article
MediScan: A Framework of U-Health and Prognostic AI Assessment on Medical Imaging
by Sibtain Syed, Rehan Ahmed, Arshad Iqbal, Naveed Ahmad and Mohammed Ali Alshara
J. Imaging 2024, 10(12), 322; https://doi.org/10.3390/jimaging10120322 - 13 Dec 2024
Viewed by 1265
Abstract
With technological advancements, remarkable progress has been made with the convergence of health sciences and Artificial Intelligence (AI). Modern health systems are proposed to ease patient diagnostics. However, the challenge is to provide AI-based precautions to patients and doctors for more accurate risk assessment. The proposed healthcare system aims to integrate patients, doctors, laboratories, pharmacies, and administrative personnel use cases and their primary functions onto a single platform. The proposed framework can also process microscopic images, CT scans, X-rays, and MRI to classify malignancy and give doctors a set of AI precautions for patient risk assessment. The proposed framework incorporates various DCNN models for identifying different forms of tumors and fractures in the human body, i.e., brain, bones, lungs, kidneys, and skin, and generates precautions with the help of a fine-tuned Large Language Model (LLM), i.e., Generative Pretrained Transformer 4 (GPT-4). With enough training data, DCNNs can learn highly representative, data-driven, hierarchical image features. The GPT-4 model is selected for generating precautions due to its explanation, reasoning, memory, and accuracy on prior medical assessments and research studies. Classification models are evaluated by classification reports (i.e., recall, precision, F1 score, support, accuracy, and macro and weighted averages) and confusion matrices, and they show robust performance compared to conventional schemes. Full article
Figure 1. Graphical scheme of the system architecture.
Figure 2. Graphical scheme of use cases in the proposed framework.
Figure 3. Graphical illustration of the proposed AI bone fracture detection model.
Figure 4. Graphical illustration of the proposed AI lung cancer detection model.
Figure 5. Graphical illustration of the proposed AI brain tumor detection model.
Figure 6. Graphical illustration of the proposed AI skin cancer detection model.
Figure 7. Graphical illustration of the proposed AI kidney malignancy detection model.
Figure 8. Graphical illustration of the proposed GPT-4 model system integration.
Figure 9. Graphical illustration of the confusion matrices for the Bone Fracture recognition, Lung Tumor recognition, Brain Tumor detection, Skin Lesion identification, and Renal Malignancy recognition AI models.
Figure 10. Graphical illustration of the accuracy graphs for the Bone Fracture recognition, Lung Tumor recognition, Brain Tumor detection, Skin Lesion identification, and Renal Malignancy recognition AI models.
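The evaluation described in the abstract, a per-class classification report plus a confusion matrix, corresponds to scikit-learn's standard reporting utilities. The snippet below uses synthetic labels purely to show the reporting step and makes no claim about the paper's actual results.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical evaluation of one detector (e.g. brain tumor vs. no tumor);
# y_true/y_pred are made up purely to demonstrate the reporting step.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 200)
y_pred = np.where(rng.random(200) < 0.9, y_true, 1 - y_true)  # ~90% agreement

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred,
                            target_names=["no tumor", "tumor"], digits=3))
# The report lists precision, recall, F1 score, and support per class, plus
# accuracy and macro/weighted averages, i.e., the quantities named in the abstract.
```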
16 pages, 1289 KiB  
Article
DAT: Deep Learning-Based Acceleration-Aware Trajectory Forecasting
by Ali Asghar Sharifi, Ali Zoljodi and Masoud Daneshtalab
J. Imaging 2024, 10(12), 321; https://doi.org/10.3390/jimaging10120321 - 13 Dec 2024
Viewed by 616
Abstract
As the demand for autonomous driving (AD) systems has increased, the enhancement of their safety has become critically important. A fundamental capability of AD systems is object detection and trajectory forecasting of vehicles and pedestrians around the ego-vehicle, which is essential for preventing potential collisions. This study introduces the Deep learning-based Acceleration-aware Trajectory forecasting (DAT) model, a deep learning-based approach for object detection and trajectory forecasting, utilizing raw sensor measurements. DAT is an end-to-end model that processes sequential sensor data to detect objects and forecasts their future trajectories at each time step. The core innovation of DAT lies in its novel forecasting module, which leverages acceleration data to enhance trajectory forecasting, leading to the consideration of a variety of agent motion models. We propose a robust and innovative method for estimating ground-truth acceleration for objects, along with an object detector that predicts acceleration attributes for each detected object and a novel method for trajectory forecasting. DAT is trained and evaluated on the NuScenes dataset, demonstrating its empirical effectiveness through extensive experiments. The results indicate that DAT significantly surpasses state-of-the-art methods, particularly in enhancing forecasting accuracy for objects exhibiting both linear and nonlinear motion patterns, achieving up to a 2× improvement. This advancement highlights the critical role of incorporating acceleration data into predictive models, representing a substantial step forward in the development of safer autonomous driving systems. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)
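To make the role of acceleration concrete, the following minimal sketch (an assumption about the underlying kinematics, not the DAT network itself) extrapolates future positions from a detected object's position, velocity, and estimated acceleration under a constant-acceleration motion model:

import numpy as np

def rollout_constant_acceleration(p0, v0, a0, horizon, dt=0.5):
    """Predict future 2D positions for `horizon` steps of length `dt` seconds."""
    p0, v0, a0 = map(np.asarray, (p0, v0, a0))
    t = dt * np.arange(1, horizon + 1)[:, None]      # (horizon, 1) time offsets
    return p0 + v0 * t + 0.5 * a0 * t**2             # (horizon, 2) future positions

# A braking vehicle: moving along +x at 8 m/s while decelerating at 2 m/s^2.
future = rollout_constant_acceleration(p0=[0.0, 0.0], v0=[8.0, 0.0],
                                       a0=[-2.0, 0.0], horizon=6)
print(np.round(future, 2))

Without the acceleration term, the same rollout degenerates to a constant-velocity model, which is what acceleration-aware forecasting is meant to improve on for nonlinear motion.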
Figures:
Figure 1. (Top row) Cascade methods, which handle detection, tracking, and forecasting in a sequential pipeline, are vulnerable to error propagation: each stage assumes error-free input from the previous one, which is often unrealistic in real-world applications, so errors can accumulate and negatively impact the final predictions. In the diagram, the arrows indicate the direction of processing for the LiDAR data, moving from raw input to the final output; the input data, gathered from past observations, is represented in blue, while the future output is shown in orange. (Bottom row) End-to-end methods, on the other hand, directly predict future trajectories from raw data. This unified approach allows for the joint optimization of detection, tracking, and forecasting, leading to more accurate and reliable results.
Figure 2. Acceleration error comparison across different methods.
Figure 3. Initial velocity error comparison across different methods.
Figure 4. DAT: based on a LiDAR sequence, DAT detects objects in both the present frame (t) and future frames (up to t + T). These future detections are projected back to the current frame, allowing for alignment with detections in the present moment.
Figure 5. Qualitative evaluation of trajectory forecasts using DAT. In the first row, ground-truth trajectories are depicted in green, the highest-confidence forecast in blue, and other potential future trajectories in cyan. The second row compares the highest-confidence forecasts of DAT (blue) with those of TrajectoryNAS (magenta), alongside the ground-truth trajectories (green). The results illustrate that DAT predictions are closer to the ground truth.
16 pages, 5125 KiB  
Article
Multi-Level Feature Fusion in CNN-Based Human Action Recognition: A Case Study on EfficientNet-B7
by Pitiwat Lueangwitchajaroen, Sitapa Watcharapinchai, Worawit Tepsan and Sorn Sooksatra
J. Imaging 2024, 10(12), 320; https://doi.org/10.3390/jimaging10120320 - 12 Dec 2024
Viewed by 735
Abstract
Accurate human action recognition is becoming increasingly important across various fields, including healthcare and self-driving cars. A simple approach to enhance model performance is incorporating additional data modalities, such as depth frames, point clouds, and skeleton information. While previous studies have predominantly used late fusion techniques to combine these modalities, our research introduces a multi-level fusion approach that combines information at the early, intermediate, and late stages. Furthermore, recognizing the challenges of collecting multiple data types in real-world applications, our approach exploits multimodal techniques while relying solely on RGB frames as the single data source. In our work, we used RGB frames from the NTU RGB+D dataset as the sole data source. From these frames, we extracted 2D skeleton coordinates and optical flow frames using pre-trained models. We evaluated our multi-level fusion approach with EfficientNet-B7 as a case study, and our methods demonstrated a significant improvement, achieving 91.5% accuracy on the NTU RGB+D 60 dataset compared to single-modality and single-view models. Despite their simplicity, our methods are also comparable to other state-of-the-art approaches. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
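A minimal PyTorch sketch of the three fusion levels described in the abstract is shown below; the tiny convolutional backbones stand in for EfficientNet-B7 and are purely illustrative:

import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    def __init__(self, num_classes=60):
        super().__init__()
        self.spatial = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                     nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Early fusion: RGB (3) + optical-flow (2) channels form one input tensor.
        self.temporal = nn.Sequential(nn.Conv2d(5, 16, 3, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.spatial_head = nn.Linear(16, num_classes)
        # Intermediate fusion: the temporal head also sees the spatial features.
        self.temporal_head = nn.Linear(16 + 16, num_classes)

    def forward(self, rgb, flow):
        f_s = self.spatial(rgb)
        f_t = self.temporal(torch.cat([rgb, flow], dim=1))           # early fusion
        logits_s = self.spatial_head(f_s)
        logits_t = self.temporal_head(torch.cat([f_t, f_s], dim=1))  # intermediate fusion
        return 0.5 * (logits_s.softmax(-1) + logits_t.softmax(-1))   # late fusion

scores = TwoStreamFusion()(torch.randn(2, 3, 64, 64), torch.randn(2, 2, 64, 64))
print(scores.shape)  # torch.Size([2, 60])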
Figures:
Figure 1. The proposed architecture integrates multi-level fusion through early, intermediate, and late fusion techniques. Early fusion is applied in the temporal stream, enriching the ROI-based OF with additional information from the ROI-based RGB frames. In addition, the spatial stream uses only ROI-based RGB frames as input. Intermediate fusion is used to merge extracted features from the spatial stream into the temporal stream, while late fusion is used to combine the softmax scores from both streams.
Figure 2. The top row presents five examples of OF frames extracted from pairs of selected video frames. The second row illustrates the corresponding ROI RGB frames, ROI OF frames, and the result of early fusion combining the ROI RGB and ROI OF for the same five frames. This demonstrates a soft shading effect, highlighting the motion of people while also showing non-moving parts in the images.
Figure 3. The confusion matrix for the proposed architecture on the NTU RGB+D 60 dataset, displaying actual classes on the vertical axis and predicted classes on the horizontal axis for both the cross-subject (CS) and the cross-view (CV) protocols, respectively.
14 pages, 2304 KiB  
Article
Improved Generalizability in Medical Computer Vision: Hyperbolic Deep Learning in Multi-Modality Neuroimaging
by Cyrus Ayubcha, Sulaiman Sajed, Chady Omara, Anna B. Veldman, Shashi B. Singh, Yashas Ullas Lokesha, Alex Liu, Mohammad Ali Aziz-Sultan, Timothy R. Smith and Andrew Beam
J. Imaging 2024, 10(12), 319; https://doi.org/10.3390/jimaging10120319 - 12 Dec 2024
Viewed by 841
Abstract
Deep learning has shown significant value in automating radiological diagnostics but can be limited by a lack of generalizability to external datasets. Leveraging the geometric principles of non-Euclidean space, certain geometric deep learning approaches may offer an alternative means of improving model generalizability. This study investigates the potential advantages of hyperbolic convolutional neural networks (HCNNs) over traditional convolutional neural networks (CNNs) in neuroimaging tasks. We conducted a comparative analysis of HCNNs and CNNs across various medical imaging modalities and diseases, with a focus on a compiled multi-modality neuroimaging dataset. The models were assessed for their performance parity, robustness to adversarial attacks, semantic organization of embedding spaces, and generalizability. Zero-shot evaluations were also performed with ischemic stroke non-contrast CT images. HCNNs matched CNNs’ performance in less complex settings and demonstrated superior semantic organization and robustness to adversarial attacks. While HCNNs equaled CNNs in out-of-sample datasets identifying Alzheimer’s disease, in zero-shot evaluations, HCNNs outperformed CNNs and radiologists. HCNNs deliver enhanced robustness and organization in neuroimaging data. This likely underlies why, while HCNNs perform similarly to CNNs with respect to in-sample tasks, they confer improved generalizability. Nevertheless, HCNNs encounter efficiency and performance challenges with larger, complex datasets. These limitations underline the need for further optimization of HCNN architectures. HCNNs present promising improvements in generalizability and resilience for medical imaging applications, particularly in neuroimaging. Despite facing challenges with larger datasets, HCNNs enhance performance under adversarial conditions and offer better semantic organization, suggesting valuable potential in generalizable deep learning models in medical imaging and neuroimaging diagnostics. Full article
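One common building block behind such hyperbolic models (assumed here for illustration; the paper's exact layers may differ) is lifting Euclidean feature vectors onto the Lorentz (hyperboloid) model via the exponential map at the origin:

import numpy as np

def lorentz_expmap0(v, eps=1e-9):
    """Map Euclidean vectors v of shape (n, d) onto the hyperboloid
    <x, x>_L = -1 in R^(d+1) (curvature -1) via the exponential map
    at the origin (1, 0, ..., 0)."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True).clip(min=eps)
    x_time = np.cosh(norm)                       # time-like coordinate
    x_space = np.sinh(norm) * v / norm           # space-like coordinates
    return np.concatenate([x_time, x_space], axis=-1)

x = lorentz_expmap0(np.random.randn(4, 8))
# Lifted points satisfy the hyperboloid constraint -x0^2 + |x_space|^2 = -1.
print(np.allclose(-x[:, 0]**2 + np.sum(x[:, 1:]**2, axis=1), -1.0))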
Figures:
Figure 1. Relative model performance across datasets. The bar plot shows the Top-1 accuracy metrics with 95% confidence intervals for the Euclidean ResNet 18 and the Euclidean–Lorentz ResNet 18 across the three datasets, increasing in size from left to right (i.e., Miniature Multi-Disease (MDD) Dataset, Multi-Modality Neuroimaging (MMN) Dataset, and Multi-Disease (MD) Dataset).
Figure 2. Euclidean and hyperbolic model T-SNE in the Neuroimaging Dataset. This figure shows the low-dimensional T-SNE representation of the average class embedding space from the Euclidean ResNet 18 (A) and the Euclidean–Lorentz ResNet 18 (B) for the Multi-Modality Neuroimaging (MMN) Dataset. The colors denote the broader category per class.
Figure 3. Euclidean and hyperbolic model dendrograms for the Neuroimaging Dataset. This figure illustrates the hierarchical clustering dendrogram of the average class embedding space of the Euclidean ResNet 18 (A) and the Euclidean–Lorentz ResNet 18 (B) for the Multi-Modality Neuroimaging (MMN) Dataset.
Figure 4. Zero-shot identification of stroke patients. The diagram shows how many of the zero-shot stroke patients were identified across the Euclidean and Euclidean–Lorentz models, as well as by human radiologists with emergent non-contrast brain CT imaging. We also note that 26 patients were not identified using any of the three approaches.
22 pages, 15973 KiB  
Article
Three-Dimensional Bone-Image Synthesis with Generative Adversarial Networks
by Christoph Angermann, Johannes Bereiter-Payr, Kerstin Stock, Gerald Degenhart and Markus Haltmeier
J. Imaging 2024, 10(12), 318; https://doi.org/10.3390/jimaging10120318 - 11 Dec 2024
Viewed by 639
Abstract
Medical image processing has been highlighted as an area where deep-learning-based models have the greatest potential. However, in the medical field, in particular, problems of data availability and privacy are hampering research progress and, thus, rapid implementation in clinical routine. The generation of synthetic data not only ensures privacy but also allows the drawing of new patients with specific characteristics, enabling the development of data-driven models on a much larger scale. This work demonstrates that three-dimensional generative adversarial networks (GANs) can be efficiently trained to generate high-resolution medical volumes with finely detailed voxel-based architectures. In addition, GAN inversion is successfully implemented for the three-dimensional setting and used for extensive research on model interpretability and applications such as image morphing, attribute editing, and style mixing. The results are comprehensively validated on a database of three-dimensional HR-pQCT instances representing the bone micro-architecture of the distal radius. Full article
(This article belongs to the Special Issue Advances in Medical Imaging and Machine Learning)
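As a generic illustration of what "three-dimensional" means for the generator (a toy block, not the paper's 3D-ProGAN or 3D-StyleGAN layers), each synthesis stage can upsample a latent feature volume with 3D convolutions in all three spatial dimensions:

import torch
import torch.nn as nn

class UpBlock3D(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.block = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False),
            nn.Conv3d(c_in, c_out, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2),
        )

    def forward(self, x):
        return self.block(x)

# Latent feature volume of shape (batch, channels, depth, height, width).
z = torch.randn(1, 64, 4, 18, 14)
g = nn.Sequential(UpBlock3D(64, 32), UpBlock3D(32, 16), nn.Conv3d(16, 1, 1))
print(g(z).shape)  # torch.Size([1, 1, 16, 72, 56])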
Figures:
Figure 1. HR-pQCT bone samples of real patients with isotropic voxel size 60.7 μm. Volumes are cropped to a region of interest (ROI) with varying numbers of voxels for each scan.
Figure 2. Preprocessing. From left to right: the sample is cropped or padded to a constant size of 168 × 576 × 448 voxels. The mirrored volume is used as padding. The samples are considered regarding the discrete cosine basis. Clipping the basis coefficients to the range [−0.001, 0.001] yields the noise volume. The padded regions are replaced by the corresponding noise volume.
Figure 3. Exemplary visualization of the progressive growing strategy for the synthesis of 3D bone HR-pQCT data.
Figure 4. Ten HR-pQCT volumes sampled from the proposed 3D-ProGAN (first row) and 3D-StyleGAN (second row). Synthesized volumes have a spatial size of 32 × 288 × 224.
Figure 5. First row: samples with weak trabecular bone mineralization (Tb.BMD). Second row: samples with weak cortical bone mineralization (Ct.BMD). From left to right: x_1, x_{1,2}^{0.25}, x_{1,2}^{0.5}, x_{1,2}^{0.75}, x_2. The areas marked in red allow the reader to better recognize the low Tb.BMD and the weak Ct.BMD of the examined radii, respectively.
Figure 6. An illustration of the style combination based on the 3D-StyleGAN approach. For both examples, the first row denotes the source image (real patient data). The second row contains the target image at the leftmost position and style-mix results where the style of the source is fed to the generator in the first three convolutional layers (x_{s,t}^3), in the first seven layers (x_{s,t}^7), and in the first twelve layers (x_{s,t}^{12}), from left to right.
Figure 7. 3D-ProGAN results for attribute editing. For each volumetric sample, the center axial slice is visualized. Left: existing patient x. Middle: generated samples G_1(z_opt(x) + α n_k), k = 1, 2, 3, 4. Right: difference G_1(z_opt(x) + α n_k) − x, where red and blue voxels denote positive and negative residuals, respectively.
Figure 8. Comparison between computer-based realism scores and the subjective rating by Expert 1 (first row) and Expert 2 (second row) on HR-pQCT images. The horizontal axes denote the expert rating 1–5, while the vertical axes show the calculated realism scores. From left to right: r_inc, r_res, r_vgs.
Figure A1. Synthetic HR-pQCT volumes sampled from the proposed 3D-ProGAN approach with varying parameters for the truncated normal distribution. From left to right column: truncation parameter equal to {2.6, 1.8, 1, 0.2}.
Figure A2. Synthetic HR-pQCT volumes sampled from the proposed 3D-StyleGAN approach with varying truncation levels. From left to right column: ψ = {1, 0.7, 0.4, 0.1}.
22 pages, 3640 KiB  
Article
Evaluation of Color Difference Models for Wide Color Gamut and High Dynamic Range
by Olga Basova, Sergey Gladilin, Vladislav Kokhan, Mikhalina Kharkevich, Anastasia Sarycheva, Ivan Konovalenko, Mikhail Chobanu and Ilya Nikolaev
J. Imaging 2024, 10(12), 317; https://doi.org/10.3390/jimaging10120317 - 10 Dec 2024
Viewed by 643
Abstract
Color difference models (CDMs) are essential for accurate color reproduction in image processing. While CDMs aim to reflect perceived color differences (CDs) from psychophysical data, they remain largely untested in wide color gamut (WCG) and high dynamic range (HDR) contexts, which are underrepresented in current datasets. This gap highlights the need to validate CDMs across WCG and HDR. Moreover, the non-geodesic structure of perceptual color space necessitates datasets covering CDs of various magnitudes, while most existing datasets emphasize only small and threshold CDs. To address this, we collected a new dataset encompassing a broad range of CDs in WCG and HDR contexts and developed a novel CDM fitted to these data. Benchmarking various CDMs using STRESS and significant error fractions on both new and established datasets reveals that CAM16-UCS with power correction is the most versatile model, delivering strong average performance across WCG colors up to 1611 cd/m2. However, even the best CDM fails to achieve the desired accuracy limits and yields significant errors. CAM16-UCS, though promising, requires further refinement, particularly in its power correction component to better capture the non-geodesic structure of perceptual color space. Full article
(This article belongs to the Special Issue Color in Image Processing and Computer Vision)
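For readers unfamiliar with the STRESS index used for benchmarking, a minimal sketch of its standard formulation (assumed here to match the paper's usage) is given below; lower values indicate better agreement between computed and perceived color differences:

import numpy as np

def stress(dE, dV):
    """STRESS = 100 * sqrt(sum((dE - F*dV)^2) / sum((F*dV)^2)),
    with the scaling factor F = sum(dE^2) / sum(dE * dV)."""
    dE, dV = np.asarray(dE, float), np.asarray(dV, float)
    F = np.sum(dE**2) / np.sum(dE * dV)
    return 100.0 * np.sqrt(np.sum((dE - F * dV)**2) / np.sum((F * dV)**2))

# Computed differences that are perfectly proportional to the visual data give 0.
print(stress([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))   # 0.0
print(stress([1.0, 2.0, 3.0], [2.5, 3.0, 6.5]))   # > 0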
Figures:
Figure 1. The VCD gamut (blue), BT.2020 (red), human visual gamut (gray), and sRGB gamut (colored) shown on the proLab chromaticity diagram.
Figure 2. Front and side views of the experiment setup.
Figure 3. Spectral power distributions of the luminance inside the booth compared to CIE D50, D55, and D65 illuminants.
Figure 4. Stimuli setup for SCD experiments. Quadrants: Q1, upper right; Q2, upper left; Q3, lower right; and Q4, lower left.
Figure 5. Interior of the light booth for TCD measurements.
Figure 6. Spectral power distributions of Q1 primaries R, G, B, and C.
Figure 7. Luminance vs. channel value for Q1 primaries (left to right, top to bottom: R, G, B, and C). Dots illustrate measurements; dashed lines represent ideal linearity; and solid lines, parabolic approximation.
Figure 8. Arrangement of 18 measurement directions around the color center in xyY space. Blue points indicate chromaticity-only shifts (ΔY = 0), yellow points show luminance shifts (Δx = 0, Δy = 0), and red points represent combined chromaticity and luminance shifts. (a) 3D view; (b) section showing chromaticity-only shifts; (c) section with x-coordinate fixed at 0; (d) section with y-coordinate fixed.
Figure 9. Collected data shown on the proLab chromaticity diagram (left) and corresponding luminance Y in cd/m² (right). The centers of the blue segment intersections indicate the centers of the measured ellipses, while the segments themselves represent the measured color differences.
Figure 10. Histogram of errors for the current state-of-the-art CDMs (CIEDE2000 and CAM16-UCS-PC) on the COMBVD dataset. The x-axis represents the magnitude of the CDM error, while the y-axis indicates the number of these errors.
11 pages, 1525 KiB  
Article
Toward Closing the Loop in Image-to-Image Conversion in Radiotherapy: A Quality Control Tool to Predict Synthetic Computed Tomography Hounsfield Unit Accuracy
by Paolo Zaffino, Ciro Benito Raggio, Adrian Thummerer, Gabriel Guterres Marmitt, Johannes Albertus Langendijk, Anna Procopio, Carlo Cosentino, Joao Seco, Antje Christin Knopf, Stefan Both and Maria Francesca Spadea
J. Imaging 2024, 10(12), 316; https://doi.org/10.3390/jimaging10120316 - 10 Dec 2024
Viewed by 808
Abstract
In recent years, synthetic Computed Tomography (CT) images generated from Magnetic Resonance (MR) or Cone Beam Computed Tomography (CBCT) acquisitions have been shown to be comparable to real CT images in terms of dose computation for radiotherapy simulation. However, until now, there has been no independent strategy to assess the quality of each synthetic image in the absence of ground truth. In this work, we propose a Deep Learning (DL)-based framework to predict the accuracy of synthetic CT in terms of Mean Absolute Error (MAE) without the need for a ground truth (GT). The proposed algorithm generates a volumetric map as an output, informing clinicians of the predicted MAE slice-by-slice. A cascading multi-model architecture was used to deal with the complexity of the MAE prediction task. The workflow was trained and tested on two cohorts of head and neck cancer patients with different imaging modalities: 27 MR scans and 33 CBCT. The algorithm evaluation revealed an accurate HU prediction (a median absolute prediction deviation equal to 4 HU for CBCT-based synthetic CTs and 6 HU for MR-based synthetic CTs), with discrepancies that do not affect the clinical decisions made on the basis of the proposed estimation. The workflow exhibited no systematic error in MAE prediction. This work represents a proof of concept about the feasibility of synthetic CT evaluation in daily clinical practice, and it paves the way for future patient-specific quality assessment strategies. Full article
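The cascading idea can be summarized in a few lines of Python (hypothetical stand-in models and thresholds, not the trained pipeline): a classifier first assigns a slice to a coarse MAE interval, and an interval-specific regressor then refines the estimate:

from typing import Callable, Sequence

def predict_slice_mae(slice_2d,
                      interval_classifier: Callable,      # slice -> interval index
                      regressors: Sequence[Callable]):    # one regressor per interval
    """Two-step MAE prediction for a single axial sCT slice, without a ground-truth CT."""
    interval = interval_classifier(slice_2d)
    return regressors[interval](slice_2d)

# Toy stand-ins: classify by mean intensity, then apply an interval-specific model.
classifier = lambda s: 0 if sum(map(sum, s)) / (len(s) * len(s[0])) < 50 else 1
regressors = [lambda s: 35.0,   # low-MAE interval model
              lambda s: 90.0]   # high-MAE interval model
print(predict_slice_mae([[10, 20], [30, 40]], classifier, regressors))  # 35.0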
Figures:
Figure 1. Representation of the general MAE prediction pipeline. An axial sCT slice is given as input, and the associated MAE scalar for the image slice is predicted by using a DL pipeline.
Figure 2. A more detailed graphical representation of the MAE prediction pipeline. The final MAE prediction is obtained as a result of two DL steps: first, a raw MAE interval classification is performed, followed by a more precise MAE estimation based on a regression algorithm.
Figure 3. Exemplary sCT_CBCT overlaid with its pMAE_volume. In addition to the 2D views (axial, sagittal, and coronal planes), the 3D representation is also shown.
Figure 4. Detailed workflow of MAE prediction. A single sCT axial slice is first fed into a DL model that classifies it as belonging to a specific MAE class. According to this prediction, the 2D image is then provided as input to a connected DL regression model, specifically trained to operate on a restricted range of MAE values. As a result, the MAE of a single sCT slice can be forecasted. In order to train the different models with a GT MAE, the ground truth CT is needed (dashed lines are needed only to train the models).
Figure 5. PD distributions for modality-specific and mixed pipelines. Results for sCT_CBCT and sCT_MR are reported in the left and right panels, respectively.
Figure 6. APD distributions for modality-specific and mixed pipelines. Results for sCT_CBCT and sCT_MR are reported in the left and right panels, respectively.
12 pages, 599 KiB  
Article
PAS or Not PAS? The Sonographic Assessment of Placenta Accreta Spectrum Disorders and the Clinical Validation of a New Diagnostic and Prognostic Scoring System
by Antonella Vimercati, Arianna Galante, Margherita Fanelli, Francesca Cirignaco, Amerigo Vitagliano, Pierpaolo Nicolì, Andrea Tinelli, Antonio Malvasi, Miriam Dellino, Gianluca Raffaello Damiani, Barbara Crescenza, Giorgio Maria Baldini, Ettore Cicinelli and Marco Cerbone
J. Imaging 2024, 10(12), 315; https://doi.org/10.3390/jimaging10120315 - 10 Dec 2024
Viewed by 681
Abstract
This study aimed to evaluate our center’s experience in diagnosing and managing placenta accreta spectrum (PAS) in a high-risk population, focusing on prenatal ultrasound features associated with PAS severity and maternal outcomes. We conducted a retrospective analysis of 102 high-risk patients with confirmed placenta previa who delivered at our center between 2018 and 2023. Patients underwent transabdominal and transvaginal ultrasound scans, assessing typical sonographic features. Binary and multivariate logistic regression analyses were performed to identify sonographic markers predictive of PAS and relative complications. Key ultrasound features—retroplacental myometrial thinning (<1 mm), vascular lacunae, and retroplacental vascularization—were significantly associated with PAS and a higher risk of surgical complications. An exceedingly rare sign, the “riddled cervix” sign, was observed in only three patients with extensive cervical or parametrial involvement. Those patients had the worst surgical outcomes. This study highlights the utility of specific ultrasound features in stratifying PAS risk and guiding clinical and surgical management in high-risk pregnancies. The findings support integrating these markers into prenatal diagnostic protocols to improve patient outcomes and inform surgical planning. Full article
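As an illustration of the kind of binary logistic regression analysis mentioned above (synthetic data, not the study cohort), binary sonographic markers can be related to a PAS outcome and summarized as odds ratios:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 3))                  # three binary sonographic markers
logits = -2.0 + 1.8 * X[:, 0] + 1.2 * X[:, 1] + 1.5 * X[:, 2]
y = rng.random(200) < 1 / (1 + np.exp(-logits))        # simulated PAS outcome

model = LogisticRegression().fit(X, y)
print(np.round(np.exp(model.coef_), 2))                # approximate odds ratios per marker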
Figures:
Figure 1. Sonographic findings in a case of placenta percreta. (A) "Riddled cervix" sign at 28 weeks of gestation. Color Doppler transvaginal scan of a highly vascularized (Color Score 3–4) cervix with multiple vascular lakes. Normal cervical length: 35 mm. (B) Same patient at 33 weeks of gestation: evidence of multiple white line interruptions (yellow arrows). (C) Vascular lacunae sign at 32 weeks. Para-sagittal right sonographic scan with evidence of a riddled cervix sign (*) and a right placental cotyledon (yellow triangle) surrounded by large peripheral vascular lacunae with bulging. Absent myometrial thickness. (D) Anatomical specimen of the uterus after cesarean section and subsequent hysterectomy at 34 weeks of gestation. Evidence of the longitudinal incision on the fundus (yellow arrows). Placenta previa percreta on the right isthmic side (yellow triangle). In this case, there was a riddled cervix sign, correlated with parametrial invasion.
19 pages, 9164 KiB  
Article
A Regularization Method for Landslide Thickness Estimation
by Lisa Borgatti, Davide Donati, Liwei Hu, Germana Landi and Fabiana Zama
J. Imaging 2024, 10(12), 314; https://doi.org/10.3390/jimaging10120314 - 10 Dec 2024
Viewed by 607
Abstract
Accurate estimation of landslide depth is essential for practical hazard assessment and risk mitigation. This work addresses the problem of determining landslide depth from satellite-derived elevation data. Using the principle of mass conservation, this problem can be formulated as a linear inverse problem. To solve the inverse problem, we present a regularization approach that computes approximate solutions and regularization parameters using the Balancing Principle. Synthetic data were carefully designed and generated to evaluate the method under controlled conditions, allowing for precise validation of its performance. Through comprehensive testing with this synthetic dataset, we demonstrate the method’s robustness across varying noise levels. When applied to real-world data from the Fels landslide in Alaska, the proposed method proved its practical value in reconstructing landslide thickness patterns. These reconstructions showed good agreement with existing geological interpretations, validating the method’s effectiveness in real-world scenarios. Full article
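A minimal sketch of the underlying idea (generic Tikhonov regularization with a fixed weight, rather than the paper's solver with the Balancing Principle) is shown below: mass conservation yields a linear system A h = b linking the unknown thickness h to the observed elevation change b, which is then solved in regularized least-squares form:

import numpy as np

def tikhonov_solve(A, b, L, lam):
    """Solve min ||A h - b||^2 + lam * ||L h||^2 via the regularized
    normal equations (A^T A + lam L^T L) h = A^T b."""
    lhs = A.T @ A + lam * (L.T @ L)
    return np.linalg.solve(lhs, A.T @ b)

rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((80, n))                       # stand-in for the discretized forward operator
L = np.eye(n, k=0) - np.eye(n, k=1)                    # first-difference smoothing operator
h_true = np.sin(np.linspace(0, np.pi, n))              # synthetic thickness profile
b = A @ h_true + 0.01 * rng.standard_normal(80)        # noisy elevation-change data
h_est = tikhonov_solve(A, b, L, lam=0.1)
print(round(float(np.linalg.norm(h_est - h_true) / np.linalg.norm(h_true)), 3))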
Figures:
Figure 1. Overview of the synthetic landslide model: (a) three-dimensional geometry created in Rhinoceros; and (b) synthetic landslide model geometry in 3DEC, showing the different materials that form the model slope.
Figure 2. Overview of the synthetic landslide model: (a) displacement magnitude distribution at the end of the simulation (t1), where the white square outlines the area depicted in (b); and (b) detail of the landslide area, with the displacement vectors displayed as black arrows.
Figure 3. Data of a synthetic landslide: (a) x-displacement component; (b) y-displacement component.
Figure 4. Data of a synthetic landslide: (a) elevation change; (b) landslide thickness.
Figure 5. Elevation right-hand side b. (a) Noiseless r.h.s.; (b) noisy r.h.s. with a noise level σ = 0.01; and (c) noisy r.h.s. with a noise level σ = 0.05.
Figure 6. Noise level σ = 0.01: relative error (left) and squared residual norm (right) for each iteration k.
Figure 7. Noise level σ = 0.05: relative error (left) and squared residual norm (right) for each iteration k.
Figure 8. Noise level σ = 0.01. Left: computed values of the regularization parameter λ^(k). Right: the number of internal iterations of the GP method at each iteration k.
Figure 9. Noise level σ = 0.05. Left: computed values of the regularization parameter λ^(k). Right: the number of internal iterations of the GP method at each iteration k.
Figure 10. (a) Ground truth thickness. (b) Computed thickness with a noise level σ = 0.01. (c) Computed thickness with a noise level σ = 0.05.
Figure 11. Error maps. (a) Noise level σ = 0.01; (b) noise level σ = 0.05.
Figure 12. Overview of the Fels landslide and its displacement: (a) view of the Fels landslide from the opposite slope; (b) displacement magnitude map derived from the SAR analysis described in [31]; and (c) the elevation change that occurred between 2014 and 2016, as derived from repeated airborne LiDAR data [15].
Figure 13. Overview of the computed thickness map: (a) the computed thickness map (shades of red) and location of the profiles within the landslide area. The basemap is a hill-shaded relief map derived from the 2016 LiDAR dataset. Section lines are indicated in solid black. Landslide boundaries are marked by the dashed line, while the dotted black line shows the boundary of the fast-moving toe. (b–d) Profiles 1–3 show the morphology of the basal surface inferred with our method and the VIM method. The horizontal axis represents the distance from the upper part of the slide. The vertical axis holds the value of the surface elevation (black), the elevation of the basal surface inferred with VIM (yellow), and our proposed method (red), respectively.
15 pages, 627 KiB  
Review
Real-Time Emotion Recognition for Improving the Teaching–Learning Process: A Scoping Review
by Cèlia Llurba and Ramon Palau
J. Imaging 2024, 10(12), 313; https://doi.org/10.3390/jimaging10120313 - 9 Dec 2024
Viewed by 768
Abstract
Emotion recognition (ER) is gaining popularity in various fields, including education. The benefits of ER in the classroom for educational purposes, such as improving students' academic performance, are gradually becoming known. Thus, real-time ER is proving to be a valuable tool for teachers as well as for students. However, its feasibility in educational settings requires further exploration. This review surveys learning experiences based on real-time ER with students to explore its potential for learning and for improving academic achievement. The purpose is to present evidence of good implementations and suggestions for their successful application. The content analysis finds that most of the practices lead to significant improvements with respect to their educational purposes. Nevertheless, the analysis identifies problems that might block the implementation of these practices in the classroom and in education; among the obstacles identified are the lack of student privacy and students' support needs. We conclude that artificial intelligence (AI) and ER are potential tools to address the needs of ordinary classrooms, although reliable automatic recognition in real time remains a challenge for researchers, given the high variability of the input data. Full article
(This article belongs to the Special Issue Deep Learning in Image Analysis: Progress and Challenges)
Figures:
Figure 1. Study selection process flow diagram.
Figure 2. Number of publications by year of the 22 articles included in the scoping review.
7 pages, 1102 KiB  
Communication
Quantitative MRI Assessment of Post-Surgical Spinal Cord Injury Through Radiomic Analysis
by Azadeh Sharafi, Andrew P. Klein and Kevin M. Koch
J. Imaging 2024, 10(12), 312; https://doi.org/10.3390/jimaging10120312 - 8 Dec 2024
Viewed by 616
Abstract
This study investigates radiomic efficacy in post-surgical traumatic spinal cord injury (SCI), overcoming MRI limitations from metal artifacts to enhance diagnosis, severity assessment, and lesion characterization for prognosis and therapy guidance. Traumatic SCI causes severe neurological deficits. While MRI allows qualitative injury evaluation, standard imaging alone has limitations for precise SCI diagnosis, severity stratification, and pathology characterization, which are needed to guide prognosis and therapy. Radiomics enables quantitative tissue phenotyping by extracting a high-dimensional set of descriptive texture features from medical images. However, the efficacy of postoperative radiomic quantification in the presence of metal-induced MRI artifacts from spinal instrumentation has yet to be fully explored. A total of 50 healthy controls and 12 SCI patients post-stabilization surgery underwent 3D multi-spectral MRI. Automated spinal cord segmentation was followed by radiomic feature extraction. Supervised machine learning categorized SCI versus controls, injury severity, and lesion location relative to instrumentation. Radiomics differentiated SCI patients (Matthews correlation coefficient (MCC) 0.97; accuracy 1.0), categorized injury severity (MCC: 0.95; ACC: 0.98), and localized lesions (MCC: 0.85; ACC: 0.90). Combined T1 and T2 features outperformed individual modalities across tasks, with gradient boosting models showing the highest efficacy. The radiomic framework achieved excellent performance, differentiating SCI from controls and accurately categorizing injury severity. The ability to reliably quantify SCI severity and localization could potentially inform diagnosis and prognosis and guide therapy. Further research is warranted to validate radiomic SCI biomarkers and explore clinical integration. Full article
(This article belongs to the Section Medical Imaging)
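The evaluation setup can be mimicked on synthetic features (this sketch does not use the actual radiomic data): a gradient-boosting classifier separates SCI from control cases and is scored with the Matthews correlation coefficient and accuracy:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import matthews_corrcoef, accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for radiomic feature vectors (~50 controls, ~12 SCI cases).
X, y = make_classification(n_samples=62, n_features=40, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("MCC:", round(matthews_corrcoef(y_te, pred), 2),
      "ACC:", round(accuracy_score(y_te, pred), 2))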
Figures:
Figure 1. The flowchart depicts the pipeline for segmenting the spinal cord as suggested in [19].
Figure 2. Sagittal (a) T2-weighted and (b) T1-weighted 3D-MSI MRI images of an instrumented damaged spinal cord. Axial sections, reformatted at the level of the dashed green line from (a,b), are shown in (c,d), respectively. The spinal cord is outlined in red in all images.
Figure 3. Comparison of accuracy, F1 score, area under the curve (AUC-ROC), and mean per-class error across radiomic classification tasks using T1, T2, and combined T1/T2 feature sets. The tasks include categorizing cohorts into healthy or spinal cord injury (SCI) groups, determining injury severity levels, and distinguishing between cord zones relative to the injury site.
59 pages, 3270 KiB  
Review
State-of-the-Art Deep Learning Methods for Microscopic Image Segmentation: Applications to Cells, Nuclei, and Tissues
by Fatma Krikid, Hugo Rositi and Antoine Vacavant
J. Imaging 2024, 10(12), 311; https://doi.org/10.3390/jimaging10120311 - 6 Dec 2024
Cited by 1 | Viewed by 1547
Abstract
Microscopic image segmentation (MIS) is a fundamental task in medical imaging and biological research, essential for precise analysis of cellular structures and tissues. Despite its importance, the segmentation process encounters significant challenges, including variability in imaging conditions, complex biological structures, and artefacts (e.g., noise), which can compromise the accuracy of traditional methods. The emergence of deep learning (DL) has catalyzed substantial advancements in addressing these issues. This systematic literature review (SLR) provides a comprehensive overview of state-of-the-art DL methods developed over the past six years for the segmentation of microscopic images. We critically analyze key contributions, emphasizing how these methods specifically tackle challenges in cell, nucleus, and tissue segmentation. Additionally, we evaluate the datasets and performance metrics employed in these studies. By synthesizing current advancements and identifying gaps in existing approaches, this review not only highlights the transformative potential of DL in enhancing diagnostic accuracy and research efficiency but also suggests directions for future research. The findings of this study have significant implications for improving methodologies in medical and biological applications, ultimately fostering better patient outcomes and advancing scientific understanding. Full article
Figures:
Figure 1. The structure of U-Net [9]. This figure depicts the U-Net structure, highlighting the contracting and expansive paths, with emphasis on the skip connections that facilitate efficient feature integration and detailed pixel-wise segmentation. The model processes an input of 572 × 572 pixels and generates an output of 388 × 388 pixels. Image from [9].
Figure 2. The structure of R-CNNs [10]. This figure outlines the R-CNN architecture, illustrating the process of region proposal generation and CNN-based feature extraction for object detection and classification. Image from [10].
Figure 3. The structure of GANs [11]. This diagram illustrates the structure of a GAN, comprising a generator and a discriminator. The generator, using a random input, produces synthetic images that mimic real images, such as the brain image shown.
Figure 4. Main idea of YOLO [12]. This figure illustrates the YOLO architecture, showcasing its single-shot detection approach that predicts bounding boxes and class probabilities directly from full images for real-time object detection and segmentation. Image from [12].
Figure 5. Architecture of the ViT from [13]. This figure illustrates the process of dividing an input image into non-overlapping patches, transforming these patches by adding learnable embeddings, and feeding them through multiple layers of multi-head self-attention and feed-forward networks. Image from [13].
Figure 6. PRISMA flow diagram of the literature selection process. This diagram illustrates the systematic review process undertaken to select relevant articles for the study. It details the total number, the filtering criteria applied, and the final count of articles included in the SLR.
Figure 7. Microscopic image segmentation levels. This figure illustrates the segmentation process across three levels: cell, nucleus, and tissue.
Figure 8. Number of studies published by year for cell segmentation.
Figure 9. Number of studies published by year for nucleus segmentation.
Figure 10. Number of studies published by year for tissue segmentation.
17 pages, 10713 KiB  
Article
UV Hyperspectral Imaging with Xenon and Deuterium Light Sources: Integrating PCA and Neural Networks for Analysis of Different Raw Cotton Types
by Mohammad Al Ktash, Mona Knoblich, Max Eberle, Frank Wackenhut and Marc Brecht
J. Imaging 2024, 10(12), 310; https://doi.org/10.3390/jimaging10120310 - 5 Dec 2024
Viewed by 745
Abstract
Ultraviolet (UV) hyperspectral imaging shows significant promise for the classification and quality assessment of raw cotton, a key material in the textile industry. This study evaluates the efficacy of UV hyperspectral imaging (225–408 nm) using two different light sources: xenon arc (XBO) and deuterium lamps, in comparison to NIR hyperspectral imaging. The aim is to determine which light source provides better differentiation between cotton types in UV hyperspectral imaging, as each interacts differently with the materials, potentially affecting imaging quality and classification accuracy. Principal component analysis (PCA) and Quadratic Discriminant Analysis (QDA) were employed to differentiate between various cotton types and hemp plant. PCA for the XBO illumination revealed that the first three principal components (PCs) accounted for 94.8% of the total variance: PC1 (78.4%) and PC2 (11.6%) clustered the samples into four main groups—hemp (HP), recycled cotton (RcC), and organic cotton (OC) from the other cotton samples—while PC3 (6%) further separated RcC. When using the deuterium light source, the first three PCs explained 89.4% of the variance, effectively distinguishing sample types such as HP, RcC, and OC from the remaining samples, with PC3 clearly separating RcC. When combining the PCA scores with QDA, the classification accuracy reached 76.1% for the XBO light source and 85.1% for the deuterium light source. Furthermore, a deep learning technique called a fully connected neural network for classification was applied. The classification accuracy for the XBO and deuterium light sources reached 83.6% and 90.1%, respectively. The results highlight the ability of this method to differentiate conventional and organic cotton, as well as hemp, and to identify distinct types of recycled cotton, suggesting varying recycling processes and possible common origins with raw cotton. These findings underscore the potential of UV hyperspectral imaging, coupled with chemometric models, as a powerful tool for enhancing cotton classification accuracy in the textile industry. Full article
(This article belongs to the Section Color, Multi-spectral, and Hyperspectral Imaging)
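A minimal sketch of the PCA + QDA chain is shown below (random spectra stand in for the measured hyperspectral cubes): spectra are compressed to a few principal components, and Quadratic Discriminant Analysis classifies the resulting scores:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_per_class, n_bands = 60, 120                     # pixels per class, spectral bands
means = rng.standard_normal((3, n_bands))          # three cotton/hemp classes
X = np.vstack([m + 0.3 * rng.standard_normal((n_per_class, n_bands)) for m in means])
y = np.repeat([0, 1, 2], n_per_class)

model = make_pipeline(PCA(n_components=3), QuadraticDiscriminantAnalysis())
model.fit(X, y)
print("training accuracy:", round(model.score(X, y), 3))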
Figures:
Figure 1. Visual representation of the pressed disc-shaped samples. The samples were mechanically cleaned cotton (MCC), raw hemp plant (HP), recycled organic bright cotton (RcBC), recycled cotton (RcC), organic raw material cotton (OC), and standard raw material cotton (StC).
Figure 2. Averaged spectra recorded by UV hyperspectral imaging of raw cotton samples and hemp with (a) XBO as a light source and (b) deuterium as a light source. The samples were mechanically cleaned cotton (MCC), raw hemp plant (HP), recycled organic bright cotton (RcBC), recycled cotton (RcC), organic raw material cotton (OC), and standard raw material cotton (StC).
Figure 3. PCA model, calculated using two light sources, XBO and deuterium, with the following scores: (a) PC1 (78.4%) vs. PC2 (11.9%) vs. PC3 (6%) using XBO. (b) PC1 (73%) vs. PC2 (10%) vs. PC3 (6%) using deuterium. (c,d) The corresponding loading plots for XBO and deuterium, respectively.
Figure 4. Prediction of mixed cotton and hemp samples using PCA-QDA with XBO (column 1) and deuterium lamps (column 2). First row: mixture of OC (orange) and StC (brown). Second row: mixture of RcBC (light green) and RcC (yellow). Third row: mixture of MCC (dark blue) and HP (light blue).
Figure 5. ROC curve and corresponding AUC values for the developed neural networks to classify different types of cotton using (a) XBO and (b) deuterium light sources. The calculation was performed using the one-vs-rest method for multiclass classification problems.
Figure 6. Normalized intensity of (a) XBO and (b) deuterium lamps [16].
Figure A1. Two-dimensional scores of the PCA model with (a) PC1 vs. PC2 and (b) PC1 vs. PC3. (c,d) Corresponding loadings.
Figure A2. PCA model with MCC, RcBC, and StC; (a) scores of PC1 vs. PC2 vs. PC3 and (b) corresponding loadings.
Figure A3. Two-dimensional scores of the PCA model with (a) PC1 vs. PC2 and (b) PC1 vs. PC3. (c,d) Corresponding loadings.
Figure A4. PCA model with MCC, RcBC, and StC; (a) scores of PC1 vs. PC2 vs. PC3 and (b) corresponding loadings.
Figure A5. (a) The learning curve of the neural network model for cotton type classification using XBO lamp data, illustrating the model's performance as evaluated on both training and validation datasets across each epoch of the training process. (b) The learning curve of the neural network model for cotton type classification using deuterium lamp data, illustrating the model's performance as evaluated on both training and validation datasets across each epoch of the training process. Model accuracies achieved after the final training epoch for the training, validation, and test datasets using (c) deuterium lamp and (d) XBO lamp.
17 pages, 3796 KiB  
Article
FastQAFPN-YOLOv8s-Based Method for Rapid and Lightweight Detection of Walnut Unseparated Material
by Junqiu Li, Jiayi Wang, Dexiao Kong, Qinghui Zhang and Zhenping Qiang
J. Imaging 2024, 10(12), 309; https://doi.org/10.3390/jimaging10120309 - 2 Dec 2024
Cited by 1 | Viewed by 783
Abstract
Walnuts possess significant nutritional and economic value. Fast and accurate sorting of shells and kernels will enhance the efficiency of automated production. Therefore, we propose a FastQAFPN-YOLOv8s object detection network to achieve rapid and precise detection of unsorted materials. The method uses lightweight Pconv (Partial Convolution) operators to build the FasterNextBlock structure, which serves as the backbone feature extractor for the Fasternet feature extraction network. The ECIoU loss function, combining EIoU (Efficient-IoU) and CIoU (Complete-IoU), speeds up the adjustment of the predicted bounding boxes and accelerates network regression. In the Neck section of the network, the QAFPN feature fusion extraction network is proposed to replace the PAN-FPN (Path Aggregation Network-Feature Pyramid Network) in YOLOv8s with a Rep-PAN structure based on the QARepNext reparameterization framework, striking a balance between network performance and inference speed. To validate the method, we built a three-axis mobile sorting device and created a dataset of 3000 images of walnuts after shell removal for experiments. The results show that the improved network contains 6,071,008 parameters, requires a training time of 2.49 h, has a model size of 12.3 MB, and achieves an mAP (Mean Average Precision) of 94.5% at a frame rate of 52.1 FPS. Compared with the original model, the number of parameters decreased by 45.5%, training time was reduced by 32.7%, the model size shrank by 45.3%, and the frame rate improved by 40.8%. However, some accuracy is sacrificed due to the lightweight design, resulting in a 1.2% decrease in mAP. The network reduces the model size by 59.7 MB and 23.9 MB compared to YOLOv7 and YOLOv6, respectively, and improves the frame rate by 15.67 fps and 22.55 fps, respectively. The average confidence and mAP show minimal changes compared to YOLOv7 and improved by 4.2% and 2.4% compared to YOLOv6, respectively. The FastQAFPN-YOLOv8s detection method effectively reduces model size while maintaining recognition accuracy. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
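To make the lightweight backbone idea concrete, the sketch below shows a partial convolution in the spirit of PConv: only a fraction of the channels is convolved, and the rest pass through untouched. This is an illustrative PyTorch sketch, not the authors' FasterNextBlock; the 0.25 channel ratio and the 3×3 kernel are assumptions.

```python
import torch
import torch.nn as nn

class PartialConv(nn.Module):
    """Partial convolution: apply a 3x3 conv to only a fraction of the
    channels and pass the remaining channels through untouched, which
    cuts FLOPs and memory access compared with a full convolution."""
    def __init__(self, channels: int, partial_ratio: float = 0.25):
        super().__init__()
        self.conv_channels = max(1, int(channels * partial_ratio))
        self.conv = nn.Conv2d(self.conv_channels, self.conv_channels,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split along the channel axis and convolve only the first slice.
        x1, x2 = torch.split(
            x, [self.conv_channels, x.size(1) - self.conv_channels], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)

if __name__ == "__main__":
    block = PartialConv(channels=64)
    out = block(torch.randn(1, 64, 80, 80))
    print(out.shape)  # torch.Size([1, 64, 80, 80])
```

Because only `channels * partial_ratio` channels enter the convolution, parameter count and memory traffic drop roughly in proportion, which is the property FasterNet-style backbones exploit.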
Figures
Figure 1: Sample images of the dataset.
Figure 2: Experimental platform preparation. (a) Experimental platform; (b) field of view of the camera.
Figure 3: YOLOv8s improvement process diagram.
Figure 4: YOLOv8s specific improvement layer.
Figure 5: FasterNext structure construction.
Figure 6: QAFPN structure construction.
Figure 7: Comparison of different loss functions.
Figure 8: Graph of recognition results of different models.
12 pages, 1486 KiB  
Article
Elucidating Early Radiation-Induced Cardiotoxicity Markers in Preclinical Genetic Models Through Advanced Machine Learning and Cardiac MRI
by Dayeong An and El-Sayed Ibrahim
J. Imaging 2024, 10(12), 308; https://doi.org/10.3390/jimaging10120308 - 1 Dec 2024
Viewed by 788
Abstract
Radiation therapy (RT) is widely used to treat thoracic cancers but carries a risk of radiation-induced heart disease (RIHD). This study aimed to detect early markers of RIHD using machine learning (ML) techniques and cardiac MRI in a rat model. SS.BN3 consomic rats, which have a more subtle RIHD phenotype than Dahl salt-sensitive (SS) rats, were treated with localized cardiac RT or sham at 10 weeks of age. Cardiac MRI was performed 8 and 10 weeks post-treatment to assess global and regional cardiac function. ML algorithms were applied to differentiate sham-treated and irradiated rats based on early changes in myocardial function. Despite normal global left ventricular ejection fraction in both groups, strain analysis showed significant reductions in the anteroseptal and anterolateral segments of irradiated rats. Gradient boosting achieved an F1 score of 0.94 and an ROC AUC of 0.95, while random forest showed an accuracy of 88%. These findings suggest that ML, combined with cardiac MRI, can effectively detect early preclinical changes in RIHD, particularly alterations in regional myocardial contractility, highlighting the potential of these techniques for early detection and monitoring of radiation-induced cardiac dysfunction. Full article
(This article belongs to the Special Issue Progress and Challenges in Biomedical Image Analysis)
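For readers who want to reproduce the classification step in outline, a minimal scikit-learn sketch follows. The feature matrix here is synthetic and merely stands in for the strain-derived regional features described in the abstract; the paper's feature selection and validation scheme are not reproduced.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix: one row per rat, columns are regional strain
# metrics (e.g., peak circumferential/radial strain per myocardial segment).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 12))          # 40 animals, 12 strain-derived features
y = rng.integers(0, 2, size=40)        # 0 = sham, 1 = irradiated

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

gb = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("GB  F1:", f1_score(y_te, gb.predict(X_te)),
      "ROC AUC:", roc_auc_score(y_te, gb.predict_proba(X_te)[:, 1]))
print("RF  accuracy:", rf.score(X_te, y_te))
```

With real strain features in place of the random matrix, this pipeline yields the F1, ROC AUC, and accuracy figures of the kind reported in the abstract.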
Figures
Figure 1: Ventricular remodeling post-RT to preserve global cardiac function. Mid-ventricular short-axis cine images showing end-diastolic and end-systolic frames in sham, 8 weeks post-RT, and 10 weeks post-RT SS.BN3 rats. The images show preserved cardiac function post-RT, along with concentric hypertrophy (arrow).
Figure 2: Contractility pattern changes post-RT. Segmental (a) circumferential, (b) radial, and (c) longitudinal strain curves throughout the whole cardiac cycle (20 timeframes starting after the R-wave of the ECG signal) in SS.BN3 sham, 8 weeks post-RT, and 10 weeks post-RT rats. The myocardial segmental color code is shown in the left panel of each row (short-axis slices for circumferential and radial strain; long-axis slice for longitudinal strain), where Ant, Inf, Sept, and Lat denote the anterior, inferior, septal, and lateral segments, respectively. Note the reduced peak strain post-RT and the greater heterogeneity (mechanical dyssynchrony) between strain curves from different heart segments at 10 weeks post-RT.
Figure 3: Bar plots of the differences across four key metrics: (a) circumferential strain, (b) radial strain, (c) rotation angle, and (d) short-axis (SAX) motion. Data are shown for the six myocardial sectors: anterior, anteroseptal, inferoseptal, inferior, inferolateral, and anterolateral. Error bars show SEM. An asterisk (*) indicates a statistically significant (p < 0.05) difference between sham and post-RT.
Figure 4: Bar plots of the same four metrics and six myocardial sectors as in Figure 3. Error bars show SEM. An asterisk (*) indicates a statistically significant (p < 0.05) difference between sham and 8 weeks post-RT or 10 weeks post-RT; a hash (#) indicates a statistically significant difference between 8 weeks post-RT and 10 weeks post-RT.
Figure 5: Comparison of performance metrics across different classifiers and feature sets for differentiating sham vs. irradiated SS.BN3 rats. Bars represent accuracy, F1 score, specificity, sensitivity, and ROC AUC for feature sets (a) Lasso, (b) 7 selected features, (c) 19 selected features, and (d) all features.
Figure 6: Comparison of performance metrics across different classifiers and feature sets for differentiating sham vs. 8 weeks post-RT vs. 10 weeks post-RT. Bars represent accuracy, F1 score, specificity, sensitivity, and ROC AUC for feature sets (a) Lasso, (b) 7 selected features, (c) 21 selected features, and (d) all features.
17 pages, 648 KiB  
Article
Temporal Gap-Aware Attention Model for Temporal Action Proposal Generation
by Sorn Sooksatra and Sitapa Watcharapinchai
J. Imaging 2024, 10(12), 307; https://doi.org/10.3390/jimaging10120307 - 29 Nov 2024
Viewed by 667
Abstract
Temporal action proposal generation is a method for extracting temporal action instances or proposals from untrimmed videos. Existing methods often struggle to segment contiguous action proposals, which are groups of action boundaries with small temporal gaps. To address this limitation, we propose incorporating an attention mechanism to weigh the importance of each proposal within a contiguous group. This mechanism leverages the gap displacement between proposals to calculate attention scores, enabling a more accurate localization of action boundaries. We evaluate our method against a state-of-the-art boundary-based baseline on the ActivityNet v1.3 and THUMOS 2014 datasets. The experimental results demonstrate that our approach significantly improves the performance of short-duration and contiguous action proposals, achieving an average recall of 78.22%. Full article
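The abstract's key idea is that the gap displacement between neighbouring proposals can drive attention weights over a contiguous group. As a rough, self-contained illustration (not the paper's formulation; the nearest-neighbour gap rule and the temperature `tau` are assumptions), one could weight a group of proposals like this:

```python
import numpy as np

def gap_aware_weights(starts, ends, tau=2.0):
    """Attention weights for a temporally sorted group of action proposals.
    Each proposal is scored by the gap (in seconds) to its nearest neighbour:
    proposals sitting in a contiguous run get boosted, isolated proposals are
    down-weighted. `tau` is an assumed temperature parameter."""
    starts, ends = np.asarray(starts, float), np.asarray(ends, float)
    inner = np.maximum(starts[1:] - ends[:-1], 0.0)   # gaps between neighbours
    gap_prev = np.r_[np.inf, inner]                   # no neighbour before 1st
    gap_next = np.r_[inner, np.inf]                   # no neighbour after last
    score = -np.minimum(gap_prev, gap_next) / tau     # small gap -> high score
    w = np.exp(score - score.max())
    return w / w.sum()                                # softmax over the group

# Two nearly contiguous proposals followed by an isolated one.
print(gap_aware_weights(starts=[10.0, 14.2, 40.0], ends=[14.0, 18.0, 45.0]))
# -> roughly [0.5, 0.5, ~0]
```

Proposals embedded in a contiguous run receive comparable weight, while an isolated proposal is suppressed, mirroring the intended emphasis on contiguous action boundaries.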
Figures
Figure 1: (a) An example of a video frame sequence with contiguous action proposals (within the green box) in the predicted results; (b) the BMN candidate action proposal boundaries at different intensities with confidence scores, with actual proposals shown as gray shaded regions; and (c) the confidence scores on proposals by temporal length (best viewed in color).
Figure 2: Examples of attention masks from videos with various gap displacements in the range of (a) 0–10 s, (b) 20–30 s, and (c) more than 100 s, where green dots represent the action proposal positions (best viewed in color).
Figure 3: The proposed TAPG network architecture (G-MCBD) with score fusion and SNMS in the inference phase. Green and red circles represent the starting and ending times of each action proposal, respectively (best viewed in color).
Figure 4: Examples of successful cases in merging contiguous proposals (top) and emphasizing small action proposals (bottom), with predicted proposals from MCBD (blue lines), G-MCBD (green lines), and their ground truths (red lines). The predicted starting and ending times of each proposal are indicated by the beginning and end of each line, respectively (best viewed in color).
Figure 5: Examples of failure cases in overlapping proposals (top) and proposals within temporal gaps (bottom), with predicted proposals from MCBD (blue lines), G-MCBD (green lines), and their ground truths (red lines). The predicted starting and ending times of each proposal are indicated by the beginning and end of each line, respectively (best viewed in color).
39 pages, 3120 KiB  
Article
A Comparative Review of the SWEET Simulator: Theoretical Verification Against Other Simulators
by Amine Ben-Daoued, Frédéric Bernardin and Pierre Duthon
J. Imaging 2024, 10(12), 306; https://doi.org/10.3390/jimaging10120306 - 27 Nov 2024
Viewed by 626
Abstract
Accurate luminance-based image generation is critical in physically based simulations, as even minor inaccuracies in radiative transfer calculations can introduce noise or artifacts, adversely affecting image quality. The radiative transfer simulator, SWEET, uses a backward Monte Carlo approach, and its performance is analyzed alongside other simulators to assess how Monte Carlo-induced biases vary with parameters like optical thickness and medium anisotropy. This work details the advancements made to SWEET since the previous publication, with a specific focus on a more comprehensive comparison with other simulators such as Mitsuba. The core objective is to evaluate the precision of SWEET by comparing radiometric quantities like luminance, which serves as a method for validating the simulator. This analysis is particularly important in contexts such as automotive camera imaging, where accurate scene representation is crucial to reducing noise and ensuring the reliability of image-based systems in autonomous driving. By focusing on detailed radiometric comparisons, this study underscores SWEET’s ability to minimize noise, thus providing high-quality imaging for advanced applications. Full article
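One of the theoretical checks listed among the figures below (Figures 16–19) is the mean-path-length invariance property (IP): for a non-absorbing convex volume under uniform external illumination, the mean path length equals 4V/S regardless of the scattering details. The following Monte Carlo sketch verifies that prediction for a sphere; it is independent of the SWEET code and uses only the chord-length geometry of the unscattered case.

```python
import numpy as np

def mean_chord_sphere_mc(radius: float, n: int = 1_000_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean straight-line path length through a
    sphere under uniform (flux-weighted, i.e. cosine-distributed) external
    illumination. The invariance property predicts 4V/S = 4R/3 for any
    convex body, independent of where the rays enter."""
    rng = np.random.default_rng(seed)
    # Cosine-weighted angle between the entering ray and the inward normal:
    # p(cos θ) = 2 cos θ  =>  cos θ = sqrt(u) with u ~ U(0, 1).
    cos_theta = np.sqrt(rng.random(n))
    chords = 2.0 * radius * cos_theta        # chord length through the sphere
    return chords.mean()

R = 1.0
print("MC estimate :", mean_chord_sphere_mc(R))
print("4V/S = 4R/3 :", 4.0 * R / 3.0)
```

For a sphere of radius R = 1 m, 4V/S = 4R/3 ≈ 1.333 m, so the Monte Carlo mean should converge to that value, which is the same benchmark used for the SWEET estimates in Figures 16–19.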
Figures
Figure 1: Schematic overview of the most important surface scattering models in SWEET.
Figure 2: Comparison of SWEET with Steven's Monte Carlo code for a simple fluence-evaluation case. Fluence is computed as the integral of the radiance over the solid angles of 3D space, ∫_Ω L(r, u) dΩ, where L is the radiance at position r in direction u. Asymmetry phase parameter g = 0.0; optical parameters σ = 0.5 m⁻¹ and κ = 0.5 m⁻¹. The 95% confidence intervals for SWEET are shown in grey.
Figure 3: Comparison of SWEET with Mitsuba for a simple radiance evaluation at a distance of 1 mm. Asymmetry phase parameter g = 0.5; optical parameters σ = 0.9 m⁻¹ and κ = 0.1 m⁻¹.
Figure 4: Relative discrepancies in fluence between SWEET and Steven's Monte Carlo code for a set of distances, optical coefficients (κ and σ), and anisotropy phase parameters (g). Albedo = σ/(σ + κ), where β = σ + κ is set to 1.0; g = 0 on the left and g = 0.9 on the right.
Figure 5: Relative 95% confidence intervals of SWEET for fluence for a set of distances, optical coefficients (κ and σ), and anisotropy phase parameters (g). Albedo = σ/(σ + κ); g = 0 on the left and g = 0.9 on the right.
Figures 6–15: Luminance comparisons and confidence intervals for a set of distances and optical coefficients (κ and σ), with albedo = σ/(σ + κ) and β = σ + κ set to 1.0. In each figure, the panels correspond to the distances [0.001, 0.5; 1.0, 5.0; 10.0, 20.0] m.
Figure 6: SWEET vs. Mitsuba, luminance, g = 0.0, point light source.
Figure 7: SWEET vs. Mitsuba, luminance, g = 0.9, point light source.
Figure 8: SWEET relative 95% confidence interval, luminance, g = 0.0, point light source.
Figure 9: SWEET relative 95% confidence interval, luminance, g = 0.9, point light source.
Figure 10: SWEET vs. Mitsuba, luminance, g = 0.0, rectangular light source.
Figure 11: SWEET vs. Mitsuba, luminance, g = 0.9, rectangular light source.
Figure 12: SWEET relative 95% confidence interval, luminance, g = 0.0, rectangular light source.
Figure 13: SWEET relative 95% confidence interval, luminance, g = 0.9, rectangular light source.
Figure 14: SWEET vs. Mitsuba, luminance, g = 0.0, two point lights.
Figure 15: SWEET relative 95% confidence interval, luminance, g = 0.0, two point lights.
Figure 16: Mean path length estimated with SWEET and theoretically (using the invariance property, IP) for cubes with side lengths ranging from 10 cm to 5 m. Asymmetry phase parameter g = 0.0; optical parameters σ = 1.0 m⁻¹ and κ = 0.0 m⁻¹.
Figure 17: Mean path length estimated with SWEET and theoretically (using IP) for spheres with radii ranging from 10 cm to 5 m. Asymmetry phase parameter g = 0.0; optical parameters σ = 1.0 m⁻¹ and κ = 0.0 m⁻¹.
Figure 18: Relative errors of the invariance property (IP) estimated with SWEET versus theory for cubes with side lengths ranging from 10 cm to 4 m, for varying asymmetry phase parameter g and optical parameters (σ and κ); the panels correspond to the side lengths [0.1, 0.5; 1.0, 2.0; 3.0, 4.0] m.
Figure 19: Relative errors of the invariance property (IP) estimated with SWEET versus theory for spheres with radii ranging from 10 cm to 4 m, for varying asymmetry phase parameter g and optical parameters (σ and κ); the panels correspond to the radii [0.1, 0.5; 1.0, 2.0; 3.0, 4.0] m.
Figure 20: Execution time for SWEET and MS, using 10⁶ photons for luminance computation. Asymmetry phase parameter g = 0.9; optical parameters σ = 0.99 m⁻¹ and κ = 0.01 m⁻¹.
Figure 21: Execution time as a function of photon count for SWEET and MS for a single luminance computation (L(θ = 0°)). Asymmetry phase parameter g = 0.9; optical parameters σ = 0.99 m⁻¹ and κ = 0.01 m⁻¹.
Figure 22: Execution time of SWEET for luminance computation with 10⁶ photons for varying optical parameters, g = 0.9; the panels correspond to the distances [0.001, 0.5; 1.0, 5.0; 10.0, 20.0] m.
Figure 23: Execution time of MS for luminance computation with 10⁶ photons for varying optical parameters, g = 0.9; the panels correspond to the distances [0.001, 0.5; 1.0, 5.0; 10.0, 20.0] m.
Figure 24: Execution-time ratio of MS to SWEET for luminance computation with 10⁶ photons for varying optical parameters, g = 0.9; the panels correspond to the distances [0.001, 0.5; 1.0, 5.0; 10.0, 20.0] m.
16 pages, 20362 KiB  
Article
IngredSAM: Open-World Food Ingredient Segmentation via a Single Image Prompt
by Leyi Chen, Bowen Wang and Jiaxin Zhang
J. Imaging 2024, 10(12), 305; https://doi.org/10.3390/jimaging10120305 - 26 Nov 2024
Viewed by 776
Abstract
Food semantic segmentation is of great significance in the field of computer vision and artificial intelligence, especially in food image analysis. Due to the complexity and variety of food, it is difficult to handle this task effectively with supervised methods. Thus, we introduce IngredSAM, a novel approach for open-world food ingredient semantic segmentation, extending the capabilities of the Segment Anything Model (SAM). Utilizing visual foundation models (VFMs) and prompt engineering, IngredSAM leverages discriminative and matchable semantic features between a single clean image prompt of specific ingredients and open-world images to guide the generation of accurate segmentation masks in real-world scenarios. This method addresses the challenges that traditional supervised models face with the diverse appearances and class imbalances of food ingredients. Our framework demonstrates significant advancements in the segmentation of food ingredients without any training process, achieving 2.85% and 6.01% better performance than previous state-of-the-art methods on the FoodSeg103 and UECFoodPix datasets, respectively. IngredSAM exemplifies a successful application of one-shot, open-world segmentation, paving the way for downstream applications such as nutritional analysis and consumer dietary trend monitoring. Full article
(This article belongs to the Section AI in Imaging)
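To illustrate the prompt-generation stage, here is a minimal sketch under the assumption that dense VFM features from the prompt image and the open-world image are matched by cosine similarity, with the best-matching locations turned into SAM point prompts. The function name, the mean-pooled query, and the top-k rule are illustrative simplifications, not the paper's exact feature aggregation and processing.

```python
import numpy as np

def point_prompts_from_matching(prompt_feats, target_feats, top_k=5):
    """Pick point prompts for SAM on an open-world image by matching dense
    visual-foundation-model features against a single prompt image.
    prompt_feats: (Hp, Wp, D) features of the clean ingredient prompt image.
    target_feats: (Ht, Wt, D) features of the open-world image.
    Returns up to `top_k` (row, col) positions in the target feature grid."""
    # One descriptor for the prompted ingredient: the mean prompt feature.
    query = prompt_feats.reshape(-1, prompt_feats.shape[-1]).mean(axis=0)
    query /= np.linalg.norm(query) + 1e-8

    t = target_feats.reshape(-1, target_feats.shape[-1])
    t_norm = t / (np.linalg.norm(t, axis=1, keepdims=True) + 1e-8)
    sim = t_norm @ query                       # cosine similarity per location

    idx = np.argsort(sim)[::-1][:top_k]        # best-matching locations
    rows, cols = np.unravel_index(idx, target_feats.shape[:2])
    return list(zip(rows.tolist(), cols.tolist()))

# Toy features (e.g., a DINOv2-style 16x16 grid with 384-dim descriptors).
rng = np.random.default_rng(0)
print(point_prompts_from_matching(rng.normal(size=(16, 16, 384)),
                                  rng.normal(size=(16, 16, 384))))
```

The returned grid positions would then be scaled to pixel coordinates and passed to SAM as positive point prompts for mask generation.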
Figures
Figure 1: The prompt image provides the segmentation target, while the visual foundation models generate point prompts for the open-world image.
Figure 2: Pipeline of SAM. Mask, points, box, and text are the four types of prompts.
Figure 3: IngredSAM architecture. The model is divided into three stages: feature aggregation, feature processing, and prompt generation. The final stage outputs the point prompts used to prompt SAM to generate reasonable masks for the open-world image.
Figure 4: Samples from the UECFoodPix Complete and FoodSeg103 datasets. The UECFoodPix Complete dataset does not provide detailed annotations for food ingredients, whereas FoodSeg103 includes detailed annotated masks for all ingredients.
Figure 5: IngredSAM segmentation visualization results. The food ingredients represented by the prompt image are completely segmented in the open-world image.
Figure 6: Visualization of the effectiveness of using a background filtering algorithm.