J. Imaging, Volume 9, Issue 11 (November 2023) – 21 articles

Cover Story (view full-size image): Chest radiography (CXR) is the most frequently performed radiological test worldwide because of its wide availability, non-invasive nature, and low cost. Considering the sustained increase in the incidence of cardiovascular diseases, it is critical to find accessible, fast, and reproducible tests to help diagnose these frequent conditions. AI-analyzed CXRs could be utilized in the future as a complementary, easy-to-apply technology to improve diagnosis and risk stratification for cardiovascular diseases. Such advances will likely help better target more advanced investigations, which may reduce the burden of testing in some cases, as well as better identify higher-risk patients who would benefit from earlier, dedicated, and comprehensive cardiovascular evaluation. View this paper
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view the papers in PDF format, click on the "PDF Full-text" link, and use the free Adobe Reader to open them.
15 pages, 14472 KiB  
Article
Speed Up of Volumetric Non-Local Transform-Domain Filter Utilising HPC Architecture
by Petr Strakos, Milan Jaros, Lubomir Riha and Tomas Kozubek
J. Imaging 2023, 9(11), 254; https://doi.org/10.3390/jimaging9110254 - 20 Nov 2023
Viewed by 1643
Abstract
This paper presents a parallel implementation of a non-local transform-domain filter (BM4D). The effectiveness of the parallel implementation is demonstrated by denoising image series from computed tomography (CT) and magnetic resonance imaging (MRI). The basic idea of the filter is based on grouping and filtering similar data within the image. Due to the high level of similarity and data redundancy, the filter can provide even better denoising quality than current extensively used approaches based on deep learning (DL). In BM4D, cubes of voxels named patches are the essential image elements for filtering. Using voxels instead of pixels means that the area for searching similar patches is large. Because of this and the application of multi-dimensional transformations, the computation time of the filter is exceptionally long. The original implementation of BM4D is only single-threaded. We provide a parallel version of the filter that supports multi-core and many-core processors and scales on such versatile hardware resources, typical for high-performance computing clusters, even if they are concurrently used for the task. Our algorithm uses hybrid parallelisation that combines open multi-processing (OpenMP) and message passing interface (MPI) technologies and provides up to 283× speedup, which is a 99.65% reduction in processing time compared to the sequential version of the algorithm. In denoising quality, the method performs considerably better than recent DL methods on the data type that these methods have yet to be trained on. Full article
(This article belongs to the Section Medical Imaging)
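As a quick illustrative check (not code from the paper), the quoted 283× speedup and 99.65% reduction in processing time are two views of the same quantity, since the saved fraction of runtime equals 1 − 1/speedup:

    # Illustrative only: relation between parallel speedup and runtime reduction.
    # A speedup S means T_parallel = T_serial / S, so the saved fraction is 1 - 1/S.
    speedup = 283
    reduction = 1 - 1 / speedup
    print(f"runtime reduction: {reduction:.2%}")  # ~99.65%, consistent with the abstract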
Show Figures
Figure 1: Volume rendering of MRI dataset from BrainWeb; (left) the dataset corrupted by 25% Gaussian noise; (right) the dataset without the noise; identical setup of a transfer function in shader properties for both data.
Figure 2: Workflow of the collaborative filtering method, graphically describing the two subsequent steps: hard thresholding and Wiener filtering.
Figure 3: Sorting areas (large rectangles) around the reference patches (small rectangles) into groups with non-overlapping areas. One-dimensional example (X direction).
Figure 4: Concept of server and clients used for parallelisation of the BM4D algorithm on multiple CPU nodes of Anselm (left) and multiple MIC nodes of HLRN's test system (right).
Figure 5: Concept of server and clients used for parallelisation of the BM4D algorithm on multiple CPU/MIC nodes of Salomon.
Figure 6: MRI volume of a brain (BrainWeb). (top left) Original noise-less data; (top right) original data corrupted by 25% Gaussian noise; (bottom left) image data filtered by BM4D.
Figure 7: Real CT data of a patient, series of 170 images. (left) Original noisy data; (right) image data filtered by BM4D.
Figure 8: Real CT data of a patient, series of 779 images. (left) Original noisy data; (right) image data filtered by BM4D.
Figure 9: Comparison between the original BM4D [16] and our parallel implementation in terms of total runtime and achieved speed-up. Results shown on all three data sets.
Figure 10: Strong scalability on different architectures without communication. Results on the real CT data of size 512 × 512 × 779 voxels.
Figure 11: Strong scalability on different architectures with communication. Results on the real CT data of size 512 × 512 × 779 voxels.
Figure 12: Visual comparison of different denoising methods applied to the BrainWeb dataset (181 × 181 × 181 voxels); the image from axial slice 56 is used for comparison. (top left) Original noise-less data; (top right) original data corrupted by 25% Gaussian noise; (bottom left) RED-CNN; (bottom middle) OIDN; (bottom right) BM4D.
Figure 13: Application of volume rendering to the outputs of different denoising methods. BrainWeb dataset (181 × 181 × 181 voxels); coronal view of the data stack from slice 92 to 181. (top left) Original noise-less data; (top right) original data corrupted by 25% Gaussian noise; (bottom left) RED-CNN; (bottom middle) OIDN; (bottom right) BM4D.
21 pages, 6929 KiB  
Article
Arteriovenous Length Ratio: A Novel Method for Evaluating Retinal Vasculature Morphology and Its Diagnostic Potential in Eye-Related Diseases
by Sufian A. Badawi, Maen Takruri, Mohammad Al-Hattab, Ghaleb Aldoboni, Djamel Guessoum, Isam ElBadawi, Mohamed Aichouni, Imran Ali Chaudhry, Nasrullah Mahar and Ajay Kamath Nileshwar
J. Imaging 2023, 9(11), 253; https://doi.org/10.3390/jimaging9110253 - 20 Nov 2023
Viewed by 2305
Abstract
Retinal imaging is a non-invasive technique used to scan the back of the eye, enabling the extraction of potential biomarkers like the artery and vein ratio (AVR). This ratio is known for its association with various diseases, such as hypertensive retinopathy (HR) or diabetic retinopathy, and is crucial in assessing retinal health. HR refers to the morphological changes in retinal vessels caused by persistent high blood pressure. Timely identification of these alterations is crucial for preventing blindness and reducing the risk of stroke-related fatalities. The main objective of this paper is to propose a new method for assessing one of the morphological changes in the fundus through morphometric analysis of retinal images. The proposed method in this paper introduces a novel approach called the arteriovenous length ratio (AVLR), which has not been utilized in previous studies. Unlike commonly used measures such as the arteriovenous width ratio or tortuosity, AVLR focuses on assessing the relative length of arteries and veins in the retinal vasculature. The initial step involves segmenting the retinal blood vessels and distinguishing between arteries and veins; AVLR is calculated based on artery and vein caliber measurements for both eyes. Nine equations are used, and the length of both arteries and veins is measured in the region of interest (ROI) covering the optic disc for each eye. Using the AV-Classification dataset, the efficiency of the iterative AVLR assessment is evaluated. The results show that the proposed approach performs better than the existing methods. By introducing AVLR as a diagnostic feature, this paper contributes to advancing retinal imaging analysis. It provides a valuable tool for the timely diagnosis of HR and other eye-related conditions and represents a novel diagnostic-feature-based method that can be integrated to serve as a clinical decision support system. Full article
(This article belongs to the Special Issue Advances in Retinal Image Processing)
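To make the core idea concrete, here is a minimal sketch (hypothetical variable names, not code from the paper) of a length-based arteriovenous ratio computed from skeletonized, artery/vein-labelled vessel segments within the ROI; the paper actually derives nine such metrics (e.g., AVLR_DM, AVLR_SOAM), and this shows only the basic length-ratio idea.

    import numpy as np

    def arteriovenous_length_ratio(segment_lengths, segment_labels):
        """Ratio of total artery length to total vein length inside the ROI.

        segment_lengths: per-segment lengths (e.g., in pixels along the skeleton)
        segment_labels:  'artery' or 'vein' for each segment
        """
        lengths = np.asarray(segment_lengths, dtype=float)
        labels = np.asarray(segment_labels)
        artery_total = lengths[labels == "artery"].sum()
        vein_total = lengths[labels == "vein"].sum()
        return artery_total / vein_total

    # Toy example: a balanced vasculature yields a ratio close to 1.
    print(arteriovenous_length_ratio([12.0, 8.5, 10.2], ["artery", "vein", "vein"]))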
Show Figures
Figure 1: The dataset for AV-Classification utilized in this paper.
Figure 2: The process of measuring the proposed AVLR metric (green denotes fully automated steps, and orange denotes the semi-automated steps).
Figure 3: Illustration of the proposed AVLR calculation steps: (a) a representation of the fundus image where the optic disc is localized, (b) ROI segmentation, (c) extraction of blood vessels, (d) artery–vein classification, (e) conversion to a skeletonized form, (f) segmentation of vessel segments into fragments, where the computation of the nine metrics happens for each vessel segment; all of these figures are the input to compute the AVLR for each retinal image.
Figure 4: ERD diagram of the image-level and segment-level feature sets.
Figure 5: Optic disc localization illustration: (a) OD in the fundus image, (b) converted to grayscale, (c) monochrome, and (d) the image annotated with the OD center and radius.
Figure 6: ROI segmentation: (a) ring cut from 2ROD to 3ROD; (b) annotated original image; (c) ring cut from 2ROD to 5ROD.
Figure 7: Fundus image and vessel segmentation for the two ROI areas with improved accuracy using the trainable B-COSFIRE filter and optimized parameters.
Figure 8: Artery–vein classification and segmentation of the ROI of the segmented ring from 2ROD to 3ROD.
Figure 9: Artery–vein classification and segmentation of the ROI of the segmented ring from 2ROD to 5ROD.
Figure 10: Illustration of improving the extraction of vessel segments by removing spurs after skeletonization and branch-point removal.
Figure 11: View of the enhanced vessel segments before and after the enhancement.
Figure 12: Illustration of the vessel segments that have been extracted, along with the calculation of their respective lengths.
Figure 13: A healthy fundus image showcasing balanced vasculature with metric values: AVLR_DM (1.013), AVLR_SOAM (1.15), AVLR_ICMN (1.055), AVLR_ICMB (1.004), and AVLR_LC (0.977), signifying normalcy as the arteries' metric values closely match the mean of the veins.
Figure 14: A fundus image displaying atypically elongated arteries and normal veins with metric values: AVLR_DM (7.194), AVLR_SOAM (0.843), AVLR_ICMN (29.811), AVLR_ICMB (13.785), and AVLR_LC (8.579).
Figure 15: A fundus image displaying atypically elongated arteries and normal veins with metric values: AVLR_DM (0.424), AVLR_SOAM (0.942), AVLR_ICMN (0.27), AVLR_ICMB (0.615), and AVLR_LC (0.487).
Figure 16: Comparison of AVLR between the right and left eyes. (a) Metric values for Image 1: AVLR_DM (0.102), AVLR_SOAM (1.097), AVLR_ICMN (0.042), AVLR_ICMB (0.102), AVLR_LC (0.136). (b) Metric values for Image 2: AVLR_DM (2.186), AVLR_SOAM (0.789), AVLR_ICMN (3.626), AVLR_ICMB (1.331), AVLR_LC (1.699).
Figure 17: AVLR 2-3ROD, imagewise.
Figure 18: AVLR 2-3ROD, segmentwise.
Figure 19: AVLR 2-5ROD, imagewise.
Figure 20: AVLR 2-5ROD, segmentwise.
Figure 21: The AVLR dataset extends prior RVM work with new metrics quantified using the proposed methodology.
13 pages, 2464 KiB  
Article
Radiomics Texture Analysis of Bone Marrow Alterations in MRI Knee Examinations
by Spiros Kostopoulos, Nada Boci, Dionisis Cavouras, Antonios Tsagkalis, Maria Papaioannou, Alexandra Tsikrika, Dimitris Glotsos, Pantelis Asvestas and Eleftherios Lavdas
J. Imaging 2023, 9(11), 252; https://doi.org/10.3390/jimaging9110252 - 20 Nov 2023
Cited by 2 | Viewed by 2098
Abstract
Accurate diagnosis and timely intervention are key to addressing common knee conditions effectively. In this work, we aim to identify textural changes in knee lesions based on bone marrow edema (BME), injury (INJ), and osteoarthritis (OST). One hundred and twenty-one MRI knee examinations were selected. Cases were divided into three groups based on radiological findings: forty-one in the BME, thirty-seven in the INJ, and forty-three in the OST groups. From each ROI, eighty-one radiomic descriptors were calculated, encoding texture information. The results suggested differences in the texture characteristics of regions of interest (ROIs) extracted from PD-FSE and STIR sequences. We observed that the ROIs associated with BME exhibited greater local contrast and a wider range of structural diversity compared to the ROIs corresponding to OST. When it comes to STIR sequences, the ROIs related to BME showed higher uniformity in terms of both signal intensity and the variability of local structures compared to the INJ ROIs. A combined radiomic descriptor managed to achieve a high separation ability, with AUC of 0.93 ± 0.02 in the test set. Radiomics analysis may provide a non-invasive and quantitative means to assess the spatial distribution and heterogeneity of bone marrow edema, aiding in its early detection and characterization. Full article
(This article belongs to the Special Issue Advances in Image Analysis: Shapes, Textures and Multifractals)
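For readers unfamiliar with the kind of texture descriptors involved, here is a minimal sketch using scikit-image's gray-level co-occurrence matrix utilities (graycomatrix/graycoprops in recent scikit-image releases); it is an illustrative example with synthetic data, not the authors' 81-descriptor pipeline.

    import numpy as np
    from skimage.feature import graycomatrix, graycoprops

    # Hypothetical 8-bit ROI standing in for a delineated region on a PD-FSE or STIR slice.
    roi = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)

    # Co-occurrence matrix at distance 1 over four directions, then a few texture properties.
    glcm = graycomatrix(roi, distances=[1], angles=[0, np.pi/4, np.pi/2, 3*np.pi/4],
                        levels=256, symmetric=True, normed=True)
    features = {prop: graycoprops(glcm, prop).mean()
                for prop in ("contrast", "homogeneity", "energy", "correlation")}
    print(features)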
Show Figures
Figure 1: An instance of ROI delineation from a bone marrow edema-like lesion.
Figure 2: Samples of regions of interest for the three categories and two MRI sequences: bone marrow edema (first row), injured (middle row), and osteoarthritis (last row); PD-FSE (left column) and STIR (right column).
Figure 3: Boxplots of the eleven features that present statistically significant differences amongst the groups (BME vs. INJ vs. OST) for the ROIs from the PD-FSE sequence.
Figure 4: Boxplots of the eleven features that present statistically significant differences amongst the groups (BME vs. INJ vs. OST) for the ROIs from the STIR sequence.
Figure 5: (a) ROC curves by the ensemble random forest classifier in the test set; (b) boxplot for the predicted values of the five combined composite descriptors.
20 pages, 5237 KiB  
Article
Digital Grading the Color Fastness to Rubbing of Fabrics Based on Spectral Reconstruction and BP Neural Network
by Jinxing Liang, Jing Zhou, Xinrong Hu, Hang Luo, Genyang Cao, Liu Liu and Kaida Xiao
J. Imaging 2023, 9(11), 251; https://doi.org/10.3390/jimaging9110251 - 16 Nov 2023
Cited by 2 | Viewed by 2268
Abstract
To digitally grade the staining color fastness of fabrics after rubbing, an automatic grading method based on spectral reconstruction technology and BP neural network was proposed. Firstly, the modeling samples are prepared by rubbing the fabrics according to the ISO standard of 105-X12. Then, to comply with visual rating standards for color fastness, the modeling samples are professionally graded to obtain the visual rating result. After that, a digital camera is used to capture digital images of the modeling samples inside a closed and uniform lighting box, and the color data values of the modeling samples are obtained through spectral reconstruction technology. Finally, the color fastness prediction model for rubbing was constructed using the modeling samples data and BP neural network. The color fastness level of the testing samples was predicted using the prediction model, and the prediction results were compared with the existing color difference conversion method and gray scale difference method based on the five-fold cross-validation strategy. Experiments show that the prediction model of fabric color fastness can be better constructed using the BP neural network. The overall performance of the method is better than the color difference conversion method and the gray scale difference method. It can be seen that the digital rating method of fabric staining color fastness to rubbing based on spectral reconstruction and BP neural network has high consistency with the visual evaluation, which will help with automatic color fastness grading. Full article
(This article belongs to the Section Color, Multi-spectral, and Hyperspectral Imaging)
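As a rough illustration of the grading stage only (hypothetical feature layout and synthetic data; not the authors' network, architecture, or dataset), a back-propagation-trained MLP that maps reconstructed color values of the stained rubbing cloth to a fastness grade could be set up like this:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # Hypothetical training data: CIEXYZ color values of the stained rubbing cloth
    # (as obtained from spectral reconstruction) paired with visually assigned grades (1-5).
    X_train = np.random.rand(70, 3)             # 70 samples, one XYZ triplet per sample
    y_train = np.random.uniform(1, 5, size=70)  # visual fastness grades

    # A small back-propagation-trained MLP standing in for the paper's BP network.
    model = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
    model.fit(X_train, y_train)
    predicted_grade = model.predict(np.random.rand(1, 3))
    print(round(float(predicted_grade[0]), 1))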
Show Figures
Figure 1: The overall flowchart of the proposed color fastness digital grading method.
Figure 2: (a) Top left: the geometric diagram of the uniformly illuminated light box; (b) top right: the real rendering effect of the light box; (c) bottom left: the real product of the light box; and (d) bottom right: the inner illumination uniformity over the imaging area of the platform.
Figure 3: Digital images of the rubbed samples in the database; for each sample, the rubbed sample and its corresponding color-stained rubbing cloth are presented.
Figure 4: The scene and geometric diagram of the visual rating experiment settings.
Figure 5: BP neural network structure diagram.
Figure 6: Results of the paired-sample t-test between prediction results of the color difference conversion method and visual rating results using 25 randomly selected fabric samples.
Figure 7: The third-order polynomial curve fitted to the relationship between the gray scale difference in the standard grading gray card and the corresponding color fastness grade.
Figure 8: The color distribution of 70 samples in RGB space (left) and CIEXYZ color space (right).
Figure 9: (a) The predicted error distribution of the proposed method tested with five-fold cross-validation; (b) predicted error histogram of the proposed method.
Figure 10: (a) The predicted error histogram of the color difference conversion method; (b) the predicted error histogram of the gray scale difference method.
Figure 11: (a) The predicted error histogram of the color difference conversion method tested with five-fold cross-validation; (b) the predicted error histogram of the optimized gray scale method tested with five-fold cross-validation.
Figure 12: Boxplot of the predicted error of the proposed BP model, color difference conversion, and optimized gray scale difference method tested with five-fold cross-validation.
21 pages, 3502 KiB  
Review
Aortic Valve Calcium Score by Computed Tomography as an Adjunct to Echocardiographic Assessment—A Review of Clinical Utility and Applications
by Isabel G. Scalia, Juan M. Farina, Ratnasari Padang, Clinton E. Jokerst, Milagros Pereyra, Ahmed K. Mahmoud, Tasneem Z. Naqvi, Chieh-Ju Chao, Jae K. Oh, Reza Arsanjani and Chadi Ayoub
J. Imaging 2023, 9(11), 250; https://doi.org/10.3390/jimaging9110250 - 15 Nov 2023
Cited by 1 | Viewed by 4475
Abstract
Aortic valve stenosis (AS) is increasing in prevalence due to the aging population, and severe AS is associated with significant morbidity and mortality. Echocardiography remains the mainstay for the initial detection and diagnosis of AS, as well as for grading of severity. However, there are important subgroups of patients, for example, patients with low-flow low-gradient or paradoxical low-gradient AS, where quantification of severity of AS is challenging by echocardiography and underestimation of severity may delay appropriate management and impart a worse prognosis. Aortic valve calcium score by computed tomography has emerged as a useful clinical diagnostic test that is complementary to echocardiography, particularly in cases where there may be conflicting data or clinical uncertainty about the degree of AS. In these situations, aortic valve calcium scoring may help re-stratify grading of severity and, therefore, further direct clinical management. This review presents the evolution of aortic valve calcium score by computed tomography, its diagnostic and prognostic value, as well as its utility in clinical care. Full article
(This article belongs to the Section Medical Imaging)
Show Figures
Figure 1: Stylized progression of aortic stenosis in a three-cusp aortic valve, with listed areas representing the cross-section of the aortic valve opening (top panel). The bottom left image demonstrates the two most common morphologies in AS, bicuspid valve and three-cusp AV with calcific degeneration, and the bottom right panel depicts secondary left ventricular hypertrophy due to severe aortic stenosis (bottom panel).
Figure 2: A 69-year-old male with bicuspid aortic valve and shortness of breath on exertion only at high elevations. He had discordant echocardiographic parameters for severity of aortic stenosis, with a clinical echocardiogram report noting overall moderate–severe aortic valve stenosis: systolic mean Doppler gradient (MG) 37 mmHg (A), aortic valve area (AVA) by Doppler 1.06 cm² (B), dimensionless index 0.23, and normal indexed stroke volume (58 mL/m²). He proceeded to have an aortic valve calcium score (AVCS) by cardiac computed tomography (C, red arrow), which demonstrated a score of 4568 AU, reclassifying the aortic valve stenosis as severe. This scan also demonstrated calcification in the left anterior descending coronary artery (D, yellow arrow).
Figure 3: Flow chart of classification of aortic stenosis (AS) grading based on echocardiographic measurements. The figure is based on published work with permission from [24]. Copyright 2017 Elsevier. Abbreviations: aortic stenosis (AS); aortic valve area (AVA); mean gradient across aortic valve (MG); peak velocity across aortic valve (PAV); left ventricular (LV); left ventricular ejection fraction (LVEF); dobutamine stress echocardiography (DSE); stroke volume index (SVi); computed tomography (CT).
Figure 4: A 62-year-old asymptomatic female sent for a coronary artery calcium (CAC) score for risk stratification for statin therapy. The CAC score was 0, but significant aortic valve calcification was identified (A). AVCS was quantified as 660 AU. She was also incidentally noted to have mid ascending aorta dilation with a diameter of 45 mm (B). A subsequent echocardiogram for further evaluation confirmed a bicuspid aortic valve with raphe between the left and non-coronary cusps, demonstrated by the arrow (C), and overall moderate aortic stenosis with a mean gradient (MG) of 22 mmHg and peak aortic velocity (PAV) of 3 m/s (D).
14 pages, 2336 KiB  
Article
An Automatic Pixel-Wise Multi-Penalty Approach to Image Restoration
by Villiam Bortolotti, Germana Landi and Fabiana Zama
J. Imaging 2023, 9(11), 249; https://doi.org/10.3390/jimaging9110249 - 15 Nov 2023
Cited by 1 | Viewed by 1722
Abstract
This work tackles the problem of image restoration, a crucial task in many fields of applied sciences, focusing on removing degradation caused by blur and noise during the acquisition process. Drawing inspiration from the multi-penalty approach based on the Uniform Penalty principle, discussed in previous work, here we develop a new image restoration model and an iterative algorithm for its effective solution. The model incorporates pixel-wise regularization terms and establishes a rule for parameter selection, aiming to restore images through the solution of a sequence of constrained optimization problems. To achieve this, we present a modified version of the Newton Projection method, adapted to multi-penalty scenarios, and prove its convergence. Numerical experiments demonstrate the efficacy of the method in eliminating noise and blur while preserving the image edges. Full article
(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)
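For orientation, the pixel-wise multi-penalty idea the abstract refers to can be written schematically as follows; this is a generic form for illustration, and the paper's exact functional, constraint set, and Uniform Penalty-based parameter-selection rule are defined in the article itself.

    \min_{x \ge 0} \; \frac{1}{2}\,\| A x - b \|_2^2 \;+\; \sum_{i=1}^{N} \lambda_i \,\big[(L x)_i\big]^2

Here A models the blur, b is the observed noisy image, L is a discrete regularization operator (e.g., a gradient), and a separate regularization parameter λ_i is attached to each pixel i and updated by the parameter-selection rule across the sequence of constrained problems.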
Show Figures
Figure 1: galaxy test problem: out-of-focus blur; δ = 10⁻². Top row: original (left) and blurred (right) images. Bottom row: MULTI (left) and TGV (right) restorations.
Figure 2: mri test problem: out-of-focus blur; δ = 10⁻². Top row: original (left) and blurred (right) images. Bottom row: MULTI (left) and TGV (right) restorations.
Figure 3: leopard test problem: out-of-focus blur; δ = 10⁻². Top row: original (left) and blurred (right) images. Bottom row: MULTI (left) and TGV (right) restorations.
Figure 4: elaine test problem: out-of-focus blur; δ = 10⁻². Top row: original (left) and blurred (right) images. Bottom row: MULTI (left) and TGV (right) restorations.
Figure 5: galaxy test problem: out-of-focus blur; δ = 10⁻². A detail of the original image (left), MULTI restoration (centre), and TGV restoration (right). Red arrows highlight the different image features.
Figure 6: galaxy test problem: out-of-focus blur; δ = 10⁻². A detail of the original image (left), MULTI restoration (centre), and TGV restoration (right). Red arrows highlight the different image features.
Figure 7: leopard test problem: out-of-focus blur; δ = 10⁻². A detail of the original image (left), MULTI restoration (centre), and TGV restoration (right). Red arrows highlight the different image features.
Figure 8: elaine test problem: out-of-focus blur; δ = 10⁻². A detail of the original image (left), MULTI restoration (centre), and TGV restoration (right). Red arrows highlight the different image features.
Figure 9: Computed regularization parameters: out-of-focus blur, δ = 10⁻².
Figure 10: Leopard test problem (out-of-focus blur, δ = 10⁻²). Top line: regularization parameter norm (left), relative error (middle), and residual norm (right) history for the multi-penalty model. Bottom line: objective function (left) and projected gradient norm (right) history.
20 pages, 3255 KiB  
Article
Explainable Connectionist-Temporal-Classification-Based Scene Text Recognition
by Rina Buoy, Masakazu Iwamura, Sovila Srun and Koichi Kise
J. Imaging 2023, 9(11), 248; https://doi.org/10.3390/jimaging9110248 - 15 Nov 2023
Cited by 1 | Viewed by 2455
Abstract
Connectionist temporal classification (CTC) is a favored decoder in scene text recognition (STR) for its simplicity and efficiency. However, most CTC-based methods utilize one-dimensional (1D) vector sequences, usually derived from a recurrent neural network (RNN) encoder. This results in the absence of explainable 2D spatial relationship between the predicted characters and corresponding image regions, essential for model explainability. On the other hand, 2D attention-based methods enhance recognition accuracy and offer character location information via cross-attention mechanisms, linking predictions to image regions. However, these methods are more computationally intensive, compared with the 1D CTC-based methods. To achieve both low latency and model explainability via character localization using a 1D CTC decoder, we propose a marginalization-based method that processes 2D feature maps and predicts a sequence of 2D joint probability distributions over the height and class dimensions. Based on the proposed method, we newly introduce an association map that aids in character localization and model prediction explanation. This map parallels the role of a cross-attention map, as seen in computationally-intensive attention-based architectures. With the proposed method, we consider a ViT-CTC STR architecture that uses a 1D CTC decoder and a pretrained vision Transformer (ViT) as a 2D feature extractor. Our ViT-CTC models were trained on synthetic data and fine-tuned on real labeled sets. These models outperform the recent state-of-the-art (SOTA) CTC-based methods on benchmarks in terms of recognition accuracy. Compared with the baseline Transformer-decoder-based models, our ViT-CTC models offer a speed boost up to 12 times regardless of the backbone, with a maximum 3.1% reduction in total word recognition accuracy. In addition, both qualitative and quantitative assessments of character locations estimated from the association map align closely with those from the cross-attention map and ground-truth character-level bounding boxes. Full article
(This article belongs to the Section Document Analysis and Processing)
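A minimal numpy sketch of the marginalization idea described above (illustrative shapes and names, not the authors' implementation): scores over a 2D feature map are normalized jointly over the height and class dimensions, then summed over height to give the per-column class distributions consumed by a standard 1D CTC decoder.

    import numpy as np

    H, W, C = 8, 32, 37                 # feature-map height, width, number of classes (hypothetical)
    scores = np.random.randn(H, W, C)   # output of a linear head on the 2D feature map

    # Joint softmax over (height, class) for each column w.
    flat = scores.transpose(1, 0, 2).reshape(W, H * C)     # (W, H*C)
    U = np.exp(flat - flat.max(axis=1, keepdims=True))
    U /= U.sum(axis=1, keepdims=True)
    U = U.reshape(W, H, C)                                  # joint distribution per column

    # Marginalize over the height dimension -> 1D sequence of class distributions.
    P = U.sum(axis=1)                                       # (W, C), each row sums to 1
    assert np.allclose(P.sum(axis=1), 1.0)

    # P can now be fed to any standard CTC decoder; best-path decoding shown for brevity.
    best_path = P.argmax(axis=1)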
Show Figures
Figure 1: The cross-attention vs. the association maps. The first row consists of text images. The second and third rows consist of the cross-attention and association maps, respectively, that associate each predicted character with image regions. The last row consists of text transcriptions. The cross-attention map is obtained from a Transformer decoder, while the association map is obtained from a ViT-CTC model.
Figure 2: The proposed marginalization-based method: a 2D feature sequence F = (F_{1,1}, ..., F_{H',W'}) is produced by a 2D feature extractor such as a ViT backbone. F is fed to a linear layer to produce S = (S_{1,1}, ..., S_{H',W'}), over which a softmax normalization is performed jointly over the H' and C dimensions. Next, the normalized U = (U_{1,1}, ..., U_{H',W'}) is marginalized over the H' dimension to produce P = (P_1, ..., P_{W'}), which is fed to a CTC decoder. D and C are the feature and class dimensions, respectively.
Figure 3: 3D graphical illustration of U for an input image. (a) Input image. (b) The computed U. At W' = 1, the bright cells, responding to the character L, have a high probability.
Figure 4: The estimated character locations, R_k, from the association map. (a) Input image with ground-truth character bounding boxes, GT_k. (b) Estimated character regions.
Figure 5: The estimated character locations, R_k, for the two predicted characters of the input image in Figure 4a, from the cross-attention maps. (a) Cross-attention maps. (b) Estimated character regions.
Figure 6: Sample training and fine-tuning images. (a) Sample images from the training datasets. (b) Sample images from the fine-tuning datasets.
Figure 7: Sample text images with character-level annotations.
Figure 8: Inference time comparison between our ViT-CTC models and the Transformer-decoder-based models on an RTX 2060 GPU. Trendlines are projected to the maximum number of characters (i.e., 25) [1]. Tr. Dec.: Transformer decoder. CTC-M: CTC decoder with the proposed method. CTC-FA: CTC decoder with feature averaging.
Figure 9: Maximum inference time vs. recognition accuracy comparisons between the ViT-CTC models using the proposed method and the Transformer-decoder-based models on an RTX 2060 GPU. Tr. Dec.: Transformer decoder.
Figure 10: Association maps for different values of α. The color bars show image regions corresponding to predicted characters.
Figure 11: The average AEMs of the association and cross-attention maps as a function of α and β, respectively.
Figure 12: Illustrations of the estimated character locations from the association (α = 0.8) and cross-attention (β = 0.5) maps vs. the ground-truth character locations.
13 pages, 1754 KiB  
Article
Breast Cancer Detection with an Ensemble of Deep Learning Networks Using a Consensus-Adaptive Weighting Method
by Mohammad Dehghan Rouzi, Behzad Moshiri, Mohammad Khoshnevisan, Mohammad Ali Akhaee, Farhang Jaryani, Samaneh Salehi Nasab and Myeounggon Lee
J. Imaging 2023, 9(11), 247; https://doi.org/10.3390/jimaging9110247 - 13 Nov 2023
Cited by 9 | Viewed by 4103
Abstract
Breast cancer’s high mortality rate is often linked to late diagnosis, with mammograms as key but sometimes limited tools in early detection. To enhance diagnostic accuracy and speed, this study introduces a novel computer-aided detection (CAD) ensemble system. This system incorporates advanced deep learning networks—EfficientNet, Xception, MobileNetV2, InceptionV3, and Resnet50—integrated via our innovative consensus-adaptive weighting (CAW) method. This method permits the dynamic adjustment of multiple deep networks, bolstering the system’s detection capabilities. Our approach also addresses a major challenge in pixel-level data annotation of faster R-CNNs, highlighted in a prominent previous study. Evaluations on various datasets, including the cropped DDSM (Digital Database for Screening Mammography), DDSM, and INbreast, demonstrated the system’s superior performance. In particular, our CAD system showed marked improvement on the cropped DDSM dataset, enhancing detection rates by approximately 1.59% and achieving an accuracy of 95.48%. This innovative system represents a significant advancement in early breast cancer detection, offering the potential for more precise and timely diagnosis, ultimately fostering improved patient outcomes. Full article
(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)
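The abstract does not spell out the CAW weighting rule, so the following is only a generic weighted soft-voting sketch (hypothetical model outputs and a crude confidence proxy as the weight) of how predictions from several networks can be fused; the adaptive, consensus-driven weighting itself is specific to the paper.

    import numpy as np

    # Hypothetical per-model malignancy probabilities for a batch of 4 mammogram patches.
    probs = {
        "EfficientNet": np.array([0.91, 0.12, 0.67, 0.40]),
        "Xception":     np.array([0.88, 0.20, 0.71, 0.35]),
        "MobileNetV2":  np.array([0.80, 0.15, 0.55, 0.52]),
        "InceptionV3":  np.array([0.93, 0.10, 0.60, 0.45]),
        "ResNet50":     np.array([0.85, 0.18, 0.65, 0.48]),
    }

    # Stand-in weighting: weight each model by its mean distance from the uncertain
    # 0.5 point (a simple confidence proxy), then renormalize the weights.
    stacked = np.stack(list(probs.values()))   # (n_models, n_samples)
    weights = np.abs(stacked - 0.5).mean(axis=1)
    weights /= weights.sum()

    ensemble = weights @ stacked               # weighted soft vote per sample
    print((ensemble >= 0.5).astype(int))       # final benign/malignant calls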
Show Figures
Figure 1: Flow chart of the CAD system. The grey area represents the proposed CAW model.
Figure 2: A sample mammogram image from the DDSM dataset is on the left, with a lesion in the middle, and on the right there is a cropped image of the same lesion with a size of 299 × 299 that is part of the cropped DDSM dataset.
Figure 3: Sample augmented images from the INbreast dataset: (A) displays a random horizontal shift and a random vertical flip; (B) shows a random rotation, while (C) shows both a random rotation and a random vertical shift; lastly, (D) displays the effect of shearing.
Figure 4: Performance comparison of various DL models on breast cancer classification: evaluating F2 scores (%).
10 pages, 910 KiB  
Article
OW-SLR: Overlapping Windows on Semi-Local Region for Image Super-Resolution
by Rishav Bhardwaj, Janarthanam Jothi Balaji and Vasudevan Lakshminarayanan
J. Imaging 2023, 9(11), 246; https://doi.org/10.3390/jimaging9110246 - 8 Nov 2023
Viewed by 1884
Abstract
There has been considerable progress in implicit neural representation to upscale an image to any arbitrary resolution. However, existing methods are based on defining a function to predict the Red, Green and Blue (RGB) value from just four specific loci. Relying on just four loci is insufficient as it leads to losing fine details from the neighboring region(s). We show that taking the semi-local region into account leads to an improvement in performance. In this paper, we propose applying a new technique called Overlapping Windows on Semi-Local Region (OW-SLR) to an image to obtain any arbitrary resolution by taking the coordinates of the semi-local region around a point in the latent space. This extracted detail is used to predict the RGB value of a point. We illustrate the technique by applying the algorithm to the Optical Coherence Tomography-Angiography (OCT-A) images and show that it can upscale them to random resolution. This technique outperforms the existing state-of-the-art methods when applied to the OCT500 dataset. OW-SLR provides better results for classifying healthy and diseased retinal images such as diabetic retinopathy and normals from the given set of OCT-A images. Full article
(This article belongs to the Section Image and Video Processing)
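To illustrate the coordinate-to-feature-map lookup and semi-local region extraction described above, here is a simplified sketch with hypothetical sizes (M = 6 as in the paper's figures; the overlapping-windows and MLP stages are omitted, and the mapping/clamping choices are assumptions, not the authors' code):

    import numpy as np

    def semi_local_region(feature_map, query_xy, hr_shape, M=6):
        """Crop the M x M neighborhood of the feature-map cell nearest to a query point.

        feature_map: (H_f, W_f, D) array from the encoder (e.g., an EDSR-style backbone)
        query_xy:    (row, col) of the target pixel in the high-resolution image
        hr_shape:    (H_hr, W_hr) of the high-resolution image being synthesized
        """
        Hf, Wf, _ = feature_map.shape
        # Map the HR coordinate to its spatially equivalent feature-map cell.
        r = int(query_xy[0] / hr_shape[0] * Hf)
        c = int(query_xy[1] / hr_shape[1] * Wf)
        half = M // 2
        # Clamp so the window stays inside the map (padding would be another valid choice).
        r0 = min(max(r - half, 0), Hf - M)
        c0 = min(max(c - half, 0), Wf - M)
        return feature_map[r0:r0 + M, c0:c0 + M, :]

    region = semi_local_region(np.random.rand(48, 48, 64), (300, 410), (512, 512))
    print(region.shape)  # (6, 6, 64): input to the overlapping-windows stage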
Show Figures
Figure 1: (a) An LR image is taken. (b) It is passed through EDSR [11] and a feature map is produced. (c) The semi-local region (M = 6) is located around a randomly selected point from the HR image. (d) The semi-local region is passed through the proposed Overlapping Windows. (e) This output is passed through the MLP to give the RGB value of the randomly selected point. Steps (c)–(e) are performed for all the points in the HR image.
Figure 2: To extract features from a feature map of size 3 × 3, we focus on a specific query point represented by a red dot. In order to determine which pixel locations in the feature map correspond to this query point, we compute the Euclidean distance between the query point and the center points of each pixel location. In the provided image, the black line represents the closest pixel location in the feature map to the query point.
Figure 3: (a) Given an HR image, a point of interest (red dot) is selected to predict its RGB value. (b) Its corresponding spatially equivalent 2D coordinate is selected from the feature map. (c) The semi-local region (M = 6) is located around the calculated 2D coordinate.
Figure 4: The first iteration of overlapping windows, where the window size = M − 1 (M = 6), assuming the feature map is of negligible depth and four windows are positioned at the four corners of the feature map.
Figure 5: A 96 × 96 patch is taken and its size is reduced to 24 × 24 (first row), 32 × 32 (second row), and 48 × 48 (third row) using bicubic interpolation. Our architecture uses the same set of weights to reproduce the given results, whereas other methods require a different set of weights to be trained for each new scale. The PSNR results for each image are shown in Table 1.
11 pages, 2472 KiB  
Article
Synthetic Megavoltage Cone Beam Computed Tomography Image Generation for Improved Contouring Accuracy of Cardiac Pacemakers
by Hana Baroudi, Xinru Chen, Wenhua Cao, Mohammad D. El Basha, Skylar Gay, Mary Peters Gronberg, Soleil Hernandez, Kai Huang, Zaphanlene Kaffey, Adam D. Melancon, Raymond P. Mumme, Carlos Sjogreen, January Y. Tsai, Cenji Yu, Laurence E. Court, Ramiro Pino and Yao Zhao
J. Imaging 2023, 9(11), 245; https://doi.org/10.3390/jimaging9110245 - 8 Nov 2023
Viewed by 2101
Abstract
In this study, we aimed to enhance the contouring accuracy of cardiac pacemakers by improving their visualization using deep learning models to predict MV CBCT images based on kV CT or CBCT images. Ten pacemakers and four thorax phantoms were included, creating a total of 35 combinations. Each combination was imaged on a Varian Halcyon (kV/MV CBCT images) and Siemens SOMATOM CT scanner (kV CT images). Two generative adversarial network (GAN)-based models, cycleGAN and conditional GAN (cGAN), were trained to generate synthetic MV (sMV) CBCT images from kV CT/CBCT images using twenty-eight datasets (80%). The pacemakers in the sMV CBCT images and original MV CBCT images were manually delineated and reviewed by three users. The Dice similarity coefficient (DSC), 95% Hausdorff distance (HD95), and mean surface distance (MSD) were used to compare contour accuracy. Visual inspection showed the improved visualization of pacemakers on sMV CBCT images compared to original kV CT/CBCT images. Moreover, cGAN demonstrated superior performance in enhancing pacemaker visualization compared to cycleGAN. The mean DSC, HD95, and MSD for contours on sMV CBCT images generated from kV CT/CBCT images were 0.91 ± 0.02/0.92 ± 0.01, 1.38 ± 0.31 mm/1.18 ± 0.20 mm, and 0.42 ± 0.07 mm/0.36 ± 0.06 mm using the cGAN model. Deep learning-based methods, specifically cycleGAN and cGAN, can effectively enhance the visualization of pacemakers in thorax kV CT/CBCT images, therefore improving the contouring precision of these devices. Full article
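For reference, the Dice similarity coefficient used to compare pacemaker contours is a standard overlap measure; a minimal implementation for binary masks (a generic sketch, not the authors' evaluation code) is:

    import numpy as np

    def dice_similarity(mask_a, mask_b):
        """DSC = 2 * |A intersect B| / (|A| + |B|) for two boolean masks of equal shape."""
        a = np.asarray(mask_a, dtype=bool)
        b = np.asarray(mask_b, dtype=bool)
        denom = a.sum() + b.sum()
        return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

    # Toy example: two slightly shifted square "pacemaker" contours on a 64 x 64 grid.
    m1 = np.zeros((64, 64), dtype=bool); m1[20:40, 20:40] = True
    m2 = np.zeros((64, 64), dtype=bool); m2[22:42, 21:41] = True
    print(round(dice_similarity(m1, m2), 3))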
Show Figures
Figure 1: Setups of each phantom: (a) CIRS, (b) 3D-printed phantom, and (c) Rando at the Halcyon table.
Figure 2: Illustration of (a) the conditional GAN model (one generator and one discriminator) and (b) the cycleGAN model (two generators and two discriminators). sMV CBCT: synthetic MV CBCT; sCT: synthetic CT.
Figure 3: Visual inspection of the artifact correction in synthetic images. The synthetic images generated by different models were compared with the ground-truth real MV CBCT images. Axial views of (a1, a2) original CT or kV CBCT images, (b1, b2) synthetic images generated by cGAN models, (c1, c2) synthetic images generated by cycleGAN models, and (d) the ground-truth MV CBCT image.
Figure 4: Comparison between the contours on synthetic and real MV CBCT images. Red lines indicate the pacemaker contours delineated and reviewed by three users. The Dice similarity coefficient (DSC) results comparing the contours between the original and synthetic images are included for reference.
Figure 5: Workflow of the potential benefit of using GAN-based models to produce synthetic MV (sMV) CBCT images from planning CT images in a clinical setting.
15 pages, 1596 KiB  
Article
Lesion Detection in Optical Coherence Tomography with Transformer-Enhanced Detector
by Hanya Ahmed, Qianni Zhang, Ferranti Wong, Robert Donnan and Akram Alomainy
J. Imaging 2023, 9(11), 244; https://doi.org/10.3390/jimaging9110244 - 7 Nov 2023
Viewed by 2169
Abstract
Optical coherence tomography (OCT) is an emerging imaging tool in healthcare with common applications in ophthalmology for the detection of retinal diseases and in dentistry for the early detection of tooth decay. Speckle noise is ubiquitous in OCT images, which can hinder diagnosis by clinicians. In this paper, a region-based, deep learning framework for the detection of anomalies is proposed for OCT-acquired images. The core of the framework is Transformer-Enhanced Detection (TED), which includes attention gates (AGs) to ensure focus is placed on the foreground while identifying and removing noise artifacts as anomalies. TED was designed to detect the different types of anomalies commonly present in OCT images for diagnostic purposes and thus aid clinical interpretation. Extensive quantitative evaluations were performed to measure the performance of TED against current, widely known, deep learning detection algorithms. Three different datasets were tested: two dental and one CT (hosting scans of lung nodules, livers, etc.). The results showed that the approach verifiably detected tooth decay and numerous lesions across two modalities, achieving superior performance compared to several well-known algorithms. The proposed method improved the accuracy of detection by 16–22% and the Intersection over Union (IOU) by 10% for both dentistry datasets. For the CT dataset, the performance metrics were similarly improved by 9% and 20%, respectively. Full article
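Since the framework's training loss combines MSE with the Intersection over Union (IOU) between predicted and actual bounding boxes, a minimal IOU implementation for axis-aligned boxes (generic, not the authors' code) is shown for reference:

    def box_iou(box_a, box_b):
        """IOU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
        ax1, ay1, ax2, ay2 = box_a
        bx1, by1, bx2, by2 = box_b
        inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
        inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
        inter = inter_w * inter_h
        union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
        return inter / union if union > 0 else 0.0

    # Toy example: a predicted lesion box vs. a ground-truth box.
    print(round(box_iou((10, 10, 50, 50), (20, 15, 60, 55)), 3))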
Show Figures
Figure 1: The architecture of the Transformer-Enhanced Detection (TED) framework. The denoised images are first augmented to create a larger dataset. The augmented images are then fed into the TED structure, which contains a Vision Transformer (ViT) that is managed by the loss function during training.
Figure 2: Training of Transformer-Enhanced Detection (TED) with the new proposed loss function, consisting of the combination of relevant detection evaluation metrics (MSE and IOU) between an input image and the predicted bounding box computed from TED. Here, the red and green boxes are the predicted and actual bounding boxes.
Figure 3: Results from the comparative study for dentistry dataset 1, where (a) is the Transformer-Enhanced Detection method (TED with L_Loss), (b) is the MSE loss, and (c) is the MAE loss. Green boxes are actual bounding boxes (y_actual), and red boxes are predicted bounding boxes (y_predicted).
Figure 4: Results from the comparative study for dentistry dataset 2, where (a) is the proposed method (TED with L_Loss), (b) is the MSE loss, and (c) is the MAE loss. Green boxes are actual bounding boxes (y_actual), and red boxes are predicted bounding boxes (y_predicted).
Figure 5: Results from the comparative study for NIH DeepLesion, where (a) is the Transformer-Enhanced Detection method (TED with L_Loss), (b) is the MSE loss, and (c) is the MAE loss. Green boxes are actual bounding boxes (y_actual), and red boxes are predicted bounding boxes (y_predicted).
Figure 6: Results from the comparative study for dentistry datasets 1 and 2, where (a) is the Transformer-Enhanced Detection method (TED with AG), (b) is TED without AG for dataset 1, (c) is TED with AG, and (d) is TED without AG for dataset 2. Green boxes are actual bounding boxes (y_actual), and red boxes are predicted bounding boxes (y_predicted).
Figure 7: Results from the comparative study for dentistry dataset 1, where (a) is TED with L_Loss, (b) YOLOv1 [30], (c) YOLOv3 [31], and (d) RCNN [32]. Green boxes are actual bounding boxes (y_actual) and red boxes are predicted bounding boxes (y_predicted).
Figure 8: Results from the comparative study for dentistry dataset 2, where (a) is TED with L_Loss, (b) YOLOv1 [30], (c) YOLOv3 [31], and (d) RCNN [32]. Green boxes are actual bounding boxes (y_actual) and red boxes are predicted bounding boxes (y_predicted).
Figure 9: Results from the comparative study for the NIH DeepLesion dataset, where (a) is TED with L_Loss, (b) YOLOv1 [30], (c) YOLOv3 [31], and (d) RCNN [32]. Green boxes are actual bounding boxes (y_actual) and red boxes are predicted bounding boxes (y_predicted).
Figure 10: Results from the comparative study for the dentistry datasets: (a) is dataset 1 and (b) is dataset 2. Green boxes are actual bounding boxes (y_actual), blue boxes are the dentists' predicted bounding boxes, and red boxes are predicted bounding boxes (y_predicted).
20 pages, 4318 KiB  
Article
NeuroActivityToolkit—Toolbox for Quantitative Analysis of Miniature Fluorescent Microscopy Data
by Evgenii Gerasimov, Alexander Mitenev, Ekaterina Pchitskaya, Viacheslav Chukanov and Ilya Bezprozvanny
J. Imaging 2023, 9(11), 243; https://doi.org/10.3390/jimaging9110243 - 6 Nov 2023
Cited by 4 | Viewed by 2712
Abstract
The visualization of neuronal activity in vivo is an urgent task in modern neuroscience. It allows neurobiologists to obtain a large amount of information about neuronal network architecture and connections between neurons. The miniscope technique might help to determine changes that occurred in the network due to external stimuli and various conditions: processes of learning, stress, epileptic seizures and neurodegenerative diseases. Furthermore, using the miniscope method, functional changes in the early stages of such disorders could be detected. The miniscope has become a modern approach for recording hundreds to thousands of neurons simultaneously in a certain brain area of a freely behaving animal. Nevertheless, the analysis and interpretation of the large volumes of recorded data remain a nontrivial task. A few well-established algorithms exist for miniscope data preprocessing and calcium trace extraction. However, software for further high-level quantitative analysis of neuronal calcium signals is not publicly available. NeuroActivityToolkit is a toolbox that provides the calculation of diverse statistical metrics reflecting neuronal network properties, such as the number of neuronal activations per minute, the number of simultaneously co-active neurons, etc. In addition, a module for analyzing pairwise neuronal correlations is implemented. Moreover, one can visualize and characterize neuronal network states and detect changes in 2D coordinates using PCA. This toolbox, which is deposited in a public software repository, is accompanied by a detailed tutorial and is highly valuable for the statistical interpretation of miniscope data in a wide range of experimental tasks. Full article
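To make the kinds of metrics listed above concrete, the following is a minimal NumPy sketch, not the NeuroActivityToolkit API itself, of how activations per minute, the network spike peak, and thresholded pairwise Pearson correlations (the 0.3 threshold echoes Figure 4) could be computed from binarized active-state traces; the function name, frame rate, and array layout are illustrative assumptions.

    import numpy as np

    def activity_metrics(active, fps=20.0, corr_threshold=0.3):
        """active: boolean array (n_neurons, n_frames) of binarized active states."""
        n_neurons, n_frames = active.shape
        minutes = n_frames / fps / 60.0
        # Count onsets (inactive-to-active transitions) per neuron, normalised per minute.
        onsets = np.diff(active.astype(int), axis=1) == 1
        activations_per_min = onsets.sum(axis=1) / minutes
        # Network spike peak: maximum number of simultaneously active neurons.
        network_spike_peak = active.sum(axis=0).max()
        # Pairwise Pearson correlation of the binarized traces
        # (neurons that never change state produce NaN rows).
        corr = np.corrcoef(active.astype(float))
        np.fill_diagonal(corr, 0.0)
        correlated_pairs = np.argwhere(np.triu(corr, k=1) > corr_threshold)
        return activations_per_min, network_spike_peak, corr, correlated_pairs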
(This article belongs to the Special Issue Fluorescence Imaging and Analysis of Cellular System)
Show Figures
Figure 1
Pipeline of data processing using NeuroActivityToolkit. (A) Schematic illustration of the mouse with mounted version 3 miniscope. (B) Fluorescence of neurons in CA1 hippocampal area expressing GCaMP6s recorded via miniscope. (C) Calcium traces obtained from calcium recording processed with the Minian (version 1.2.1). (D) Active state determination as a first step of NeuroActivityToolkit pipeline. (E) Quantification of the miniscope recorded data in NeuroActivityToolkit toolbox. (F) Fluorescence intensity trace of the calcium indicator for the single neuron in the recording. Single-neuron active state determined using spike method (G) and full method (H). Active state is shown in red, inactive in blue.
Figure 2
Activation properties for the example recording. (A) Distribution of number of activations per minute for neurons from an example recording. (B) Number of activations per minute for the independent recordings from the same mouse, acquired on the 3 different days (1–3). Data on the B graph are presented as a violin plot with median (continuous line) and quartiles (dotted line). ****: p < 0.0001 (Kruskal–Wallis test with multiple comparisons using Dunn's test).
Figure 3
Neuronal network properties for the example recording. Distribution of network spike rate (A) and network spike peak (B) in the selected time interval of 3 s. Distribution for a single recording. (C) Distribution of network spike duration as the time when the amount of simultaneously active neurons was above the preset threshold value.
Figure 4
Neuronal-activity correlation analysis using Pearson's coefficient. (A) Distribution of Pearson's correlation coefficient. (B) Correlation map of co-active neuronal pairs. Correlated neurons are linked with the line between them in the space of the neuronal network. Only correlations above a 0.3 threshold value are shown. Axis values are indicated in pixels. (C) Correlation heatmap for connected pairs of neurons, from the highly negatively correlated in blue color to highly positively correlated in red color. Neurons are labeled by unit_id number. (D) Correlation heatmap in binary representation, where correlation above 0.3 threshold value is shown in black, and lower in white. For (C,D), clusters of closely related pairs of neurons are highlighted by squares. (E) Dependence of Pearson's coefficient of correlation on the threshold level for 3 recordings for the same mouse (signal method).
Figure 5
Spatial position of neuronal co-active pairs and Pearson's correlation coefficient. (A) The distance to neurons from the center of their mass in polar coordinates (Rho) for 3 independent recordings. (B) Dependence between the detected signal mean fluorescence and distance in polar coordinates for each neuron in the recordings for a single miniscope recording. (C) Dependence between the active-state ratio (active state of neuron duration/total recording duration) and distance in polar coordinates for each recorded neuron for a single miniscope recording. Distance between all correlated neuronal pairs, as Euclidean (D) and radial (E) distance correspondingly, for 3 independent recordings. All the data are presented as the median values, borders of the box plots are 1 and 3 quartiles, and all the errors are interquartile ranges. ns: there were no significant differences, **: p < 0.01, ****: p < 0.0001 (Kruskal–Wallis test with multiple comparisons using Dunn's test).
Figure 6
Distance between correlated neuronal pairs. Dependence between Euclidean distance (A) or radial distance (B) and Pearson's coefficient for co-active neuronal pairs calculated using active (spike) method for 3 independent recordings.
Figure 7
Shuffling module for variance estimation in the neuronal-activity data. (A) Presentation of the neuronal network in binarized form for original data (top) and shuffled with 1.0 ratio (bottom). (B) Pearson's coefficient value for original and shuffled data. (C) Maximal amount of active neurons in 1 s (Network spike peak) for original and shuffled data. Data are presented as mean ± SEM; *: p < 0.05, Student's t-test.
Figure 8
PCA dimensionality reduction method applied to obtained statistics. (A) Visualization of the results after applying the principal component method to reduce the dimension of the computed statistics for 3 recordings under the same experimental conditions (in green) and one recording in the state X. (B) Statistical metrics that make the greatest and least contributions to the PCA method.
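Figure 8 describes projecting the computed per-recording statistics onto two principal components to compare network states. A hedged scikit-learn sketch of that idea follows; the metric values and the notion of a "state X" recording are invented for illustration and are not taken from the paper's data.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # stats: rows = recordings, columns = computed metrics
    # (e.g. activations per minute, network spike rate, network spike peak, mean correlation).
    stats = np.array([
        [12.1, 0.8, 34, 0.21],
        [11.7, 0.7, 31, 0.19],
        [12.4, 0.9, 36, 0.22],
        [18.9, 1.6, 52, 0.35],   # hypothetical "state X" recording
    ])

    scaled = StandardScaler().fit_transform(stats)
    pca = PCA(n_components=2)
    coords = pca.fit_transform(scaled)
    print(coords)            # 2D coordinates of each recording (cf. Figure 8A)
    print(pca.components_)   # metric loadings: greatest and least contributions (cf. Figure 8B)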
14 pages, 1760 KiB  
Article
Assessing Acetabular Index Angle in Infants: A Deep Learning-Based Novel Approach
by Farmanullah Jan, Atta Rahman, Roaa Busaleh, Haya Alwarthan, Samar Aljaser, Sukainah Al-Towailib, Safiyah Alshammari, Khadeejah Rasheed Alhindi, Asrar Almogbil, Dalal A. Bubshait and Mohammed Imran Basheer Ahmed
J. Imaging 2023, 9(11), 242; https://doi.org/10.3390/jimaging9110242 - 6 Nov 2023
Cited by 9 | Viewed by 3871
Abstract
Developmental dysplasia of the hip (DDH) is a disorder characterized by abnormal hip development that frequently manifests in infancy and early childhood. Preventing DDH from occurring relies on a timely and accurate diagnosis, which requires careful assessment by medical specialists during early X-ray scans. However, this process can be challenging for medical personnel to achieve without proper training. To address this challenge, we propose a computational framework to detect DDH in pelvic X-ray imaging of infants that utilizes a pipelined deep learning-based technique consisting of two stages: instance segmentation and keypoint detection models to measure acetabular index angle and assess DDH affliction in the presented case. The main aim of this process is to provide an objective and unified approach to DDH diagnosis. The model achieved an average pixel error of 2.862 ± 2.392 and an error range of 2.402 ± 1.963° for the acetabular angle measurement relative to the ground truth annotation. Ultimately, the deep-learning model will be integrated into the fully developed mobile application to make it easily accessible for medical specialists to test and evaluate. This will reduce the burden on medical specialists while providing an accurate and explainable DDH diagnosis for infants, thereby increasing their chances of successful treatment and recovery. Full article
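The pipeline measures the acetabular index from detected keypoints; as a rough illustration of the geometry implied by Figure 1 (keypoints LU, LD, RU, RD and the H-line), the sketch below computes the acute angle between the H-line and each acetabular roof line. The keypoint pairing and the example coordinates are assumptions, not the paper's exact convention.

    import numpy as np

    def line_angle(v1, v2):
        """Acute angle in degrees between the lines spanned by two 2D vectors."""
        cos = abs(np.dot(v1, v2)) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))

    def acetabular_index(lu, ld, ru, rd):
        """lu, ld, ru, rd: (x, y) keypoints as in Figure 1 (Left/Right, Upper/Down).
        The H-line joins the two lower keypoints; each acetabular roof line joins a
        lower keypoint to the corresponding upper keypoint on the same side."""
        lu, ld, ru, rd = map(np.asarray, (lu, ld, ru, rd))
        h_line = rd - ld
        return line_angle(lu - ld, h_line), line_angle(ru - rd, h_line)

    # Hypothetical pixel coordinates, giving roughly 25-27 degrees per side:
    print(acetabular_index(lu=(120, 195), ld=(150, 210), ru=(380, 196), rd=(350, 210)))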
(This article belongs to the Special Issue Advances in Image Analysis: Shapes, Textures and Multifractals)
Show Figures
Figure 1
Calculating the AcI, with H-line displayed (adapted from [7]). Keypoints: LU: Left upper; LD: Left down; RU: Right upper; RD: Right down.
Figure 2
Instance segmentation and keypoint labeling (adapted from [7]).
Figure 3
(a) Box plot of ACI for normal infants; (b) box plot of ACI in DDH patients.
Figure 4
Pipelined segmentation and keypoint detection process.
Figure 5
Training and validation loss in pipelined Keypoint RCNN.
12 pages, 2297 KiB  
Article
Scalable Optical Convolutional Neural Networks Based on Free-Space Optics Using Lens Arrays and a Spatial Light Modulator
by Young-Gu Ju
J. Imaging 2023, 9(11), 241; https://doi.org/10.3390/jimaging9110241 - 6 Nov 2023
Cited by 2 | Viewed by 2116
Abstract
A scalable optical convolutional neural network (SOCNN) based on free-space optics and Koehler illumination was proposed to address the limitations of the previous 4f correlator system. Unlike Abbe illumination, Koehler illumination provides more uniform illumination and reduces crosstalk. The SOCNN allows for scaling of the input array and the use of incoherent light sources. Hence, the problems associated with 4f correlator systems can be avoided. We analyzed the limitations in scaling the kernel size and parallel throughput and found that the SOCNN can offer a multilayer convolutional neural network with massive optical parallelism. Full article
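The optical system evaluates the same weighted sum a digital CNN layer would; as a numerical counterpart of the formula quoted in the Figure 2 caption (a_i = sigma(sum_j w_ij a_j + b_i), with a kernel of N_m weights), here is a small, hedged NumPy sketch. It says nothing about the optics itself, and signed weights are handled as ordinary floats, whereas the incoherent optical implementation would presumably rely on the difference-mode configuration of Figure 4.

    import numpy as np

    def socnn_layer(a, w, b):
        """Numerical counterpart of the operation the optics performs (Figure 2 notation):
        a: input nodes a_j (length N); w: kernel weights (length N_m); b: biases b_i.
        Each output a_i = sigma(sum_j w_ij * a_j + b_i), where the kernel covers the
        N_m inputs centred on output i ('same' zero padding)."""
        n, n_m = len(a), len(w)
        pad = n_m // 2
        a_padded = np.pad(a, pad)
        z = np.array([np.dot(w, a_padded[i:i + n_m]) for i in range(n)]) + b
        return 1.0 / (1.0 + np.exp(-z))   # sigma: sigmoid

    out = socnn_layer(a=np.random.rand(9), w=np.array([0.2, 0.5, 0.2]), b=np.zeros(9))
    print(out.shape)   # one output node per input node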
(This article belongs to the Section AI in Imaging)
Show Figures
Figure 1
Example of a 4f correlator system that uses Fourier transform to implement an existing optical convolutional neural network (OCNN). The mask represents the Fourier transform of the kernel used in the CNN. f represents the focal length of Lens1 and Lens2.
Figure 2
Example of a simple CNN with corresponding mathematical formula; a_i^(l) represents the i-th input or output node in the l-th layer; w_ij indicates the weight connecting the j-th input node and the i-th output node; b_i is the i-th bias; N is the size of the input array; N_m is the number of weights connected to an input/output or the size of a kernel; and σ is a sigmoid function.
Figure 3
Scalable optical convolutional neural network (SOCNN) based on Koehler illumination and free-space optics using lens arrays and a spatial light modulator: (a) schematics and the corresponding mathematical formula; (b) three-dimensional view of a system with 3 × 3 inputs and outputs; and (c) the structural parameters of the SLM (LCD) pixels and their subarrays. The small, colored squares represent pixels.
Figure 4
Difference mode configuration of the SOCNN; this mode can also be used for calculating multiple kernels for a single input array; and a generalized mathematical formula is given, where N_p represents the number of detectors corresponding to one of lens 3.
18 pages, 6127 KiB  
Article
A Deep Learning-Based Decision Support Tool for Plant-Parasitic Nematode Management
by Top Bahadur Pun, Arjun Neupane and Richard Koech
J. Imaging 2023, 9(11), 240; https://doi.org/10.3390/jimaging9110240 - 6 Nov 2023
Cited by 1 | Viewed by 2754
Abstract
Plant-parasitic nematodes (PPN), especially sedentary endoparasitic nematodes like root-knot nematodes (RKN), pose a significant threat to major crops and vegetables. They are responsible for causing substantial yield losses, leading to economic consequences, and impacting the global food supply. The identification of PPNs and the assessment of their population are tedious and time-consuming tasks. This study developed a state-of-the-art deep learning model-based decision support tool to detect and estimate the nematode population. The decision support tool integrates the fast-inferencing YOLOv5 model and uses pretrained nematode weights to detect plant-parasitic nematodes (juveniles) and eggs. The performance of the YOLOv5-640 model at detecting RKN eggs was as follows: precision = 0.992; recall = 0.959; F1-score = 0.975; and mAP = 0.979. YOLOv5-640 was able to detect RKN eggs with an inference time of 3.9 milliseconds, which is faster compared to other detection methods. The deep learning framework was integrated into a user-friendly web application system to build a fast and reliable prototype nematode decision support tool (NemDST). The NemDST enables farmers/growers to input image data, assess the nematode population, track population growth, and receive recommendations for the immediate actions necessary to control nematode infestation. This tool has the potential for rapid assessment of the nematode population to minimise crop yield losses and enhance financial outcomes. Full article
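The reported F1-score can be checked directly from the stated precision and recall, and the detection step the web tool wraps corresponds to standard YOLOv5 custom-weight inference. The sketch below shows both; the weight-file and image names are hypothetical, and the call pattern is the generic Ultralytics YOLOv5 torch.hub interface rather than the NemDST code itself.

    import torch

    # F1 from the reported precision and recall for RKN egg detection:
    precision, recall = 0.992, 0.959
    f1 = 2 * precision * recall / (precision + recall)
    print(round(f1, 3))   # ~0.975, matching the reported F1-score

    # Hedged sketch of YOLOv5 inference with pretrained nematode weights
    # ('nematode_eggs.pt' and 'plate_image.jpg' are hypothetical names):
    model = torch.hub.load('ultralytics/yolov5', 'custom', path='nematode_eggs.pt')
    results = model('plate_image.jpg', size=640)   # 640-pixel inference, as in YOLOv5-640
    egg_count = len(results.xyxy[0])               # one row per detected egg
    print(egg_count)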
(This article belongs to the Section Computer Vision and Pattern Recognition)
Show Figures
Figure 1
Decision support tool for nematode detection (NemDST).
Figure 2
Root-knot nematode (RKN) eggs.
Figure 3
(a) S×S grid; (b) Bounding Box Prediction; and (c) Final Detection.
Figure 4
Architecture of YOLOv5 model.
Figure 5
Architecture of YOLOv6.
Figure 6
Architecture of YOLOv7.
Figure 7
Entity–Relationship diagram of nematodes decision support tool (NemDST).
Figure 8
Manual and machine counting of RKN egg using YOLOv7-640.
Figure 9
Manual and machine counting of RKN egg using YOLOv6-480.
Figure 10
Detection and counting of root-knot nematode (RKN) juveniles.
Figure 11
Root-knot nematode detection result saved in database table (juvenile).
Figure 12
Detection and counting of root-knot nematode eggs.
Figure 13
Root-knot nematode egg detection results saved in the database table (egg).
19 pages, 18837 KiB  
Article
Detecting Deceptive Dark-Pattern Web Advertisements for Blind Screen-Reader Users
by Satwik Ram Kodandaram, Mohan Sunkara, Sampath Jayarathna and Vikas Ashok
J. Imaging 2023, 9(11), 239; https://doi.org/10.3390/jimaging9110239 - 6 Nov 2023
Cited by 6 | Viewed by 4737
Abstract
Advertisements have become commonplace on modern websites. While ads are typically designed for visual consumption, it is unclear how they affect blind users who interact with the ads using a screen reader. Existing research studies on non-visual web interaction predominantly focus on general web browsing; the specific impact of extraneous ad content on blind users’ experience remains largely unexplored. To fill this gap, we conducted an interview study with 18 blind participants; we found that blind users are often deceived by ads that contextually blend in with the surrounding web page content. While ad blockers can address this problem via a blanket filtering operation, many websites are increasingly denying access if an ad blocker is active. Moreover, ad blockers often do not filter out internal ads injected by the websites themselves. Therefore, we devised an algorithm to automatically identify contextually deceptive ads on a web page. Specifically, we built a detection model that leverages a multi-modal combination of handcrafted and automatically extracted features to determine if a particular ad is contextually deceptive. Evaluations of the model on a representative test dataset and ‘in-the-wild’ random websites yielded F1 scores of 0.86 and 0.88, respectively. Full article
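The detection model is described as a multi-modal combination of handcrafted and automatically extracted features. The following is a hedged sketch of that fusion idea with a generic classifier; the feature names, dimensions, and random training data are placeholders, not the authors' feature set or model.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier

    def fuse_features(handcrafted, text_embedding):
        """Concatenate handcrafted ad/page features (e.g., position overlap with main
        content, tag/role cues, content-similarity scores) with an automatically
        extracted text embedding of the ad and its surrounding context."""
        return np.concatenate([handcrafted, text_embedding])

    # Hypothetical training data: one fused vector per ad, label 1 = contextually deceptive.
    X = np.stack([fuse_features(np.random.rand(6), np.random.rand(64)) for _ in range(200)])
    y = np.random.randint(0, 2, size=200)

    clf = GradientBoostingClassifier().fit(X, y)
    print(clf.predict(X[:3]))   # 0/1 decisions for the first three ads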
(This article belongs to the Special Issue Image and Video Processing for Blind and Visually Impaired)
Show Figures
Figure 1
An example of a deceptive ad on a popular Kayak travel website. One of the flight results on the list is actually an ad promoting another travel website, namely Priceline. The ad location, coupled with the content similarity between the ad and other flights, can potentially deceive blind users due to the limited information provided by their screen readers, e.g., the visual "Priceline" text is read out as just an "image" by a screen reader.
Figure 2
An architectural schematic of the deceptive ad classifier.
Figure 3
An edge case example of a deceptive ad undetected by the algorithm.
39 pages, 922 KiB  
Article
Constraints on Optimising Encoder-Only Transformers for Modelling Sign Language with Human Pose Estimation Keypoint Data
by Luke T. Woods and Zeeshan A. Rana
J. Imaging 2023, 9(11), 238; https://doi.org/10.3390/jimaging9110238 - 2 Nov 2023
Cited by 1 | Viewed by 1944
Abstract
Supervised deep learning models can be optimised by applying regularisation techniques to reduce overfitting, which can prove difficult when fine-tuning the associated hyperparameters. Not all hyperparameters are equal, and understanding the effect each hyperparameter and regularisation technique has on the performance of a given model is of paramount importance in research. We present the first comprehensive, large-scale ablation study for an encoder-only transformer to model sign language using the improved Word-level American Sign Language dataset (WLASL-alt) and human pose estimation keypoint data, with a view to putting constraints on the potential to optimise the task. We measure the impact a range of model parameter regularisation and data augmentation techniques have on sign classification accuracy. We demonstrate that, within the quoted uncertainties, other than ℓ2 parameter regularisation, none of the regularisation techniques we employ have an appreciable positive impact on performance, which we find to be in contradiction to results reported by other similar, albeit smaller-scale, studies. We also demonstrate that the model is bounded by the small dataset size for this task rather than by finding an appropriate set of model parameter regularisation and common or basic dataset augmentation techniques. Furthermore, using the base model configuration, we report a new maximum top-1 classification accuracy of 84% on 100 signs, thereby improving on the previous benchmark result for this model architecture and dataset. Full article
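Since ℓ2 parameter regularisation is the one technique reported to help, it is worth noting how it typically enters an encoder-only transformer classifier in practice, namely as optimiser weight decay. The sketch below is illustrative only: the keypoint feature size, depth, and head count are assumptions, while the learning rate, weight decay, and feed-forward dimension echo values quoted in the figure captions below.

    import torch
    import torch.nn as nn

    # Minimal encoder-only transformer classifier over pose-keypoint sequences
    # (feature sizes and layer counts are illustrative, not the paper's configuration).
    class SignClassifier(nn.Module):
        def __init__(self, n_keypoint_features=108, d_model=128, n_classes=100):
            super().__init__()
            self.embed = nn.Linear(n_keypoint_features, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=8, dim_feedforward=2048,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d_model, n_classes)

        def forward(self, x):                      # x: (batch, frames, keypoint features)
            h = self.encoder(self.embed(x))
            return self.head(h.mean(dim=1))        # pool over frames, then classify

    model = SignClassifier()
    # l2 parameter regularisation enters as weight decay (values illustrative):
    optim = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-3)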
(This article belongs to the Special Issue Feature Papers in Section AI in Imaging)
Show Figures
Figure 1
Batching method: single pass versus random batching for top-1 classification accuracy on 100 signs. Each column label includes the number of epochs the neural network was trained for.
Figure 2
Log-linear plot showing mean top-1 accuracy as a function of batch size, λ_batch, for 100 classes. Somewhere between a batch size of 1024 and 2048, model performance begins to rapidly degrade.
Figure 3
Log-linear plot showing mean top-1 accuracy as a function of learning rate, λ_lr, for 100 classes. A learning rate between approximately λ_lr = 1.0 × 10^-4 and λ_lr = 1.0 × 10^-3 produces the best-performing models.
Figure 4
Log-linear plot showing mean top-1 accuracy as a function of elastic net regularisation for 100 classes, with the ℓ1 parameter, λ_ℓ1, varied, and the ℓ2 parameter, λ_ℓ2, held constant at λ_ℓ2 = 0.001. The dashed lines with shaded areas indicate the respective mean top-1 accuracy minimum and maximum values from the calculated uncertainty for 16 experiments with no elastic net regularisation applied at all.
Figure 5
Log-linear plot showing mean top-1 accuracy as a function of ℓ2 parameter regularisation, λ_ℓ2, for 100 classes. The dashed lines with shaded areas indicate the respective mean top-1 accuracy and associated uncertainty for 16 experiments with no parameter regularisation applied.
Figure 6
Log-linear plot showing mean top-1 accuracy as a function of encoder feed-forward block layer dimension, λ_ff, for 100 classes. Test set performance appears to increase with the number of neurons in the feed-forward block layer.
Figure 7
Log-linear plot showing mean top-1 accuracy as a function of encoder feed-forward block layer dimension, λ_ff, for 100 classes. Only the test set results are shown, which more clearly shows that the test set performance increases with the number of neurons in the feed-forward block layer until it appears to plateau, over the range tested, at approximately 2048 neurons. This is marked with a green dashed line.
Figure 8
Plot showing mean top-1 accuracy as a function of encoder dropout probability, λ_encdo, for 100 classes. Despite model performance on the test set appearing to peak at approximately λ_encdo = 0.3, the uncertainty prevents the determination of a clearly optimal value. Dropout probabilities λ_encdo > 0.3 appear to impede performance on the test set. The dashed lines with shaded areas indicate the respective mean top-1 accuracy and associated uncertainty for 16 experiments with no encoder dropout applied.
Figure 9
Plot showing mean top-1 accuracy as a function of embedding dropout probability, λ_embdo, for 100 classes. It is clear that no amount of embedding dropout is beneficial for model performance on any of the dataset splits. The dashed lines with shaded areas indicate the respective mean top-1 accuracy and associated uncertainty for 16 experiments with no embedding dropout applied.
Figure 10
Log-linear plot showing mean top-1 accuracy as a function of noise augmentation, λ_noise, for 100 classes. Given the experimental uncertainty, augmenting the data with noise appears to provide no measurable positive effect on model performance, with the effect being clearly negative beyond λ_noise = 0.256. The dashed lines with shaded areas indicate the respective mean top-1 accuracy and associated uncertainty for 16 experiments with no augmentation applied.
Figure 11
Log-linear plot showing mean top-1 accuracy as a function of rotation augmentation, λ_rot, for 100 classes. Given the experimental uncertainty, it is not possible to claim a directly observable positive effect from augmenting with rotation of keypoints. The dashed lines with shaded areas indicate the respective mean top-1 accuracy and associated uncertainty for 16 experiments with no augmentation applied.
Figure 12
Log-linear plot showing mean top-1 accuracy as a function of scaling augmentation along the x-axis, λ_scale, for 100 classes. Excluding the accuracy results for both extreme hyperparameter values, no clear measurable effect is observed. The measured uncertainties of those extremes, however, do fall within the measured uncertainty of the baseline experiments, indicated by the respective shaded areas, thereby preventing a clear positive or negative performance impact being observed. The dashed lines with shaded areas represent the respective mean top-1 accuracy and associated uncertainty for 16 experiments with no augmentation applied.
Figure 13
Log-linear plot showing mean top-1 accuracy as a function of scaling augmentation along the y-axis, λ_scale, for 100 classes. There is no observable effect on test set performance from any scaling hyperparameter value λ_scale. The dashed lines with shaded areas indicate the respective mean top-1 accuracy and associated uncertainty for 16 experiments with no augmentation applied.
Figure 14
Log-linear plot showing mean top-1 accuracy as a function of scaling augmentation along the x- and y-axes, λ_scale, for 100 classes. There is no observable effect on test set performance from any scaling hyperparameter value λ_scale. The dashed lines with shaded areas indicate the respective mean top-1 accuracy and associated uncertainty for 16 experiments with no augmentation applied.
Figure 15
Log-linear plot showing mean top-1 accuracy as a function of the drop frames augmentation, λ_drop, for 100 classes. Given the experimental uncertainty, it is not possible to claim a directly observable positive effect from augmenting by inserting an element of randomisation by dropping arbitrary frames from each sequence in a batch. The dashed lines with shaded areas indicate the respective mean top-1 accuracy and associated uncertainty for 16 experiments with no augmentation applied.
Figure 16
Log-linear plot showing mean top-1 accuracy as a function of the trim start augmentation, λ_trim, for 100 classes. Given the experimental uncertainty, it is not possible to claim a directly observable positive effect from augmenting by inserting an element of randomisation by trimming arbitrary frames from the start of each sequence in a batch. The dashed lines with shaded areas indicate the respective mean top-1 accuracy and associated uncertainty for 16 experiments with no augmentation applied.
Figure 17
Log-linear plot showing mean top-1 accuracy as a function of the offset copy augmentation, λ_ocopy, for 100 classes. Given the experimental uncertainty, it is not possible to claim a directly observable positive effect from augmenting by inserting copies of the first frame at the start of each sequence in a batch to delay the start of a sign. The dashed lines with shaded areas indicate the respective mean top-1 accuracy and associated uncertainty for 16 experiments with no augmentation applied.
Figure 18
Log-linear plot showing mean top-1 accuracy as a function of the offset pad augmentation, λ_opad, for 100 classes. Given the experimental uncertainty, it is not possible to claim a directly observable positive effect from augmenting by inserting blank frames at the start of each sequence in a batch to delay the start of a sign, but only up until λ_opad = 16, after which it is clear that model performance is clearly degraded. The dashed lines with shaded areas indicate the respective mean top-1 accuracy and associated uncertainty for 16 experiments with no augmentation applied.
Figure 19
Impact of fixed seed on repeatability of singular feed-forward block layer dimension hyperparameter, λ_ff, change. Each bar shows the difference between two experiments: the baseline hyperparameter value versus the altered hyperparameter value. A positive value indicates an improved accuracy score from the altered hyperparameter. The spread of values shows that the fixed seed can influence the impact of the hyperparameter value such that it is an insufficient way to test the outcome of hyperparameter changes with single experiments.
Figure 20
Impact of fixed seed on repeatability of singular rotation hyperparameter, λ_rot, change. Each bar shows the difference between two experiments: the baseline hyperparameter value versus the altered hyperparameter value. A positive value indicates an improved accuracy score from the altered hyperparameter. The spread of values shows that the fixed seed can influence the impact of the hyperparameter value such that it is an insufficient way to test the outcome of hyperparameter changes with single experiments.
Figure 21
Impact of fixed seed on repeatability of singular encoder dropout hyperparameter, λ_encdo, change. Each bar shows the difference between two experiments: the baseline hyperparameter value versus the altered hyperparameter value. A positive value indicates an improved accuracy score from the altered hyperparameter. The spread of values shows that the fixed seed can influence the impact of the hyperparameter value such that it is an insufficient way to test the outcome of hyperparameter changes with single experiments.
Figure 22
Top-1 test set accuracy as a function of dataset size, with conservative best- and worst-case fitted curves to estimate the effect increased dataset size has on accuracy. Predictions for the linear and polynomial of degree 2 fits have also been plotted. The hypothetical point, as predicted by the linear fit, at which the dataset size would allow for 100% accuracy is marked with a vertical purple dashed line, and the equivalent maximum accuracy point for the worst-case polynomial fit is marked with a vertical teal dashed line.
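Throughout the ablation figures above, each hyperparameter value is judged by a mean top-1 accuracy with an associated uncertainty over 16 repeated experiments, and Figures 19 to 21 show why single fixed-seed comparisons are insufficient. A hedged sketch of that aggregation follows; the accuracy values are invented, and the standard error is one possible choice of uncertainty, not necessarily the paper's definition.

    import numpy as np

    def summarise_runs(top1_accuracies):
        """Mean top-1 accuracy and a simple uncertainty (standard error) over repeats."""
        acc = np.asarray(top1_accuracies, dtype=float)
        mean = acc.mean()
        stderr = acc.std(ddof=1) / np.sqrt(len(acc))
        return mean, stderr

    # Hypothetical top-1 accuracies from 16 repeats of one hyperparameter setting:
    runs = [0.81, 0.83, 0.80, 0.84, 0.82, 0.79, 0.83, 0.82,
            0.81, 0.84, 0.80, 0.82, 0.83, 0.81, 0.82, 0.80]
    mean, err = summarise_runs(runs)
    print(f"top-1 = {mean:.3f} +/- {err:.3f}")
    # Two settings only differ meaningfully if their mean +/- uncertainty bands do not overlap.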
19 pages, 27203 KiB  
Article
Domain-Aware Few-Shot Learning for Optical Coherence Tomography Noise Reduction
by Deborah Pereg
J. Imaging 2023, 9(11), 237; https://doi.org/10.3390/jimaging9110237 - 30 Oct 2023
Cited by 1 | Viewed by 1620
Abstract
Speckle noise has long been an extensively studied problem in medical imaging. In recent years, there have been significant advances in leveraging deep learning methods for noise reduction. Nevertheless, adaptation of supervised learning models to unseen domains remains a challenging problem. Specifically, deep neural networks (DNNs) trained for computational imaging tasks are vulnerable to changes in the acquisition system’s physical parameters, such as: sampling space, resolution, and contrast. Even within the same acquisition system, performance degrades across datasets of different biological tissues. In this work, we propose a few-shot supervised learning framework for optical coherence tomography (OCT) noise reduction, that offers high-speed training (of the order of seconds) and requires only a single image, or part of an image, and a corresponding speckle-suppressed ground truth, for training. Furthermore, we formulate the domain shift problem for OCT diverse imaging systems and prove that the output resolution of a despeckling trained model is determined by the source domain resolution. We also provide possible remedies. We propose different practical implementations of our approach, verify and compare their applicability, robustness, and computational efficiency. Our results demonstrate the potential to improve sample complexity, generalization, and time efficiency, for coherent and non-coherent noise reduction via supervised learning models, that can also be leveraged for other real-time computer vision applications. Full article
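The few-shot aspect rests on cutting a single co-registered speckled/ground-truth pair into many overlapping patches for the patch-to-patch network sketched in Figure 3. Below is a minimal, hedged NumPy illustration of that sampling step; patch size, stride, and image dimensions are assumptions, not the paper's settings.

    import numpy as np

    def patch_pairs(speckled, clean, patch=16, stride=4):
        """Cut one co-registered (speckled, ground-truth) tomogram pair into many small
        overlapping patches, the few-shot training set described in the abstract."""
        h, w = speckled.shape
        xs, ys = [], []
        for i in range(0, h - patch + 1, stride):
            for j in range(0, w - patch + 1, stride):
                xs.append(speckled[i:i + patch, j:j + patch])
                ys.append(clean[i:i + patch, j:j + patch])
        return np.stack(xs), np.stack(ys)

    # Hypothetical single training pair (e.g., 100 columns of one B-scan):
    speckled = np.random.rand(512, 100)
    clean = np.random.rand(512, 100)
    X, Y = patch_pairs(speckled, clean)
    print(X.shape)   # (n_patches, 16, 16): thousands of training examples from one image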
Show Figures
Figure 1
Chicken muscle speckle suppression results: (a) speckled acquired tomogram, p_x = 3; (b) ground truth averaged over 901 tomograms; (c) OCT-RNN trained with 100 first columns of chicken muscle; (d) RNN-GAN trained with 100 first columns of chicken muscle and blueberry, p_x^s = p_x^t = 3; (e) RNN-GAN trained with 200 columns of chicken decimated by a factor of 8/3 in the lateral direction, p_x^s = 1. System and tissue mismatch: (f) DRNN trained with 100 columns of human retinal image, p_x^s = 2; (g) DRNN following lateral decimation of the target input by a factor of 4/3, p_x^s = p_x^t = 2; (h) DRNN following lateral decimation of the target input by 8/3, p_x^t = 1. Scale bars are 200 µm.
Figure 2
Proposed RNN, RNN-GAN, and U-Net schematic.
Figure 3
Illustration of the proposed patch-to-patch RNN encoder–decoder.
Figure 4
Retinal data speckle suppression: (a) cross-sectional human retina in vivo, p_x^s = 2; (b) despeckled (NLM) image used as the ground truth; (c) DRNN trained with 100 columns of retinal image, p_x^s = p_x^t = 2. System mismatch: (d) DRNN following lateral decimation of the target input by a factor of 2, p_x^t = 1; (e) DRNN following lateral interpolating of the input, p_x^t = 3. System and tissue mismatch: (f) RNN-GAN trained with 100 first columns of chicken muscle and blueberry, p_x^s = 3; (g) RNN-GAN trained with 200 last columns of blueberry; (h) U-Net trained with blueberry image of size 256 × 256, p_x^s = 3. Scale bars, 200 µm.
Figure 5
Retinal data speckle suppression: (a) cross-sectional human retina in vivo, p_x^t = 2; (b) despeckled (NLM) image used as the ground truth; (c) DRNN trained with 100 columns of retinal image, p_x^s = p_x^t = 2; (d) RNN-GAN trained with 100 first columns of chicken muscle and blueberry, p_x^s = 3; (e) U-Net trained with retinal image of size 256 × 256, p_x^s = 2. Scale bars are 200 µm.
Figure 6
Blueberry speckle suppression results: (a) speckled acquired tomogram; (b) despeckled via angular compounding, used as the ground truth; (c) RNN-GAN trained with 200 last columns of blueberry, p_x^s = p_x^t = 3; (d) DRNN trained with 100 columns of human retinal image, p_x^s = 2; (e) U-Net trained with 256 × 256 chicken skin image, p_x^s = 2. Scale bars are 200 µm.
Figure 7
Chicken skin speckle suppression results: (a) speckled acquired tomogram; (b) AC ground truth, averaged over 60 tomograms; (c) DRNN trained with 100 columns of human retinal image, p_x^s = p_x^t = 2; (d) RNN-GAN trained with 100 first columns of chicken muscle and blueberry, p_x^s = 3; (e) RNN-GAN trained with 200 last columns of blueberry, p_x^s = 3. Scale bars are 200 µm.
Figure 8
Cucumber speckle suppression results: (a) speckled acquired tomogram, p_x^t = 1; (b) ground truth averaged over 301 tomograms; (c) DRNN trained with human retina image, p_x^s = 2; (d) RNN-GAN trained with 200 columns of blueberry image decimated in the lateral direction by a factor of 8/3, p_x^s = p_x^t = 1; (e) RNN-GAN trained with 200 columns of blueberry and chicken, p_x^s = 3. Scale bars are 200 µm.
Figure 9
Cardiovascular-I speckle suppression results (in Cartesian coordinates): (a) speckled acquired tomogram, p_x^t = 2; (b) DRNN trained with 100 columns of human retinal image, p_x^s = 2; (c) RNN-GAN trained with 200 columns of blueberry and chicken images, p_x^s = 3; (d) U-Net trained with retinal data image of size 448 × 256, p_x^s = 2. Scale bar is 500 µm.
Figure 10
Cardiovascular-II speckle suppression results (in Cartesian coordinates): (a) cropped speckled acquired tomogram of size 371 × 311, p_x^t = 1; (b) OCT-RNN trained with 100 first columns of chicken muscle, p_x^s = 3; (c) DRNN trained with 100 columns of human retinal image, p_x^s = 2; (d) RNN-GAN trained with decimated retinal data, p_x^s = 1; (e) RNN-GAN trained with interpolated retinal data, p_x^s = 3; (f) U-Net trained with retinal data image of size 448 × 256, p_x^s = 2. Scale bar is 200 µm.
Figure 11
Retinal data speckle suppression: (a,g) cross-sectional human retina in vivo; (b,h) despeckled image (ground truth); (c,i) RNN trained with a (different) single retinal image; (d,j) RNN-GAN trained with 200 first columns of the test image; (e,k) U-Net trained with a (different) single retinal image; (f,l) SM-GAN trained with 3900 example pairs.
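Several captions above (Figures 1, 4, 8 and 10) match the source and target lateral sampling, p_x^s and p_x^t, by laterally decimating or interpolating the input before despeckling, for example by a factor of 8/3. A hedged SciPy sketch of that resampling step follows; the conversion between concrete systems depends on spot size and pixel pitch, so the factor is passed in explicitly.

    import numpy as np
    from scipy.ndimage import zoom

    def resample_lateral(tomogram, factor):
        """Resample the lateral (column) axis by 'factor' (<1 decimates, >1 interpolates),
        e.g. the 8/3 lateral decimation or 4/3 interpolation quoted in the captions, so the
        target input's sampling p_x^t matches the source domain's p_x^s."""
        return zoom(tomogram, (1.0, factor), order=1)

    b_scan = np.random.rand(512, 800)            # hypothetical speckled B-scan
    decimated = resample_lateral(b_scan, 3 / 8)  # decimate laterally by a factor of 8/3
    print(decimated.shape)                       # (512, 300)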
12 pages, 893 KiB  
Review
Artificial Intelligence-Based Prediction of Cardiovascular Diseases from Chest Radiography
by Juan M. Farina, Milagros Pereyra, Ahmed K. Mahmoud, Isabel G. Scalia, Mohammed Tiseer Abbas, Chieh-Ju Chao, Timothy Barry, Chadi Ayoub, Imon Banerjee and Reza Arsanjani
J. Imaging 2023, 9(11), 236; https://doi.org/10.3390/jimaging9110236 - 26 Oct 2023
Cited by 4 | Viewed by 3301
Abstract
Chest radiography (CXR) is the most frequently performed radiological test worldwide because of its wide availability, non-invasive nature, and low cost. The ability of CXR to diagnose cardiovascular diseases, give insight into cardiac function, and predict cardiovascular events is often underutilized, not clearly understood, and affected by inter- and intra-observer variability. Therefore, more sophisticated tests are generally needed to assess cardiovascular diseases. Considering the sustained increase in the incidence of cardiovascular diseases, it is critical to find accessible, fast, and reproducible tests to help diagnose these frequent conditions. The expanding application of artificial intelligence (AI) to diagnostic cardiovascular imaging has also extended to CXR, with several publications suggesting that AI models can be trained to detect cardiovascular conditions by identifying features in the CXR. Multiple models have been developed to predict mortality, cardiovascular morphology and function, coronary artery disease, valvular heart diseases, aortic diseases, arrhythmias, pulmonary hypertension, and heart failure. The available evidence demonstrates that the use of AI-based tools applied to CXR for the diagnosis of cardiovascular conditions and prognostication has the potential to transform clinical care. AI-analyzed CXRs could be utilized in the future as a complementary, easy-to-apply technology to improve diagnosis and risk stratification for cardiovascular diseases. Such advances will likely help better target more advanced investigations, which may reduce the burden of testing in some cases, as well as better identify higher-risk patients who would benefit from earlier, dedicated, and comprehensive cardiovascular evaluation. Full article
(This article belongs to the Section Medical Imaging)
Show Figures
Figure 1
Recent investigations have challenged some long-held medical views regarding traditional applications of CXR to diagnose cardiovascular conditions. Novel AI models rely on automated feature extraction from images and do not depend on human decision-making skills.
Figure 2
An example depicting the several neural networks that have been developed for detection of different cardiovascular conditions based on CXR analysis.
30 pages, 7018 KiB  
Article
Empowering Deaf-Hearing Communication: Exploring Synergies between Predictive and Generative AI-Based Strategies towards (Portuguese) Sign Language Interpretation
by Telmo Adão, João Oliveira, Somayeh Shahrabadi, Hugo Jesus, Marco Fernandes, Ângelo Costa, Vânia Ferreira, Martinho Fradeira Gonçalves, Miguel A. Guevara Lopéz, Emanuel Peres and Luís Gonzaga Magalhães
J. Imaging 2023, 9(11), 235; https://doi.org/10.3390/jimaging9110235 - 25 Oct 2023
Cited by 1 | Viewed by 3400
Abstract
Communication between Deaf and hearing individuals remains a persistent challenge requiring attention to foster inclusivity. Despite notable efforts in the development of digital solutions for sign language recognition (SLR), several issues persist, such as cross-platform interoperability and strategies for tokenizing signs to enable continuous conversations and coherent sentence construction. To address such issues, this paper proposes a non-invasive Portuguese Sign Language (Língua Gestual Portuguesa or LGP) interpretation system-as-a-service, leveraging skeletal posture sequence inference powered by long short-term memory (LSTM) architectures. To address the scarcity of examples during machine learning (ML) model training, dataset augmentation strategies are explored. Additionally, a buffer-based interaction technique is introduced to facilitate the tokenization of LGP terms. This technique provides real-time feedback to users, allowing them to gauge the time remaining to complete a sign, which aids in the construction of grammatically coherent sentences based on inferred terms/words. To support human-like conditioning rules for interpretation, a large language model (LLM) service is integrated. Experiments reveal that LSTM-based neural networks, trained with 50 LGP terms and subjected to data augmentation, achieved accuracy levels ranging from 80% to 95.6%. Users unanimously reported a high level of intuitiveness when using the buffer-based interaction strategy for term/word tokenization. Furthermore, tests with an LLM (specifically ChatGPT) demonstrated promising rates of semantic correlation between generated and expected sentences. Full article
(This article belongs to the Section AI in Imaging)
Figure 1
Main architecture for the proposed Deaf-hearing communication system, composed of four modules: (i) raw data collection, consisting of the recording of labeled videos; (ii) dataset construction, for structuring the raw data into an AI-compliant organization; (iii) an AI consortium of logical entities responsible for inferring words/terms from the anatomical landmarks of LGP signs and for outputting well-structured sentences in a rule-based, conditioned manner by resorting to an LLM; and (iv) the frontend, for interfacing with Deaf individuals.
Figure 2
Pipeline for data preparation, including a few data augmentation strategies (video- and point-based) as well as normalization steps, up to the consolidation of the final dataset.
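The caption above mentions normalization steps without detailing them. As a rough illustration only, the sketch below shows one common way to normalise landmark coordinates: anchoring each frame on a reference key point and scaling by the frame's spread. The anchor choice, scaling rule, and array shapes are assumptions, not taken from the paper.

```python
# A minimal sketch (not the authors' pipeline) of landmark normalisation.
import numpy as np

def normalise_sequence(seq: np.ndarray) -> np.ndarray:
    """seq has shape (frames, landmarks, 2); returns a normalised copy."""
    out = seq.astype(float)
    for t in range(out.shape[0]):
        frame = out[t]
        ref = frame[0].copy()                   # assumed reference landmark
        frame -= ref                            # translate: anchor on the reference point
        scale = np.linalg.norm(frame, axis=1).max()
        if scale > 0:
            frame /= scale                      # scale: bound coordinates to [-1, 1]
        out[t] = frame
    return out

# Example on a dummy 30-frame sequence with 21 hand landmarks
dummy = np.random.rand(30, 21, 2)
print(normalise_sequence(dummy).shape)          # (30, 21, 2)
```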
Figure 3
Data augmentation operations applied to the original key points: (a) illustrates the RRS process, and (b) depicts the SBI process. In both cases, semi-transparent red dots represent actual anatomical key points (collected from contributors), while purple/blue dots represent augmented key points. For simplicity, only the anatomical key points of a single hand are shown for each data augmentation operation.
Figure 4
Flow diagram representing the buffer-based process applied to sets of anatomical key points during LGP expression capture.
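As a hypothetical illustration of the buffer-based capture described above, the following sketch accumulates key-point sets into a fixed-size window and exposes a fill ratio that could drive the progress-bar feedback mentioned in the abstract. The window length and the interface are assumptions, not the authors' implementation.

```python
# A sketch of a fixed-size buffer for key-point sets (window length assumed).
from collections import deque

class SignBuffer:
    def __init__(self, window_size: int = 30):
        self.window_size = window_size
        self.frames = deque(maxlen=window_size)

    def push(self, keypoints) -> float:
        """Add one frame of key points; return the buffer fill ratio in [0, 1]."""
        self.frames.append(keypoints)
        return len(self.frames) / self.window_size

    def ready(self) -> bool:
        return len(self.frames) == self.window_size

    def pop_window(self):
        """Return the buffered window for inference and clear the buffer."""
        window = list(self.frames)
        self.frames.clear()
        return window

# Usage: feed frames from the capture loop; run inference once ready() is True.
buf = SignBuffer(window_size=30)
for frame_keypoints in ([0.0] * 1662 for _ in range(30)):  # dummy frames
    progress = buf.push(frame_keypoints)   # could drive the on-screen progress bar
if buf.ready():
    window = buf.pop_window()              # would be passed to the LSTM model
```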
Figure 5
Communication sequence for LGP interpretation and sentence construction: (a) depicts the user's interaction with the proposed LGP system and LLM service through a frontend interface; (b) illustrates the interaction between the LLM and the service requesters, involving a set of words and conditioning rules and resulting in the generation of coherent sentences.
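To make the rule-conditioned LLM step in (b) concrete, here is a minimal, API-agnostic sketch that packs inferred terms and conditioning rules into a single prompt. The rule wording and prompt template are illustrative assumptions; the authors' actual prompt and service integration are not described here.

```python
# A hypothetical prompt builder; the string would be sent to an LLM service.
def build_sentence_prompt(terms: list[str], target_language: str = "Portuguese") -> str:
    rules = [
        "Use every term at least once.",
        "Keep the original meaning and order of the terms where possible.",
        f"Answer with a single grammatical sentence in {target_language}.",
    ]
    return (
        "You are assisting Deaf-hearing communication.\n"
        f"Terms inferred from sign language: {', '.join(terms)}\n"
        "Rules:\n" + "\n".join(f"- {r}" for r in rules)
    )

prompt = build_sentence_prompt(["pao", "quanto_custa"])
print(prompt)   # this string would be submitted to the LLM (e.g., ChatGPT's API)
```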
Figure 6
Screenshot of the interface of the gesture video recording application. The video stream on the right side is overlaid with visual elements representing the recognized skeletal key points (red dots) and the corresponding connections between them (green lines).
Figure 7
LSTM architectures set up for landmark-based LGP interpretation. While (a) depicts an architecture composed of three simple LSTM nodes, (b) shows an architecture with a single layer combining 1D convolutions and LSTM.
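For orientation, the sketch below shows how the two architecture variants named in Figure 7 might look in Keras: three stacked LSTM layers versus a single block combining a 1D convolution with an LSTM. Layer sizes, sequence length, and feature dimensionality are assumptions, not values reported by the authors.

```python
# A minimal sketch (not the authors' code) of the two LSTM variants.
import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN, N_FEATURES, N_CLASSES = 30, 1662, 50   # assumed shapes; 50 LGP terms per the abstract

def simple_lstm() -> tf.keras.Model:
    """Three stacked LSTM layers followed by a softmax classifier."""
    return models.Sequential([
        layers.Input(shape=(SEQ_LEN, N_FEATURES)),
        layers.LSTM(64, return_sequences=True),
        layers.LSTM(128, return_sequences=True),
        layers.LSTM(64),
        layers.Dense(N_CLASSES, activation="softmax"),
    ])

def conv_lstm() -> tf.keras.Model:
    """A single block combining a 1D convolution with an LSTM layer."""
    return models.Sequential([
        layers.Input(shape=(SEQ_LEN, N_FEATURES)),
        layers.Conv1D(64, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.LSTM(64),
        layers.Dense(N_CLASSES, activation="softmax"),
    ])

model = conv_lstm()
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["categorical_accuracy"])
model.summary()
```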
Figure 8
Layout of the experimental frontend for interoperating with the proposed LGP recognition platform.
Figure 9
Overview of the percentage of usable train/validation signs after the video-based data extraction process. The terms fresca, muito_bom, pao, and quanto_custa were the only ones below 50%. Most of the signs (36 of them) were above 70%.
Figure 10
Hit rate comparison for SimpleLSTM vs. ConvLSTM, considering the dataset augmented with HF, SO, and RRS: (a) SimpleLSTM hit rate plot; and (b) ConvLSTM hit rate plot.
Figure 11
Participants' feedback regarding the intuitiveness of the proposed progress-bar-based interaction technique for performing synchronized signs for effective tokenization.
Figure 12
Layout of the frontend used to perform functional tests on the proposed LGP platform. In (a), a user gradually performs signs that are converted into words/terms by inference of the ConvLSTM model incorporated in the proposed platform; in (b), a well-structured sentence can be observed, which resulted from submitting the inferred list of terms/words to ChatGPT's API service via the proposed platform.
24 pages, 2648 KiB  
Article
Retinal Microvasculature Image Analysis Using Optical Coherence Tomography Angiography in Patients with Post-COVID-19 Syndrome
by Maha Noor, Orlaith McGrath, Ines Drira and Tariq Aslam
J. Imaging 2023, 9(11), 234; https://doi.org/10.3390/jimaging9110234 - 24 Oct 2023
Cited by 5 | Viewed by 2683
Abstract
Several optical coherence tomography angiography (OCT-A) studies have demonstrated retinal microvascular changes in patients post-SARS-CoV-2 infection, reflecting retinal-systemic microvasculature homology. Post-COVID-19 syndrome (PCS) entails persistent symptoms following SARS-CoV-2 infection. In this study, we investigated the retinal microvasculature in PCS patients using OCT-A and analysed the macular retinal nerve fibre layer (RNFL) and ganglion cell layer (GCL) thickness via spectral domain-OCT (SD-OCT). Conducted at the Manchester Royal Eye Hospital, UK, this cross-sectional study compared 40 PCS participants with 40 healthy controls, who underwent ophthalmic assessments, SD-OCT, and OCT-A imaging. OCT-A images from the superficial capillary plexus (SCP) were analysed using in-house specialised software, OCT-A vascular image analysis (OCTAVIA), measuring mean large-vessel and capillary intensity, vessel density, ischaemia areas, and foveal avascular zone (FAZ) area and circularity. RNFL and GCL thickness was measured using the OCT machine's software. Retinal evaluations occurred at an average of 15.2 ± 6.9 months post SARS-CoV-2 infection in PCS participants. Our findings revealed no significant differences between the PCS and control groups in the OCT-A parameters or RNFL and GCL thicknesses, indicating that no long-term damage ensued in the vascular bed or retinal layers within our cohort, providing a degree of reassurance for PCS patients. Full article
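The abstract lists FAZ circularity among the OCTAVIA outputs without defining it. One standard definition is 4πA/P², which equals 1 for a perfect circle; the sketch below implements that definition as an assumption based on common practice, not as a description of OCTAVIA's internals, and the example values are made up.

```python
# A small sketch of a common circularity metric: 4 * pi * area / perimeter^2.
import math

def circularity(area_mm2: float, perimeter_mm: float) -> float:
    return 4.0 * math.pi * area_mm2 / (perimeter_mm ** 2)

# Example: a roughly circular FAZ of ~0.30 mm^2 with a ~2.1 mm perimeter
print(round(circularity(0.30, 2.1), 3))   # ~0.855
```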
(This article belongs to the Special Issue Advances in Retinal Image Processing)
Figure 1
Spectral domain-optical coherence tomography (SD-OCT) of the macula obtained from the Canon Xephilio OCT-A1 machine (Canon Medical Systems Europe B.V.©, Amstelveen, The Netherlands), displaying a 10 × 10 mm macular image from a participant with post-COVID-19 syndrome segmented into nine ETDRS zones. The segments consist of the superior outer, superior inner, nasal outer, nasal inner, inferior outer, inferior inner, temporal outer, temporal inner, and foveal (central) zones. (a) Displays the average thickness of the macular retinal nerve fibre layer (mRNFL) in the nine ETDRS zones. (b) Displays the average thickness of the macular ganglion cell layer (mGCL) in the nine ETDRS zones.
Figure 2
Analysis of the macular 10 × 10 mm and 4 × 4 mm optical coherence tomography-angiography (OCT-A) images performed by our in-house software. (a) 10 × 10 mm macular OCT-A image of the right eye. (b) Binarisation of the 10 × 10 mm macular OCT-A image as a processing step. (c) Final segmentation of the image following removal of the optic disc and the central 4 × 4 mm area, which was analysed in separate dedicated 4 × 4 mm images. (d) 4 × 4 mm macular OCT-A image of the right eye. (e) Binarisation of the 4 × 4 mm macular OCT-A image. (f) Final segmentation of the 4 × 4 mm image with the parafoveal and perifoveal zones highlighted.
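The binarisation step shown in panels (b) and (e) is not specified in detail here. As a stand-in sketch, the code below applies Otsu thresholding from scikit-image to a synthetic slab and derives a simple vessel-density estimate; the threshold choice and density definition are assumptions, not OCTAVIA's actual method.

```python
# A minimal binarisation sketch using Otsu thresholding (an assumed stand-in).
import numpy as np
from skimage.filters import threshold_otsu

def binarise_octa(img: np.ndarray) -> np.ndarray:
    """Binarise a grayscale en-face OCT-A slab: True where the flow signal exceeds Otsu's threshold."""
    return img > threshold_otsu(img)

# Synthetic stand-in for a slab; a real image would be loaded with skimage.io.imread
rng = np.random.default_rng(0)
img = rng.random((512, 512))
mask = binarise_octa(img)
vessel_density = mask.mean()          # fraction of suprathreshold ("vessel") pixels
print(f"Vessel density: {vessel_density:.3f}")
```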
Figure A1
A Bland–Altman plot demonstrating the differences between large-vessel intensity measurements made with the OCTAVIA software on two occasions by the same assessor.
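For readers unfamiliar with the method, the following brief sketch computes the quantities behind a Bland–Altman plot (the mean difference, or bias, and the 95% limits of agreement) on made-up repeatability data; the numbers are illustrative only.

```python
# A numpy sketch of Bland-Altman bias and limits of agreement (illustrative data).
import numpy as np

first  = np.array([41.2, 39.8, 44.1, 40.5, 42.7])   # e.g., intensity measurements, session 1
second = np.array([41.0, 40.1, 43.8, 40.9, 42.5])   # same eyes, session 2

diff = first - second
bias = diff.mean()
sd = diff.std(ddof=1)
loa_low, loa_high = bias - 1.96 * sd, bias + 1.96 * sd
print(f"Bias {bias:.2f}, limits of agreement [{loa_low:.2f}, {loa_high:.2f}]")
```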
Figure A2
A scatterplot comparing measurements of the foveal avascular zone (FAZ) area obtained with ImageJ and the OCTAVIA software, showing the line of best fit relative to the 1:1 line.
Figure A3
A scatterplot comparing manual measurements of large-vessel intensities in 10 × 10 mm OCT-A images with those from the OCTAVIA software, showing the line of best fit relative to the 1:1 line.