Abstract
Implantation of stents into coronary arteries is a common treatment option for patients with cardiovascular disease. Assessment of the safety and efficacy of stent implantation occurs via manual visual inspection of the neointimal coverage in intravascular optical coherence tomography (IVOCT) images. However, such manual assessment requires the detection of thousands of strut points within the stent. This is a challenging, tedious, and time-consuming task because strut points usually appear as small, irregularly shaped objects with inhomogeneous textures, and are often occluded by shadows, artifacts, and vessel walls. Conventional methods based on textures, edge detection, or simple classifiers for automated detection of strut points in IVOCT images have low recall and precision, as they are unable to adequately represent the visual features of strut points. In this study, we propose a local-global refinement network that integrates local patch content with global content for strut point detection in IVOCT images. Our method densely detects potential strut points in local image patches and then refines them according to global appearance constraints to reduce false positives. Experimental results on a clinical dataset of 7,000 IVOCT images demonstrate that our method outperformed state-of-the-art methods, with a recall of 0.92 and a precision of 0.91 for strut point detection.
1 Introduction
Accurate assessment of neointimal coverage after stent implantation in intravascular optical coherence tomography (IVOCT) images is important to ensure the safety and efficacy of the percutaneous coronary intervention procedure [8]. Unfortunately, manual assessment requires the detection and analysis of thousands of struts within the stent, which is a challenging, tedious, and time-consuming task. As shown in Fig. 1, stent struts are small, and the visual characteristics of regions covered by thick intima (the innermost layer of the artery) may make the struts inconspicuous.
Motivated by these challenges, a number of automated detection methods have been proposed. Existing methods typically use handcrafted features to encode candidate strut points and then apply supervised classification to identify the struts. Commonly used handcrafted features and supervised approaches include shadow features [11, 12], decision trees [5], and wavelet-based detection [3]. In addition, some studies used lumen segmentation [7, 14] and stent shape models [1] to constrain the search space for potential strut candidates. However, all these methods rely on effective pre-processing steps, such as denoising, illumination correction, and lumen boundary detection, to produce accurate results, which restricts their generalizability.
An alternative is to derive features using convolutional neural networks (CNNs), which have achieved great success in medical image analysis [4]. Results from CNN architectures such as U-Net [10] and FCN [2] demonstrate accurate detection and segmentation of large objects. However, these CNNs downsample the image to enlarge the receptive field and encode global information, which risks losing small objects such as struts, and their application to strut detection has not been validated.
We propose an automated method for stent strut detection in IVOCT images that overcomes the limitations mentioned above. We leverage CNNs for their ability to combine low-level appearance information with high-level semantic information in a hierarchical manner. Our method couples a local network that densely detects potential struts in image patches with a global network that uses image-level appearance information to iteratively refine the predictions and remove those that are unlikely to be struts. We name our method the deep local-global refinement network (LGRN). We contribute the following to the state of the art:
- To the best of our knowledge, this is the first deep learning method for strut point detection. Our method removes the reliance on pre-processing steps such as denoising and illumination correction.
- Our coupling of a local network with high recall and a global network that provides image-level refinement enables false positive reduction while maintaining high sensitivity and efficiency in strut point detection.
- Our global network uses an appearance-constrained attention module for false positive reduction, which preserves the detected struts that fit the overall appearance of the image.
2 Method
2.1 Materials
Our dataset consists of 57 patients with stents implanted for more than one year. Each patient has an average of 127 IVOCT images containing the stent. All IVOCT images were acquired using a C7-XR OCT system (St. Jude Medical, St. Paul, MN, USA) with a 2.7F Dragonfly imaging catheter. Each IVOCT image has a resolution of \(714\times 714\) pixels. A cardiologist manually annotated the struts and lumen on all IVOCT images. Following the same protocol as in [6], a morphological filter was applied to the annotated struts to enlarge them, and the result was used as the ground truth for evaluation.
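This morphological enlargement can be sketched as follows. This is a minimal illustration assuming point-wise annotations and a disk-shaped structuring element of radius 3 pixels; the actual filter parameters follow [6] and are not specified in this paper.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def enlarge_annotations(points, shape, radius=3):
    """Dilate point-wise strut annotations into small disks.

    A hypothetical stand-in for the morphological filter of [6];
    the disk radius is an assumption, not taken from the paper.
    """
    mask = np.zeros(shape, dtype=bool)
    for r, c in points:
        mask[r, c] = True
    # Disk-shaped structuring element of the given radius.
    yy, xx = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    disk = yy ** 2 + xx ** 2 <= radius ** 2
    return binary_dilation(mask, structure=disk)
```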
2.2 Deep Local-Global Refinement Network
Figure 2 shows an overview of the proposed local-global refinement network. First, the Local-Network is applied to the input IVOCT image to detect all potential struts from small input patches. The detected struts, together with the original IVOCT image, are then used as input to the Global-Network for refinement, where an appearance-constrained attention module guides the overall spatial distribution of the struts and removes falsely detected ones.
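A minimal sketch of this two-stage inference pipeline is given below. The function names and the overlapping-patch stitching scheme are assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np

def detect_struts(image, local_net, global_net, patch=64, stride=32):
    """Two-stage LGRN inference sketch: dense local detection, then
    global appearance-constrained refinement (stride is an assumption)."""
    h, w = image.shape
    heat = np.zeros((h, w))
    hits = np.zeros((h, w))
    # Stage 1: score overlapping 64x64 patches with the Local-Network
    # and average the overlapping predictions into one strut map.
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            heat[y:y + patch, x:x + patch] += local_net(image[y:y + patch, x:x + patch])
            hits[y:y + patch, x:x + patch] += 1
    heat /= np.maximum(hits, 1)
    # Stage 2: the Global-Network refines the map given the full image,
    # removing detections inconsistent with the overall stent appearance.
    return global_net(image, heat)
```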
Local-Network: A patch-based deep CNN is used as the Local-Network for detecting all strut points. It consists of 9 zero-padded convolutional layers with a kernel size of 3 and a stride of 1. Residual connections link adjacent layers. At the end of the Local-Network we use a linear \(1\times 1\) convolutional layer and a Gaussian convolution kernel, where the \(1\times 1\) convolution compensates for batch normalization, and the Gaussian kernel smooths the output, e.g., suppressing the patch artifacts of the Local-Network. The patch size is set to \(64\times 64\). The network is trained with the L1 loss.
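A PyTorch sketch of such a patch network is shown below. The channel width, the Gaussian kernel size and weights, and the exact placement of batch normalization are assumptions, not details given in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalNetwork(nn.Module):
    """Sketch of the patch-based Local-Network: 9 zero-padded 3x3
    convolutions with residual links, a linear 1x1 projection, and a
    fixed Gaussian smoothing kernel."""
    def __init__(self, channels=64):
        super().__init__()
        self.head = nn.Conv2d(1, channels, 3, padding=1)
        # 8 further 3x3 zero-padded convolutions; adjacent layers are
        # connected residually (channel width of 64 is an assumption).
        self.body = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(8))
        self.bn = nn.ModuleList(nn.BatchNorm2d(channels) for _ in range(8))
        self.out = nn.Conv2d(channels, 1, 1)  # linear 1x1 projection
        # Fixed 5x5 binomial approximation of a Gaussian kernel
        # (size and weights are assumptions, not from the paper).
        g = torch.tensor([1., 4., 6., 4., 1.])
        g2d = torch.outer(g, g)
        g2d /= g2d.sum()
        self.register_buffer("gauss", g2d.view(1, 1, 5, 5))

    def forward(self, x):  # x: (B, 1, 64, 64) grayscale patch
        h = F.relu(self.head(x))
        for conv, bn in zip(self.body, self.bn):
            h = h + F.relu(bn(conv(h)))  # residual connection per layer
        y = self.out(h)
        return F.conv2d(y, self.gauss, padding=2)  # smoothed strut map
```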
Global-Network: The purpose of the Global-Network is to extract high-level semantic information that can guide the refinement of all detected struts. The Global-Network uses a modified U-Net [10] architecture. There are 4 downsampling layers, each with a \(2 \times 2\) max-pooling operator, and 4 upsampling layers. At each down/up-sampling layer, we repeat the following parallel module: one sub-branch with a \(3 \times 3\) convolutional kernel, and another sub-branch with a \(3 \times 3\) dilated convolution with a dilation rate of 2. The outputs of the two sub-branches are added at the end of the module. This combination of regular and dilated convolutions has a larger receptive field and can therefore learn visual characteristics that assist in inferring struts with weak visual features. Global context constraints thus ensure the accuracy of the prediction results, while local context learning improves the sensitivity of the model in detecting the object. However, the uneven distribution of background and foreground makes it difficult for the Global-Network to converge during training. To overcome this, we added an appearance-constrained attention module to guide its convergence, in which a separate 5-layer CNN discriminates whether the predicted results have an appearance similar to the ground truth. To facilitate the learning process, we use two loss functions. The similarity loss \(\ell _{similar}\) compares the prediction \(P\) against the ground-truth annotation \(\lceil {M}\rceil \) and is weighted to balance the uneven distribution of foreground and background. The attention loss \(\ell _{attention}\) constrains the overall appearance of the predicted struts, where \(A(\cdot )\) denotes the attention module that discriminates whether \(P\) has an appearance similar to the ground truth.
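The parallel convolution module and hedged versions of the two losses can be sketched as follows. The channel widths, the foreground weight, and the exact loss formulas are assumptions, since the paper's equations are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualBranchBlock(nn.Module):
    """Parallel regular + dilated 3x3 convolutions whose outputs are
    summed, as used at each down/up-sampling stage of the Global-Network
    (channel widths are assumptions)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.plain = nn.Conv2d(c_in, c_out, 3, padding=1)
        self.dilated = nn.Conv2d(c_in, c_out, 3, padding=2, dilation=2)

    def forward(self, x):
        return F.relu(self.plain(x) + self.dilated(x))

def similarity_loss(pred, target, fg_weight=10.0):
    # Hypothetical foreground-weighted L1: the paper states the loss
    # balances the uneven foreground/background distribution, but the
    # exact formula and weight are not given here.
    w = torch.ones_like(target)
    w[target > 0] = fg_weight
    return (w * (pred - target).abs()).mean()

def attention_loss(attention_module, pred):
    # Hypothetical discriminator-style term: A(.) is the 5-layer CNN
    # attention module that scores whether the predicted strut map
    # resembles a ground-truth map.
    score = attention_module(pred)
    return F.binary_cross_entropy_with_logits(score, torch.ones_like(score))
```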
2.3 Implementation Details
We pre-processed the dataset with maximum normalization and cropped all images to \(512 \times 512\). Both the Local-Network and the Global-Network were trained for 80 epochs with an Adam optimizer, an initial learning rate of 0.001, and a batch size of 1. Training took an average of 15 h on an Nvidia GTX 1080 Ti GPU with 11 GB of memory.
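A training loop matching this stated configuration might look like the sketch below; the data loader, device, and normalization details are assumptions.

```python
import torch

def train(net, loader, epochs=80, lr=1e-3, device="cuda"):
    """Adam, initial learning rate 0.001, batch size 1, 80 epochs, as
    stated above; L1 loss as used for the Local-Network."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = torch.nn.L1Loss()
    net.to(device).train()
    for _ in range(epochs):
        for img, target in loader:  # batches of one 512x512 crop
            img = (img / img.max()).to(device)  # maximum normalization
            loss = loss_fn(net(img), target.to(device))
            opt.zero_grad()
            loss.backward()
            opt.step()
```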
3 Results and Discussions
3.1 Experimental Setup
We randomly divided the dataset into a training set (30 patients, 3873 images) and a test set (27 patients, 3352 images) for evaluation. We performed the following experiments: (a) comparison of our method with the state-of-the-art methods; and (b) analysis of the contribution of each component of our method. The state-of-the-art methods include: (i) Wang et al. [13], a Bayesian network based detection method; (ii) Faster-RCNN [9], a Region Proposal Network (RPN) trained end-to-end to generate high-quality region proposals for detection; (iii) Lu et al. [5], a bagged decision tree classifier that classifies candidate struts using structural features; and (iv) Nam et al. [7], a neural network classifier applied to features from gradient images. For [13], due to the unavailability of the source code, we cite the published result as a reference, acknowledging that the dataset is different. We use recall and precision for evaluation; following [13], a detection is a true positive if it is within 5 pixels of the ground truth.
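The 5-pixel criterion can be implemented, for example, with a greedy one-to-one assignment, as in the sketch below; the matching details are an assumption and [13] may define them differently.

```python
import numpy as np

def recall_precision(pred_pts, gt_pts, tol=5.0):
    """Greedy one-to-one matching: a prediction counts as a true positive
    if it lies within `tol` pixels of an unmatched ground-truth strut."""
    gt = [np.asarray(g, dtype=float) for g in gt_pts]
    matched = [False] * len(gt)
    tp = 0
    for p in pred_pts:
        p = np.asarray(p, dtype=float)
        # Distance to every not-yet-matched ground-truth point.
        d = [np.linalg.norm(p - g) if not m else np.inf
             for g, m in zip(gt, matched)]
        if d and min(d) <= tol:
            matched[int(np.argmin(d))] = True
            tp += 1
    recall = tp / max(len(gt), 1)
    precision = tp / max(len(pred_pts), 1)
    return recall, precision
```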
3.2 State-of-the-Art Methods Comparison
Table 1 shows the detection results of our method compared to the state-of-the-art methods: it increases recall by 1.2% and precision by 4.7% relative to the second-best results, from Faster-RCNN, as shown in Fig. 3(a).
3.3 Component Analysis
Table 1 and Fig. 3(b) show the detection results of our method at individual stages. Figure 5(a) shows two example detection results with different coverage thicknesses. The Local-Network achieved higher recall, while the Global-Network achieved higher precision (as shown in Fig. 5(c) and (d)). As exemplified in Fig. 5(e), the proposed method integrates both the Local-Network and the Global-Network and achieves a more consistent performance in recall and precision.
3.4 Discussion
Table 1 and Fig. 3(a) show that our method achieved the overall best performance compared to the existing methods for strut detection. The traditional methods (Nam et al. [7], Lu et al. [5], and Wang et al. [13]) using handcrafted features with conventional classifiers achieved competitive performance compared with the Faster-RCNN method. Figure 4(d) and (e) show two example results where both the Nam et al. and Lu et al. methods fail to detect strut points with low contrast against the background. In contrast, Faster-RCNN can combine deep semantic information and shallow appearance information in a hierarchical manner, enabling it to encode image-wide location information and semantic characteristics. However, Faster-RCNN lacks a constraint on the overall appearance of the struts. Consequently, it generates poor detection results for small struts (as shown in Fig. 4(c)).
Table 1, Figs. 3(b) and 5 compare the main components of our method individually to quantify their contributions to the final detection results. These results demonstrate that the Local-Network has higher recall, which we attribute to the use of a patch-based network to detect all potential strut candidates. In contrast, the Global-Network achieved higher precision owing to its use of global context, e.g., appearance information, as part of the learning process, which ensures that the detected struts are consistent with the shape of the stent. Table 1, Figs. 3(b) and 5 also show the advantage of our combination, which integrates the complementary detection results produced by the individual components.
4 Conclusion
We propose a deep learning based method for stent strut detection in IVOCT images. We achieved state-of-the-art strut detection performance via a local-global refinement network, in which potential struts are detected and then refined according to global appearance constraints to reduce false positives. Our experimental results demonstrate that our method achieved higher accuracy than the existing state-of-the-art methods on a large clinical dataset.
References
1. Ciompi, F., et al.: Computer-aided detection of intracoronary stent in intravascular ultrasound sequences. Med. Phys. 43(10), 5616–5625 (2016)
2. Long, J., et al.: Fully convolutional networks for semantic segmentation. In: CVPR, pp. 3431–3440 (2015)
3. Mandelias, K., et al.: Automatic quantitative analysis of in-stent restenosis using FD-OCT in vivo intra-arterial imaging. Med. Phys. 40(6), 063101 (2013)
4. Litjens, G., et al.: A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017)
5. Lu, H., et al.: Automatic stent detection in intravascular OCT images using bagged decision trees. Biomed. Opt. Express 3(11), 2809–2824 (2012)
6. Merget, D., et al.: Robust facial landmark detection via a fully-convolutional local-global context network. In: CVPR, pp. 781–790 (2018)
7. Nam, H.S., et al.: Automated detection of vessel lumen and stent struts in intravascular optical coherence tomography to evaluate stent apposition and neointimal coverage. Med. Phys. 43(4), 1662–1675 (2016)
8. Otsuka, F., et al.: Neoatherosclerosis: overview of histopathologic findings and implications for intravascular imaging assessment. Eur. Heart J. 36(32), 2147–2159 (2015)
9. Ren, S., et al.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS, pp. 91–99 (2015)
10. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
11. Ughi, G.J., et al.: Automatic segmentation of in-vivo intra-coronary optical coherence tomography images to assess stent strut apposition and coverage. Int. J. Cardiovasc. Imaging 28(2), 229–241 (2012)
12. Wang, A., et al.: Automatic stent strut detection in intravascular optical coherence tomographic pullback runs. Int. J. Cardiovasc. Imaging 29(1), 29–38 (2013)
13. Wang, A., et al.: 3-D stent detection in intravascular OCT using a Bayesian network and graph search. IEEE Trans. Med. Imaging 34(7), 1549–1561 (2015)
14. Yong, Y.L., et al.: Linear-regression convolutional neural network for fully automated coronary lumen segmentation in intravascular optical coherence tomography. J. Biomed. Opt. 22(12), 126005 (2017)
Acknowledgement
This work was supported in part by Australian Research Council (ARC) grants (LP140100686 and IC170100022), University of Sydney – Shanghai Jiao Tong University Joint Research Alliance (USYD-SJTU JRA) grants, and STCSM grant 17411953300.