Abstract
Because coronary artery calcified plaques can hinder or prevent stent deployment, interventional cardiologists need a better way to plan interventions, which might include one of the many methods for calcification modification (e.g., atherectomy). We are imaging calcifications with intravascular optical coherence tomography (IVOCT), which is the only intravascular imaging technique able to image the extent of a calcification, and using the results to build vessel-specific finite element models for stent deployment. We applied our methods to a large set of image data (>45 lesions and >2,600 image frames) of calcified plaques, manually segmented by experts into calcified, lumen, and “other” tissue classes. In optimization experiments, we evaluated anatomical (x, y) versus acquisition (r, θ) views, augmentation methods, and classification noise cleaning. Noisy semantic segmentations are cleaned by applying a conditional random field (CRF). We achieve an accuracy of 0.85 ± 0.04, 0.99 ± 0.01, and 0.97 ± 0.01, and an F-score of 0.88 ± 0.07, 0.97 ± 0.01, and 0.91 ± 0.04 for the calcified, lumen, and other tissue classes, respectively, across all folds following CRF noise cleaning. As a proof of concept, we applied our methods to cadaver heart experiments on highly calcified plaques. Following limited manual correction, we used our calcification segmentations to create a lesion-specific finite element model (FEM) and used it to predict direct stenting deployment at multiple pressure steps. FEM modeling of stent deployment captured many features found in the actual stent deployment (e.g., lumen shape, lumen area, and location and number of apposed stent struts).
Keywords: intravascular optical coherence tomography (IVOCT), deep learning, semantic segmentation, calcified plaque, finite element model (FEM)
1. INTRODUCTION
When treating highly calcified coronary artery lesions with stents, interventional cardiologists, almost blindly and without established guidelines, make stressful treatment decisions that can lead to inadequate stent deployment and possibly diminished outcomes, or even calamitous events. A cardiologist must choose between a normal-sized angioplasty balloon; a smaller angioplasty balloon with high, prolonged pressures to fracture the calcification; direct stenting at very high pressures (up to 30 atm); a scoring or cutting balloon; or any one of a number of atherectomy devices. Detailed intravascular optical coherence tomography (IVOCT) evaluations of stent deployment show that, without plaque modification, eccentric calcifications can lead to under-deployment with malapposed struts and vessel dissections. The cardiologist’s choices can have deleterious consequences: sub-optimal stent deployment can result in poor longer-term outcomes, a vessel can dissect, or, more rarely, a balloon can rupture or an atherectomy device can perforate the wall. These challenges are particularly acute given that cardiologists make treatment decisions for calcified arteries on a daily basis. Calcifications are present in over 100,000 cases (17%–35% of interventions) in the US per year, numbers that will rise with population aging and prolonged statin treatment [1].
In this report, we outline steps in our new comprehensive program to assess the role of coronary calcifications in stent deployment. We develop and evaluate deep learning methods for segmentation of calcifications in a large number (>2,600) of IVOCT image frames. We then demonstrate lesion-specific finite element analysis (FEA) of stent deployment in heavily calcified arteries. FEA results are compared to measured stent deployments in ex vivo experiments. If successful, our research could lead to treatment planning software to support the interventional cardiologist.
2. METHODS
Image processing and learning techniques are applied to semantically segment the pixels in IVOCT images as calcified plaque, lumen, or other. The deep learning model trained on the in vivo data is used to classify the images from the ex vivo experiment. The classified images are used to build the finite element model.
2.1. Preprocessing and Data Set Augmentation
Preprocessing steps are applied to the raw IVOCT images obtained in the polar (r, θ) domain. Image speckle is reduced by filtering with a normalized Gaussian kernel of size (7, 7) and a standard deviation of 2.5 pixels. IVOCT (r, θ) images are scan converted to create (x, y) images for CNN processing. Data augmentation is used during training to provide more examples and improve model generalization. For anatomical (x, y) images, we rotate each image by an angle picked randomly between −180° and +180°. The (x, y) images were resized from 1024 by 1024 pixels to 360 by 360 pixels to reduce training time and computational cost. To augment (r, θ) images, we concatenate all the (r, θ) images into one large 2D array in which θ repeats from 0 to 360° many times. By changing an offset shift, we can resample new 360° (r, θ) images. In practice, we shifted the starting A-line 5 times in increments of 100 A-lines, creating roughly 13,225 augmented images to supplement our original data set of 2,646 images.
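The speckle kernel and the (r, θ) offset augmentation described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the MATLAB implementation used in this work. The function names, the array layout (frames, A-lines, samples), and the choice to drop the incomplete trailing frame after each shift are assumptions made for the example.

```python
import numpy as np


def gaussian_kernel(size=7, sigma=2.5):
    """Return a normalized 2D Gaussian kernel (entries sum to 1)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def augment_polar_frames(frames, shift=100, n_shifts=5):
    """Resample new (r, theta) frames by shifting the starting A-line.

    frames: array of shape (n_frames, n_alines, n_samples), where axis 1
    is theta (one row per A-line). This layout is an assumption.
    """
    n_frames, n_alines, n_samples = frames.shape
    # Concatenate all frames along theta into one long 2D array.
    long_array = frames.reshape(n_frames * n_alines, n_samples)
    augmented = []
    for k in range(1, n_shifts + 1):
        offset = k * shift
        # Keep only complete 360-degree frames after the offset
        # (assumption: the incomplete trailing frame is discarded).
        n_full = (long_array.shape[0] - offset) // n_alines
        window = long_array[offset:offset + n_full * n_alines]
        augmented.append(window.reshape(n_full, n_alines, n_samples))
    return np.concatenate(augmented, axis=0)
```

Under these assumptions, if each of the five offsets is no longer than one frame, every shift yields one frame fewer than the original set, so 2,646 frames would give 5 × 2,645 = 13,225 augmented frames, which is consistent with the count quoted above.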
2.2. Deep Learning Model Architecture and Implementation Details
We chose SegNet [2] as our network architecture. SegNet is an end-to-end, hourglass-shaped encoder-decoder convolutional neural network that was trained on the CamVid dataset. Each encoder/decoder convolution set consists of a convolution layer, a batch normalization layer, and a rectified linear unit (ReLU) layer. All convolution layers have a filter size of 3, a stride of 1, and zero padding of size 1. This filter size was chosen to detect small features, including the edges of calcified plaques. The depth of the network was 5 to provide a receptive field of (360, 360) for the CNN.
A batch normalization layer normalizes each input x across a mini-batch. The layer first normalizes the activations of each channel by calculating the z-score: the mini-batch mean μ is subtracted from the activations, which are then divided by the mini-batch standard deviation σ,
$$\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \tag{1}$$
where ϵ improves numerical stability when σ² is very small. To allow for the possibility that inputs with zero mean and unit variance are not optimal for the layer that follows the batch normalization layer, the batch normalization layer further shifts and scales the activations as
$$y_i = \alpha \hat{x}_i + \beta \tag{2}$$
Here, the offset β and scale factor α are learnable parameters that are updated during network training [3].
Convolutional and batch normalization layers are followed by a ReLU layer. A ReLU layer applies a threshold operation to each element, where any input value less than zero is set to zero,
$$f(x) = \begin{cases} x, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{3}$$
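The normalization, scale-and-shift, and threshold operations described above can be illustrated with a short NumPy sketch. It is not the MATLAB layer implementation used in this work. The tensor layout (batch, channels, height, width), the value of ϵ, and the function names are assumptions chosen for the example.

```python
import numpy as np


def batch_norm(x, alpha, beta, eps=1e-5):
    """Per-channel batch normalization followed by scale and shift.

    x: activations of shape (batch, channels, height, width).
    alpha, beta: learnable scale and offset, each of shape (channels,).
    eps: small constant for numerical stability (illustrative value).
    """
    # Statistics are computed per channel over the mini-batch and space.
    mu = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)  # z-score normalization
    return alpha.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)


def relu(x):
    """Set every negative input to zero and pass other values through."""
    return np.maximum(x, 0.0)
```

With α = 1 and β = 0, each output channel has approximately zero mean and unit variance over the mini-batch; training then adjusts α and β away from these values where that helps the following layer.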
A max pooling layer is inserted at the end of each encoder step. All max pooling layers have a pool size of 2 pixels and a stride of 2 pixels. The max pooling layers pass the maximum responses and their indices from the encoder to the decoder so that the corresponding locations can be identified during upsampling. The model produces pixel-wise probability scores for the predefined class labels (“Lumen”, “Calcified Plaque”, or “Other”) at the same size and resolution as the input image. The model is illustrated in Figure 1.
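The transfer of pooling indices from encoder to decoder can be illustrated on a single 2D feature map. The NumPy sketch below is a simplified illustration rather than the network code used in this work; the function names are hypothetical, and it assumes the map height and width are divisible by the pool size.

```python
import numpy as np


def max_pool_with_indices(x, pool=2):
    """Non-overlapping max pooling that also returns argmax locations.

    x: 2D feature map whose height and width are divisible by `pool`.
    Returns the pooled map and the flat index into x of each maximum.
    """
    h, w = x.shape
    # Rearrange into (h/pool, w/pool, pool*pool) windows.
    windows = x.reshape(h // pool, pool, w // pool, pool).transpose(0, 2, 1, 3)
    windows = windows.reshape(h // pool, w // pool, pool * pool)
    local = windows.argmax(axis=2)
    pooled = windows.max(axis=2)
    # Convert the within-window position back to coordinates in x.
    rows = np.arange(h // pool)[:, None] * pool + local // pool
    cols = np.arange(w // pool)[None, :] * pool + local % pool
    return pooled, rows * w + cols


def max_unpool(pooled, indices, out_shape):
    """Place each pooled value back at its recorded location; zeros elsewhere."""
    out = np.zeros(out_shape, dtype=pooled.dtype)
    out.flat[indices.ravel()] = pooled.ravel()
    return out
```

The decoder's upsampled map is sparse, with each maximum restored to the position it occupied in the encoder; the decoder convolutions that follow then densify it.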
2.3. Segmentation Refinement Strategy
We use a conditional random field (CRF) as a post-processing step to refine the results from the deep learning model. A method to integrate network outputs into a fully connected CRF is described in Kamnitsas et al. [4] The deep learning model gives a vector of class probabilities at each pixel location. The CRF uses these values, pixel intensities, and the corresponding spatial location information to generate crisp class labels. This process results in images with reduced noise as compared to simply performing a class-wise median filter operation over the image. The goal is to reduce noise by generating a new labeling that favors assigning the same label to pixels that are closer to each other spatially (in both x and y), using the scores generated by the neural network. For IVOCT images, the appearance kernel is inspired by the observation that nearby pixels with similar intensity are likely to be in the same class.
A CRF is an undirected graphical model that encodes a conditional distribution over the target variables Y given the observed variables X. This method maximizes the distribution P(Y|X), which is expressed as a Gibbs distribution over a random field. The fully connected CRF described in Krähenbühl et al. [5] computes the maximum a posteriori labeling by minimizing the following energy function:
$$E(l) = \sum_{i} \theta_i(l_i) + \sum_{i<j} \theta_{i,j}(l_i, l_j) \tag{4}$$
where $l$ is a particular label assignment for all pixels in the image; $\theta_i(l_i) = -\log P(l_i)$ is the unary potential, where $P(l_i)$ is the probability estimate of label $l_i$ at pixel $i$ computed by the neural network; and $\theta_{i,j}(l_i, l_j)$ is the pairwise edge potential that connects all pixel pairs $i, j$ in the image, defined as a linear combination of Gaussian kernels as shown below
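$$\theta_{i,j}(l_i, l_j) = \mu(l_i, l_j) \left[ w_1 \exp\left( -\frac{\lVert p_i - p_j \rVert^2}{2\sigma_\alpha^2} - \frac{\lVert I_i - I_j \rVert^2}{2\sigma_\beta^2} \right) + w_2 \exp\left( -\frac{\lVert p_i - p_j \rVert^2}{2\sigma_\gamma^2} \right) \right] \tag{5}$$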
where the label compatibility function $\mu(l_i, l_j) = 1$ if $l_i \neq l_j$ and zero otherwise; $p_i$ and $p_j$ refer to the spatial positions of pixels $i$ and $j$; $I_i$ and $I_j$ indicate the intensity vectors of pixels $i$ and $j$; $w_1$ and $w_2$ are the weights of the appearance and smoothness terms, respectively; and $\sigma_\alpha$, $\sigma_\beta$, and $\sigma_\gamma$ control the degree of interaction in the spatial or intensity dimensions.
The message passing step within the iterative update scheme can be expressed as Gaussian filtering, which makes the algorithm computationally efficient. All free parameters are determined empirically: the size of the smoothness kernels, the weights of the smoothness and appearance kernels, and the number of iterations. Overall, for each pixel in the (x, y) classification view, the CRF takes the probability estimates of each class as input and outputs the final class membership. Similar processing was applied when network training experiments were performed on the (r, θ) images.
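To make the roles of the unary, appearance, and smoothness terms concrete, the sketch below evaluates the fully connected CRF energy of a given labeling by brute force on a very small image. It only scores a labeling; it does not perform the efficient mean-field inference of Krähenbühl et al. and is not the implementation used in this work. The function names and all parameter values are illustrative assumptions.

```python
import numpy as np


def pairwise_kernel(p_i, p_j, I_i, I_j, w1, w2, s_alpha, s_beta, s_gamma):
    """Appearance kernel plus smoothness kernel for one pixel pair."""
    d_pos = float(np.sum((p_i - p_j) ** 2))
    d_int = float(np.sum((I_i - I_j) ** 2))
    appearance = w1 * np.exp(-d_pos / (2 * s_alpha ** 2) - d_int / (2 * s_beta ** 2))
    smoothness = w2 * np.exp(-d_pos / (2 * s_gamma ** 2))
    return appearance + smoothness


def crf_energy(labels, probs, image, w1=1.0, w2=1.0,
               s_alpha=3.0, s_beta=0.1, s_gamma=3.0):
    """Energy of a labeling: unary terms plus all-pairs pairwise terms.

    labels: (H, W) integer class map; probs: (H, W, C) network
    probabilities; image: (H, W) pixel intensities.
    The O(N^2) loop is only practical for tiny images.
    """
    h, w = labels.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pos = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    inten = image.reshape(-1, 1).astype(float)
    lab = labels.ravel()
    p = probs.reshape(-1, probs.shape[-1])
    # Unary potential: negative log-probability of the chosen label.
    unary = -np.log(p[np.arange(lab.size), lab] + 1e-12).sum()
    pairwise = 0.0
    for i in range(lab.size):
        for j in range(i + 1, lab.size):
            if lab[i] != lab[j]:  # label compatibility: penalize differing labels only
                pairwise += pairwise_kernel(pos[i], pos[j], inten[i], inten[j],
                                            w1, w2, s_alpha, s_beta, s_gamma)
    return float(unary + pairwise)
```

In this toy setting, a label boundary that follows an intensity edge costs less than one that cuts through a region of uniform intensity, which is the behavior the appearance kernel is intended to encourage.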
2.4. Finite Element Model
We constructed lesion-specific finite element models from IVOCT images. Creating a finite element mesh involved five steps: (1) Process images using the semantic segmentation deep learning method described above. (2) Manually correct labels if necessary. (3) Reconstruct the surface from the segmentation results by computing a triangular approximation of the interfaces between different materials. (4) Smooth the generated surfaces to eliminate any staircase-like artifacts. (5) Generate the FEM mesh by filling the volume enclosed by the generated surface with tetrahedra, using Amira software 6.5 (Thermo Fisher Scientific, Waltham, MA, USA).
Other details of the finite element modeling follow. Material properties were determined by fitting results to our measurements at different pressures. All tissues were considered to be hyperelastic isotropic materials with different parameters. A stent model was created from detailed characteristics of an Express stent with a nominal diameter of 3 mm and a length of 18 mm. Considering the physiological environment in the body and the stenting process, symmetric constraints were applied to both ends of the artery [6].
3. EXPERIMENTAL METHODS
3.1. Ex vivo experimental data
All ex vivo hearts were first CT scanned to choose a good candidate with large deposits of calcium. Percutaneous coronary intervention (PCI) was performed using an 8-Fr guiding catheter. We deployed a Xience Sierra stent (3.0 mm diameter, 18 mm long; Abbott Vascular, Santa Clara, CA) using a non-compliant balloon dilated to its nominal pressure. This was followed by post-dilations at 3.5, 4.0, and 4.5 mm, each at balloon pressures of 10, 20, and 30 atm. Maximal balloon pressure and maximal balloon size were recorded. IVOCT was performed after stent implantation. All IVOCT was performed as FD-OCT (C7 or C8 XR Imaging System; St. Jude Medical, St. Paul, MN, USA). A 2.7-Fr IVOCT catheter (Dragonfly or Dragonfly JP; St. Jude Medical) was advanced distal to the lesion, and automated pullback was performed with contrast injection through the guiding catheter. IVOCT images were recorded and analyzed using the IVOCT console.
3.2. In vivo training data
Our in vivo training data comprise 34 clinical pullbacks from 34 patients, with a total of 48 lesions. The average number of images per lesion is 55. The dataset has 15 calcified lesions (941 images), 27 lipid lesions (1,349 images), and 6 mixed lesions (356 images) with both calcium and lipid. All pullbacks were imaged prior to an interventional procedure. The in vivo IVOCT images were acquired using a frequency-domain IVOCT system (Illumien Optis; St. Jude Medical, St. Paul, Minnesota). The system comprises a tunable laser light source sweeping from 1250 nm to 1360 nm. It was operated at a frame rate of 180 fps and a pullback speed of 36 mm/sec, and has an axial resolution of around 20 μm. The pullbacks were analyzed by two expert readers in the Cartesian (x, y) view. In all, 2,646 image frames were analyzed across the 34 pullbacks. Labels from (x, y) images were converted back to the polar (r, θ) system for polar data set training [7]. All in vivo images were used to train, validate, and test the deep learning model that was used to classify images from the ex vivo dataset to build the finite element model.
To determine the ground truth labels, we relied on the definitions given in the consensus document [8]. Calcified plaque is seen as a signal-poor region with sharply delineated front and/or back borders in IVOCT images. An additional class, “other”, was used to include all pixels that could not be labeled as lumen or calcified plaque. An example annotation is shown in Figure 2.
3.3. Network Training and Testing
A ten-fold cross-validation procedure was used to measure classifier performance. Each lesion was considered as a volume of interest (VOI). We assigned roughly 80% of the VOIs for training, 10% for validation, and 10% for testing. The assignment was rotated until every VOI had been in the test set once. We ensured that, in each fold, there was no lesion overlap across the training, validation, and test sets. The mean and standard error of classification accuracy over the ten folds were recorded.
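A lesion-wise rotation of this kind can be sketched as below. The sketch is illustrative only: the function name, the random seed, and the choice of the next chunk in the rotation as the validation set are assumptions, since the exact assignment used in this work is not specified beyond the proportions above.

```python
import random


def lesion_folds(lesion_ids, n_folds=10, seed=0):
    """Split lesions (VOIs) into rotating train/validation/test folds.

    Every lesion appears in the test set exactly once, and no lesion is
    shared between the three sets of a fold.
    """
    ids = sorted(set(lesion_ids))
    random.Random(seed).shuffle(ids)
    # Deal the shuffled lesions into n_folds roughly equal chunks.
    chunks = [ids[k::n_folds] for k in range(n_folds)]
    folds = []
    for k in range(n_folds):
        val_k = (k + 1) % n_folds  # assumption: next chunk is validation
        train = [i for c, chunk in enumerate(chunks)
                 if c not in (k, val_k) for i in chunk]
        folds.append({"train": train, "val": chunks[val_k], "test": chunks[k]})
    return folds
```

With 48 lesions and ten folds, each test and validation chunk holds four or five lesions, leaving roughly 80% of the lesions for training. Splitting by lesion rather than by frame keeps highly correlated neighboring frames from appearing in both training and test sets.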
The predefined classes in our data set are not balanced in terms of the number of pixels. We use class weighting to balance the classes, as in Eigen et al. [9] The median frequency of appearance of the classes is computed on the entire training set. The weight assigned to each class in the loss function is the ratio of this median frequency to the frequency of appearance of that class.
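The weighting rule above can be written in a few lines. The sketch below is a simplified illustration: it takes the frequency of a class to be its share of all training pixels, which is an assumption (Eigen et al. normalize by the pixels of the images in which the class appears), and the function name is hypothetical.

```python
import numpy as np


def median_frequency_weights(label_maps, n_classes):
    """Class weights equal to median class frequency / class frequency.

    label_maps: iterable of integer label images from the training set.
    Assumes every class occurs at least once in the training set.
    """
    counts = np.zeros(n_classes, dtype=np.int64)
    for lab in label_maps:
        counts += np.bincount(np.asarray(lab).ravel(), minlength=n_classes)
    freq = counts / counts.sum()
    return np.median(freq) / freq
```

For example, class frequencies of 0.6, 0.3, and 0.1 give weights of 0.5, 1, and 3, so errors on the rarest class (here, calcified plaque) contribute most to the loss.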
Training details were as follows. The network was trained using the Adam optimizer [11] with a weight decay of 10⁻³. We avoid overfitting by adding a regularization term for the weights to the loss function. The optimal network parameters were selected based on the categorical cross-entropy error. A mini-batch size of 4 images is used to manage memory requirements during training. We set the maximum number of epochs to 120. Training was stopped when the loss on the validation dataset did not improve by more than 0.01% for 10 consecutive epochs or when the network had been trained for 120 epochs, whichever occurred first. The model with the lowest validation loss during training was saved and used to make predictions on the test set. Finally, we post-processed each image with a CRF algorithm to reduce classification noise.
Image preprocessing and deep learning were implemented in the MATLAB 2017b (MathWorks Inc., Natick, MA) environment. The network was executed on a Linux-based Intel Xeon x86_64 (64-bit) platform with a CUDA-capable NVIDIA Tesla P100 16 GB GPU.
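The stopping rule can be expressed as a small function over the sequence of validation losses. This sketch reflects one reading of the criterion, in which improvement is measured relative to the best validation loss seen so far; that interpretation and the function name are assumptions, not details confirmed by the training code used in this work.

```python
def early_stopping_epoch(val_losses, min_rel_improvement=1e-4,
                         patience=10, max_epochs=120):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once the validation loss has failed to improve on the
    best value so far by more than `min_rel_improvement` (0.01%) for
    `patience` consecutive epochs, or after `max_epochs` epochs.
    """
    best = float("inf")
    best_epoch = 0
    stale = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best * (1.0 - min_rel_improvement):
            stale = 0  # meaningful improvement resets the counter
        else:
            stale += 1
        if loss < best:
            best, best_epoch = loss, epoch  # checkpoint the best model
        if stale >= patience:
            return epoch, best_epoch
    return min(len(val_losses), max_epochs), best_epoch
```

The returned best epoch identifies the checkpoint with the lowest validation loss, which is the model carried forward to the test set.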
4. RESULTS
4.1. Deep learning semantic segmentation
Preprocessing and data augmentation steps are shown in Figure 3. All images are shown after log compression for improved visualization.
We determined the role of different algorithms on classification performance. First, training the model on (r, θ) data tended to give higher classification accuracy than training on (x, y) data for every class (Table 1); equivalently, the (x, y) model had a higher error rate than the (r, θ) model across all folds. Second, it was highly desirable to clean the pixel-wise classifications from all networks. Following noise cleaning, the classification results compared favorably with the annotated labels. We optimized CRF parameters in an ad hoc fashion. As shown by the classification results in Table 1, both the (r, θ) and (x, y) models perform well, but close examination showed that the (r, θ) model agreed more closely with the annotated labels.
Table 1. Confusion matrices for the (x, y) view (top) and the (r, θ) view (bottom). Entries are pixel counts, with the corresponding percentage of the true class (± its variation across folds) in parentheses.
(x, y) view
| | Predicted “Other” | Predicted “Lumen” | Predicted “Calcified Plaque” |
|---|---|---|---|
| True “Other” | 14,208,138 (95.18 ± 2.83) | 220,241 (1.485 ± 1.23) | 493,874 (3.333 ± 2.08) |
| True “Lumen” | 17,574 (1.54 ± 2.48) | 1,112,913 (98.03 ± 2.42) | 4,768 (0.4214 ± 0.54) |
| True “Calcified Plaque” | 32,154 (13.76 ± 6.96) | 7,817 (3.345 ± 2.40) | 193,697 (82.89 ± 6.81) |
(r, θ) view
| | Predicted “Other” | Predicted “Lumen” | Predicted “Calcified Plaque” |
|---|---|---|---|
| True “Other” | 87,966,274 (97.62 ± 1.47) | 560,685 (0.62 ± 0.55) | 1,582,785 (1.75 ± 1.09) |
| True “Lumen” | 167,637 (0.56 ± 0.60) | 29,536,981 (99.42 ± 1.05) | 4,076 (0.01 ± 0.02) |
| True “Calcified Plaque” | 466,286 (14.50 ± 7.33) | 7,101 (0.22 ± 0.23) | 2,741,775 (85.27 ± 4.82) |
To estimate the accuracy of the deep learning predictions in identifying calcified plaque in IVOCT images during testing, accuracy and the Dice coefficient were computed against manual segmentation for each class. Table 1 shows the confusion matrices for both the (x, y) and (r, θ) views, while Table 2 shows the performance of the model based on the (r, θ) data set before and after noise cleaning.
Table 2. Performance of the (r, θ) model before and after noise cleaning.
| Class | Accuracy (before) | Dice coefficient (before) | Accuracy (after) | Dice coefficient (after) |
|---|---|---|---|---|
| Other | 0.94 | 0.97 | 0.98 | 0.98 |
| Lumen | 0.99 | 0.98 | 0.99 | 0.98 |
| Calcified | 0.82 | 0.42 | 0.85 | 0.73 |
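Per-class metrics of this kind can be derived directly from a confusion matrix. The sketch below uses the pooled pixel counts of the second confusion matrix in Table 1 and computes, for each class, the row-normalized diagonal (the quantity that the percentages in Table 1 correspond to) and the Dice coefficient. It is an illustration only: the function name is hypothetical, and because the counts are pooled rather than averaged per fold, the results can differ slightly from the fold-averaged values reported in the tables.

```python
import numpy as np

# Pooled pixel counts copied from the second confusion matrix in Table 1.
# Rows are true classes, columns are predicted classes, in the order
# Other, Lumen, Calcified Plaque.
confusion = np.array([
    [87_966_274, 560_685, 1_582_785],
    [167_637, 29_536_981, 4_076],
    [466_286, 7_101, 2_741_775],
], dtype=np.float64)


def per_class_metrics(cm):
    """Return (sensitivity, dice) arrays, one entry per class."""
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp  # true-class pixels assigned elsewhere
    fp = cm.sum(axis=0) - tp  # other pixels assigned to this class
    sensitivity = tp / (tp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)
    return sensitivity, dice


sensitivity, dice = per_class_metrics(confusion)
```

For the calcified class, the pooled counts give a sensitivity of about 0.85 and a Dice coefficient of about 0.73. The lower Dice value reflects false positives drawn from the much larger “Other” class, which sensitivity alone does not penalize.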
We also made visual assessments. The qualitative results show the ability of the proposed model to classify the smallest class in IVOCT images, i.e., calcified plaque, while producing a smooth segmentation of the overall image. The weights that produced the highest accuracy were used for comparison against manual segmentations in a held-out test set. For visual inspection, we display an image frame with the annotated ground truth image, our prediction output, and the prediction after the refinement process. The red shaded area is the lumen and the blue shaded area is calcified plaque. Deep learning classifications for lumen were similar to those obtained from manual segmentation, and the results were consistent for all testing sets. Overall, calcified plaques were well captured. Knowledge of the lumen region serves as the basis for quantifying calcifications. Demarcation of the vessel lumen in IVOCT images allows the luminal cross-sectional area to be quantified and the stenosis severity to be assessed. Using IVOCT, it has been shown that the circumferential extent of calcification is a predictor of stent strut malapposition. All predicted images for all training sets were compared with their corresponding manual segmentations. Example deep learning segmentations of lumen and calcified plaque, compared with ground truth manual segmentations from the held-out test set, are shown in Figure 4.
4.2. Demonstration of FEA of stent deployment in a heavily calcified artery
The performance of the finite element model was measured by comparing the lumen area measured with IVOCT at the different pressure and balloon-size steps with the lumen area predicted by the FEM. Figure 5 shows FEM predictions as compared to actual lumen area from IVOCT measurements. Stent strut malapposition is another point of interest. The FEM was able to predict the location of malapposition. Figure 6 shows the malapposition in the IVOCT image and the prediction from the finite element model. The FEM predicted that malapposition would occur in the region close to the calcified plaque. The FEM results agree well with the measurements from the IVOCT images.
5. DISCUSSION
In this paper, we demonstrated the ability to segment calcifications using deep learning, and to create lesion-specific finite element models from the segmentations. This research is part of a project to provide interventional cardiologists with information and tools to better plan stent interventions in the presence of highly calcified plaques.
CNN semantic segmentation worked better on data arranged in (r, θ) arrays than in (x, y) arrays (Table 1). There are multiple potential reasons. First, when one reformats data into an (x, y) array, the amount of interpolation increases with distance from the catheter center. This is not the case in (r, θ) arrays. This interpolation effect could negatively affect the success of local kernels. Second, the (r, θ) data representation was amenable to the data augmentation scheme described in Methods, allowing us to create heavily augmented data. Third, we were able to process the (r, θ) images at full resolution, but had to downsample the (x, y) images in order to train the SegNet model. This could have affected the ability of the CNN to recognize features such as the sharp edges of calcifications.
Predictions from the FEM were compared with the results from the ex vivo experiment. Lumen gain and malapposition were investigated at different cross sections of the model at different pressures and balloon sizes. Calcified plaque caused malapposition of the stent struts (Figure 6). The FEM predictions matched the ex vivo results well. The lumen area predicted by the FEM was around 10% smaller than the IVOCT measurement (Figure 5). There could be several reasons for this result. First, a single mechanical model was used for all tissues other than calcified plaque. Second, the balloon was modeled as a cylindrical tube, whereas the one used in the ex vivo experiment was a tri-folded balloon. Third, friction between the balloon and stent was not considered in the model. In addition, the material properties of the artery and plaque components could be improved to be nonhomogeneous, anisotropic, and time dependent.
Further work will involve conducting additional ex vivo experiments using a variety of stent models and sizes, different balloon sizes, and different pressure steps, and tuning the segmentation algorithm and the FEM parameters to give results that better match clinical observations. Results are promising and encourage us to continue our efforts towards creating methods to aid pre-stent planning.
ACKNOWLEDGEMENTS
This project was supported by the National Heart, Lung, and Blood Institute through NIH grant R01HL114406-01. The grant was obtained via collaboration between Case Western Reserve University and University Hospitals of Cleveland. The content of this report is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This research was conducted in space renovated using funds from an NIH construction grant (C06RR12463) awarded to CWRU in 1997-2000. This work made use of the High Performance Computing Resource in the Core Facility for Advanced Research Computing at Case Western Reserve University. The veracity guarantor, Chaitanya Kolluru, affirms that, to the best of his knowledge, all aspects of this paper are accurate.
REFERENCES
- [1] Puri R, Nicholls SJ, Shao M, Kataoka Y, Uno K, Kapadia SR, Tuzcu EM and Nissen SE, “Impact of statins on serial coronary calcification during atheroma progression and regression,” J. Am. Coll. Cardiol 65(13), 1273–1282 (2015).
- [2] Badrinarayanan V, Kendall A and Cipolla R, “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence 39(12), 2481–2495 (2017).
- [3] Ioffe S and Szegedy C, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” arXiv:1502.03167 [cs] (2015).
- [4] Kamnitsas K, Ledig C, Newcombe VFJ, Simpson JP, Kane AD, Menon DK, Rueckert D and Glocker B, “Efficient Multi-Scale 3D CNN with Fully Connected CRF for Accurate Brain Lesion Segmentation,” Medical Image Analysis 36, 61–78 (2017).
- [5] Krähenbühl P and Koltun V, “Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials,” arXiv:1210.5644 [cs] (2012).
- [6] Dong PF, Prabhu D, Wilson DL, Bezerra HG and Gu LX, “OCT-based Three Dimensional Modeling of Stent Deployment,” ASME 2017 International Mechanical Engineering Congress & Exposition.
- [7] Kolluru C, Prabhu D, Gharaibeh Y, Bezerra H, Guagliumi G and Wilson D, “Deep neural networks for A-line-based plaque classification in coronary intravascular optical coherence tomography images,” JMI 5(4), 044504 (2018).
- [8] Tearney GJ, Regar E, Akasaka T, Adriaenssens T, Barlis P, Bezerra HG, Bouma B, Bruining N, Cho J, Chowdhary S, Costa MA, Silva R. de, Dijkstra J, Mario CD, Dudeck D, Falk E, Feldman MD, Fitzgerald P, Garcia H, et al., “Consensus Standards for Acquisition, Measurement, and Reporting of Intravascular Optical Coherence Tomography Studies: A Report From the International Working Group for Intravascular Optical Coherence Tomography Standardization and Validation,” Journal of the American College of Cardiology 59(12), 1058–1072 (2012).
- [9] Eigen D and Fergus R, “Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture,” arXiv:1411.4734 [cs] (2014).