WO2022232084A1 - Methods of automatic segmentation of anatomy in artifact affected ct images with deep neural networks and applications of same - Google Patents

Info

Publication number
WO2022232084A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
post
sspc
tspc
atlas
Prior art date
Application number
PCT/US2022/026262
Other languages
French (fr)
Inventor
Benoit M. Dawant
Jianing Wang
Jack H. Noble
Robert F. Labadie
Original Assignee
Vanderbilt University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vanderbilt University
Priority to US18/288,058 (US20240202914A1)
Publication of WO2022232084A1

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B6/00 Apparatus or devices for radiation diagnosis; Apparatus or devices for radiation diagnosis combined with radiation therapy equipment
    • A61B6/52 Devices using data or image processing specially adapted for radiation diagnosis
    • A61B6/5211 Devices using data or image processing specially adapted for radiation diagnosis involving processing of medical diagnostic data
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B6/00 Apparatus or devices for radiation diagnosis; Apparatus or devices for radiation diagnosis combined with radiation therapy equipment
    • A61B6/52 Devices using data or image processing specially adapted for radiation diagnosis
    • A61B6/5258 Devices using data or image processing specially adapted for radiation diagnosis involving detection or reduction of artifacts or noise
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/149 Segmentation; Edge detection involving deformable models, e.g. active contour models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/174 Segmentation; Edge detection involving the use of two or more images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B6/00 Apparatus or devices for radiation diagnosis; Apparatus or devices for radiation diagnosis combined with radiation therapy equipment
    • A61B6/02 Arrangements for diagnosis sequentially in different planes; Stereoscopic radiation diagnosis
    • A61B6/03 Computed tomography [CT]
    • A61B6/032 Transmission computed tomography [CT]
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B6/00 Apparatus or devices for radiation diagnosis; Apparatus or devices for radiation diagnosis combined with radiation therapy equipment
    • A61B6/12 Arrangements for detecting or locating foreign bodies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10081 Computed x-ray tomography [CT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20112 Image segmentation details
    • G06T2207/20124 Active shape model [ASM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20112 Image segmentation details
    • G06T2207/20128 Atlas-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30052 Implant; Prosthesis

Definitions

  • the invention relates generally to cochlear implants, and more particularly, to atlas-based methods of automatic segmentation of intracochlear anatomy in metal artifact affected CT images of the ear with deep neural networks and applications of the same.
  • the cochlea (FIG. 1C) is a spiral-shaped structure that is part of the inner ear involved in hearing. It contains two main cavities: the scala tympani (ST) and the scala vestibuli (SV).
  • the modiolus (MD) is a porous bone around which the cochlea is wrapped that hosts the auditory nerves.
  • a cochlear implant (CI) is an implanted neuroprosthetic device that is designed to produce hearing sensations in a person with severe to profound deafness by electrically stimulating the auditory nerves. CIs are programmed postoperatively in a process that involves activating all or a subset of the electrodes and adjusting the stimulus level for each of these to a level that is beneficial to the recipient.
  • The adjustment of the programming parameters is influenced by the intracochlear position of the CI electrodes, which requires the accurate localization of the CI electrodes relative to the intracochlear anatomy (ICA) in the post-implantation CT (Post-CT) images of the CI recipients.
  • This requires the accurate segmentation of the ICA in the Post-CT images.
  • Segmenting the ICA in the Post-CT images is challenging due to the strong artifacts produced by the metallic CI electrodes (FIG. 1B) that can obscure these structures, often severely.
  • the segmentation of the ICA can be obtained by segmenting the recipient's pre-implantation CT (Pre-CT) image (FIG. 1A) using an active shape model-based (ASM) method.
  • the outputs of the ASM method are surface meshes of the ST, the SV, and the MD that have a predefined number of vertices.
  • each vertex corresponds to a specific anatomical location on the surface of the structures and the meshes are encoded with the information needed for the programming of the implant.
  • Preserving point-to-point correspondence when registering the images is thus of critical importance in the application.
  • the ICA in the Post-CT image of the patients can be obtained by registering their Pre-CT image to the Post-CT image and then transferring the segmentations of the ICA in the Pre-CT image to the Post-CT image using that transformation. This approach does not extend to CI recipients for whom a Pre-CT image is unavailable, which is the case for long-term recipients who were not scanned before surgery, or for recipients for whom images cannot be retrieved.
  • the invention relates to a method for segmentation of structures of interest (SOI) in a computed tomography (CT) image post-operatively acquired from an implant user in a region of interest in which an implant is implanted.
  • the method comprises providing an atlas image, a dataset and networks, wherein the dataset comprises a plurality of CT image pairs, randomly partitioned into a training set, a validation set, and a testing set, wherein each CT image pair has a pre-implantation CT (Pre-CT) image and a post-implantation CT (Post-CT) image respectively acquired in a region of a respective implant recipient before and after an implant is implanted in the region, so that the Pre-CT image and the Post-CT image of each CT image pair are an artifact-free CT image and an artifact-affected CT image, respectively; wherein the atlas image is a Pre-CT image of the region of a subject that is not in the plurality of CT image pairs; and wherein the networks comprise a first network for registering the
  • the method also comprises training the networks with the plurality of CT image pairs, so as to learn to register the artifact-affected CT images and the atlas image with assistance of the paired artifact-free CT images; inputting the CT image post-operatively acquired from the implant user to the trained networks to generate a dense deformation field (DDF) from the input Post-CT image to the atlas image; and warping a segmentation mesh of the SOI in the atlas image to the input Post-CT image using the DDF so as to generate the segmentation mesh of the SOI in the input Post-CT image, wherein the segmentation mesh of the SOI in the atlas image is generated by applying an active shape model-based (ASM) method to the atlas image.
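As a rough illustration of the final warping step described above, the following Python sketch moves each vertex of an atlas mesh by the displacement stored at its nearest voxel in a DDF. The function name `warp_vertices`, the nested-list layout `ddf[i][j][k]`, the voxel-coordinate convention, and the nearest-voxel lookup are all assumptions made for illustration; the text does not state how the DDF is sampled at vertex positions.

```python
def warp_vertices(vertices, ddf):
    """Displace each (x, y, z) vertex by the DDF value at its nearest voxel.

    vertices: list of (x, y, z) tuples in voxel coordinates (assumed).
    ddf: nested lists with ddf[i][j][k] == (dx, dy, dz) (assumed layout).
    """
    shape = (len(ddf), len(ddf[0]), len(ddf[0][0]))
    warped = []
    for vertex in vertices:
        # Nearest-voxel index, clamped so border vertices stay inside the grid.
        i, j, k = (min(max(int(round(c)), 0), n - 1) for c, n in zip(vertex, shape))
        displacement = ddf[i][j][k]
        warped.append(tuple(c + d for c, d in zip(vertex, displacement)))
    return warped


# Toy 2x2x2 field that shifts every voxel by the same amount.
uniform_ddf = [[[(1.0, 0.0, -0.5) for _ in range(2)] for _ in range(2)] for _ in range(2)]
atlas_vertices = [(0.25, 1.0, 0.75), (1.0, 0.0, 1.0)]
post_vertices = warp_vertices(atlas_vertices, uniform_ddf)
```

Because the output list keeps the length and ordering of the input, vertex i of the warped mesh still refers to the same anatomical location as vertex i of the atlas mesh, which is the point-to-point correspondence the method aims to preserve.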
  • said providing the dataset comprises, for each CT image pair, rigidly registering the Pre-CT image to the Post-CT image; and aligning the registered Pre-CT and Post-CT image pair to the atlas image so that anatomical structures in the region of the respective implant recipient are roughly in the same spatial location and orientation.
  • said providing the dataset further comprises, for each CT image pair, applying the ASM method to the registered Pre-CT image to generate a segmentation mesh of the SOI in the registered Pre-CT image (Mesh_pre); transferring the segmentation mesh of the SOI in the registered Pre-CT image (Mesh_pre) to the Post-CT image, so as to generate a segmentation mesh of the SOI in the Post-CT image (Mesh_post); and converting the segmentation mesh of the SOI in the Post-CT image (Mesh_post) to segmentation masks of the SOI in the Post-CT image (Seg_post).
  • all of the images are resampled to an isotropic voxel size and cropped to 3D images containing the structures of interest.
  • said providing the dataset comprises applying image augmentation to the training set by rotating each image by a plurality of small random angles in the range of -25 to 25 degrees about the x-axis, y-axis, and z-axis, to create additional training images from each original image.
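The rotation-based augmentation above can be sketched in Python as follows: one angle per axis is drawn uniformly from [-25, 25] degrees and turned into a 3x3 rotation matrix. This is a minimal sketch under stated assumptions: the helper names, the uniform distribution, the x-then-y-then-z composition order, and the number of augmentations per image are not given in the text, and resampling the image volume with the resulting matrix is left out.

```python
import math
import random


def sample_rotation_angles(rng, max_deg=25.0):
    """Draw one random angle per axis (x, y, z), each within [-max_deg, max_deg]."""
    return tuple(rng.uniform(-max_deg, max_deg) for _ in range(3))


def _matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)] for r in range(3)]


def rotation_matrix(ax_deg, ay_deg, az_deg):
    """3x3 rotation: about x first, then y, then z (the order is an assumption)."""
    ax, ay, az = (math.radians(a) for a in (ax_deg, ay_deg, az_deg))
    rx = [[1, 0, 0], [0, math.cos(ax), -math.sin(ax)], [0, math.sin(ax), math.cos(ax)]]
    ry = [[math.cos(ay), 0, math.sin(ay)], [0, 1, 0], [-math.sin(ay), 0, math.cos(ay)]]
    rz = [[math.cos(az), -math.sin(az), 0], [math.sin(az), math.cos(az), 0], [0, 0, 1]]
    return _matmul(rz, _matmul(ry, rx))


rng = random.Random(0)
# Several augmented orientations per original image (the count 5 is arbitrary).
augmentations = [rotation_matrix(*sample_rotation_angles(rng)) for _ in range(5)]
```

In a training pipeline the same rotation would presumably also be applied to the masks and fiducial vertices that accompany each image, so that the labels stay aligned with the rotated volume.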
  • the networks have a network architecture in which NET_sSpc-tSpc, designed for generating a DDF for warping a source image S to a target image T, comprises a Global-net and a Local-net, wherein the network architecture is configured such that after receiving the concatenation of S and T, the Global-net generates an affine transformation matrix; S is warped to T by using the affine transformation to generate an image S’; the Local-net takes the concatenation of S’ and T to generate a non-rigid local DDF; and the affine transformation and the local DDF are composed to produce an output DDF.
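One way to read the composition of the affine transformation and the local DDF is sketched below in Python. It assumes a pull-back (resampling) convention in which the source image is sampled at affine(x + local_ddf(x)) for each target-grid point x; the text does not state the convention, and the names `apply_affine` and `composed_displacement` are hypothetical.

```python
def apply_affine(affine, point):
    """Apply a 3x4 affine matrix [R | t] to a 3-D point."""
    return tuple(
        sum(affine[r][c] * point[c] for c in range(3)) + affine[r][3]
        for r in range(3)
    )


def composed_displacement(affine, local_ddf, point):
    """Displacement at a target-grid point after composing both stages.

    Assumed pull-back convention: the source is sampled at
    affine(point + local_ddf(point)).
    """
    # Local-net stage: non-rigid offset at the target-grid point.
    moved = tuple(p + u for p, u in zip(point, local_ddf(point)))
    # Global-net stage: affine mapping of the offset point.
    mapped = apply_affine(affine, moved)
    return tuple(m - p for m, p in zip(mapped, point))


translation = [[1, 0, 0, 1.0], [0, 1, 0, 2.0], [0, 0, 1, 3.0]]
constant_local = lambda point: (0.5, 0.0, 0.0)
output = composed_displacement(translation, constant_local, (0.0, 0.0, 0.0))
```

Evaluating `composed_displacement` at every voxel of the target grid would yield a single dense field, so downstream steps only need to handle one DDF per direction.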
  • said training the networks comprises inputting a concatenation of the atlas image (Atlas_atlas) and each Post-CT image (Post_post) into the networks so that the first network (NET_atlas-post) generates a DDF from the atlas space to the Post-CT space (DDF_atlas-post) and the second network (NET_post-atlas) generates a DDF from the Post-CT space to the atlas space (DDF_post-atlas); warping the Pre-CT image (Pre_sSpc), the segmentation masks (Mask_sSpc), and the fiducial vertices (FidV_sSpc) in a source space to a target space by using the corresponding DDFs, to generate Pre_sSpc-tSpc, Mask_sSpc-tSpc, and FidV_sSpc-tSpc; and transferring Pre_sSpc-t
  • FidV_atlas and FidV_post are the fiducial vertices randomly sampled from Mesh_atlas and Mesh_post on the fly for calculating the fiducial registration error during training.
  • the training objective for NET_sSpc-tSpc is constructed using measurements of similarity between a target object in tSpc (O_tSpc) and a source object that is transferred to tSpc from sSpc (O_sSpc-tSpc).
  • the training objective for the networks is a weighted sum of loss terms of MSPDice, Mean FRE, NCC, CycConsis, BendE, and L2.
  • MSPDice = MSPDice(Mask_post, Mask_atlas-post) + MSPDice(Mask_atlas, Mask_post-atlas), wherein MSPDice(Mask_tSpc, Mask_sSpc-tSpc) is a multiscale soft probabilistic Dice between Mask_tSpc and Mask_sSpc-tSpc that measures the similarity of the segmentation masks.
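The building block of this term can be sketched as a single-scale soft Dice in Python. This is an illustrative simplification, not the multiscale version named in the text: how the masks are smoothed at several scales and how the per-scale values are combined is not described here, and the function name and the `eps` stabilizer are assumptions.

```python
def soft_dice(target_mask, warped_mask, eps=1e-6):
    """Soft Dice overlap between two flattened masks with values in [0, 1]."""
    intersection = sum(t * w for t, w in zip(target_mask, warped_mask))
    total = sum(target_mask) + sum(warped_mask)
    # eps keeps the ratio defined when both masks are empty.
    return (2.0 * intersection + eps) / (total + eps)


identical = soft_dice([1, 0, 1, 0], [1, 0, 1, 0])
partial = soft_dice([1, 1, 0, 0], [1, 0, 0, 0])
disjoint = soft_dice([1, 0, 0, 0], [0, 1, 0, 0])
```

The value approaches 1 for identical masks and 0 for disjoint ones, so a loss built on it would typically use one minus this value or its negative; which form is used is not stated in the text.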
  • Mean FRE = FRE(FidV_post, FidV_atlas-post) + FRE(FidV_atlas, FidV_post-atlas), wherein FRE(FidV_tSpc, FidV_sSpc-tSpc) is a mean fiducial registration error that measures the similarity of the fiducial vertices between FidV_tSpc and FidV_sSpc-tSpc, and is calculated as the average Euclidean distance between the fiducial vertices in FidV_tSpc and the corresponding vertices in FidV_sSpc-tSpc.
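The mean fiducial registration error defined above reduces to an average of point-to-point distances, as in the following Python sketch. The helper `sample_fiducials` illustrates one plausible reading of the on-the-fly sampling, namely drawing the same vertex indices from two meshes that share vertex ordering; that reading, the function names, and the sample size are assumptions.

```python
import math
import random


def sample_fiducials(mesh_a, mesh_b, count, rng):
    """Pick the same random vertex indices from two corresponding meshes."""
    indices = rng.sample(range(len(mesh_a)), count)
    return [mesh_a[i] for i in indices], [mesh_b[i] for i in indices]


def mean_fre(target_points, warped_points):
    """Average Euclidean distance between corresponding 3-D points."""
    distances = [math.dist(t, w) for t, w in zip(target_points, warped_points)]
    return sum(distances) / len(distances)


fre = mean_fre([(0, 0, 0), (1, 1, 1)], [(3, 4, 0), (1, 1, 1)])
```

In the example the first pair of points is 5 units apart and the second pair coincides, so the mean error is 2.5.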
  • NCC = NCC(Pre_post, Atlas_atlas-post) + NCC(Atlas_atlas, Pre_post-atlas), wherein NCC(Pre_tSpc, Pre_sSpc-tSpc) is a normalized cross-correlation between Pre_tSpc and Pre_sSpc-tSpc that measures the similarity between the warped source image and the target image.
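A global normalized cross-correlation between two images can be sketched in Python as below, with each image flattened to a list of intensities. Whether the method computes NCC globally or over local windows is not stated, so the global form and the function name `ncc` are assumptions.

```python
import math


def ncc(image_a, image_b):
    """Global normalized cross-correlation of two equally sized intensity lists."""
    n = len(image_a)
    mean_a = sum(image_a) / n
    mean_b = sum(image_b) / n
    numerator = sum((a - mean_a) * (b - mean_b) for a, b in zip(image_a, image_b))
    denominator = math.sqrt(
        sum((a - mean_a) ** 2 for a in image_a) * sum((b - mean_b) ** 2 for b in image_b)
    )
    # A constant image has zero variance; report no correlation in that case.
    return numerator / denominator if denominator > 0 else 0.0


same_pattern = ncc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
inverted = ncc([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Because the measure is invariant to linear intensity scaling, it compares image structure rather than absolute intensity, which suits comparing a warped Pre-CT image with an atlas acquired on a different scanner.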
  • the cycle-consistency term (CycConsis) has the form MSPDice(Mask_sSpc, Mask_sSpc-tSpc-sSpc) + 2 × FRE(FidV_sSpc, FidV_sSpc-tSpc-sSpc) + 0.5 × NCC(Pre_sSpc, Pre_sSpc-tSpc-sSpc).
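The round trip behind this expression (source space to target space and back) can be illustrated on points with the Python sketch below. The weights 1, 2, and 0.5 come from the expression above; the function names, the use of displacement functions instead of dense fields, and the sign with which each term would enter a loss are assumptions.

```python
import math


def round_trip(points, forward, backward):
    """Map points sSpc -> tSpc -> sSpc with two displacement functions."""
    result = []
    for p in points:
        q = tuple(a + b for a, b in zip(p, forward(p)))   # into the target space
        r = tuple(a + b for a, b in zip(q, backward(q)))  # back to the source space
        result.append(r)
    return result


def cyc_consis(mask_term, fre_term, ncc_term):
    """Weighted combination with the weights 1, 2 and 0.5 given in the text."""
    return mask_term + 2.0 * fre_term + 0.5 * ncc_term


points = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]
returned = round_trip(points, lambda p: (1.0, 0.0, 0.0), lambda q: (-1.0, 0.0, 0.5))
cycle_fre = sum(math.dist(a, b) for a, b in zip(points, returned)) / len(points)
loss = cyc_consis(0.0, cycle_fre, 0.0)
```

Here the backward field fails to undo the forward field by 0.5 along z, so every point returns 0.5 away from where it started and the FRE contribution is doubled by its weight.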
  • BendE = BendE(DDF_atlas-post) + BendE(DDF_post-atlas), wherein BendE(DDF_sSpc-tSpc) is a bending loss in which the DDF from the source space to the target space, DDF_sSpc-tSpc, is regularized using bending energy.
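Bending energy penalizes second derivatives of the displacement field. The Python sketch below shows the idea on a one-dimensional displacement profile with unit grid spacing; a 3-D version would sum the squared second partial derivatives (including mixed ones) of each displacement component. The 1-D reduction, the averaging, and the function name are simplifying assumptions, not the method's actual formulation.

```python
def bending_energy_1d(displacement):
    """Mean squared second difference of a 1-D displacement profile."""
    second_diffs = [
        displacement[i - 1] - 2.0 * displacement[i] + displacement[i + 1]
        for i in range(1, len(displacement) - 1)
    ]
    return sum(d * d for d in second_diffs) / len(second_diffs)


linear = bending_energy_1d([0.0, 0.5, 1.0, 1.5, 2.0])
kinked = bending_energy_1d([0.0, 0.0, 1.0, 0.0, 0.0])
```

A linearly varying displacement has zero bending energy, so this regularizer leaves affine-like deformations unpenalized while discouraging sharp local folds such as the kinked profile.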
  • L2 = L2(NET_atlas-post) + L2(NET_post-atlas), wherein L2(NET_sSpc-tSpc) is an L2 loss in which the learnable parameters of the registration network NET_sSpc-tSpc are regularized by an L2 term.
  • the region of interest includes ear, brain, heart, or other organs of a living subject.
  • the structures of interest comprise anatomical structures in the region of interest.
  • the anatomical structures comprise intra cochlear anatomy (ICA).
  • the implant is a cochlear implant, a deep brain stimulator, or a pacemaker.
  • in another aspect, the invention relates to a method for segmentation of structures of interest (SOI) in a CT image post-operatively acquired from an implant user in a region of interest in which an implant is implanted.
  • the method includes inputting the post-operatively acquired CT (Post-CT) image to trained networks to generate a dense deformation field (DDF) from the input Post-CT image to an atlas image, wherein the atlas image is a Pre-CT image; and warping a segmentation mesh of the SOI in the atlas image to the input Post-CT image using the DDF so as to generate the segmentation mesh of the SOI in the input Post-CT image, wherein the segmentation mesh of the SOI in the atlas image is generated by applying an active shape model-based (ASM) method to the atlas image.
  • the networks have a network architecture in which NET_sSpc-tSpc, designed for generating a DDF for warping a source image S to a target image T, comprises a Global-net and a Local-net, wherein the network architecture is configured such that after receiving the concatenation of S and T, the Global-net generates an affine transformation matrix; S is warped to T by using the affine transformation to generate an image S’; the Local-net takes the concatenation of S’ and T to generate a non-rigid local DDF; and the affine transformation and the local DDF are composed to produce an output DDF.
  • the networks are trained with a dataset comprising a plurality of CT image pairs, randomly partitioned into a training set, a validation set, and a testing set, wherein each CT image pair has a pre-implantation CT (Pre-CT) image and a post-implantation CT (Post-CT) image respectively acquired in a region of a respective implant recipient before and after an implant is implanted in the region, so that the Pre-CT image and the Post-CT image of each CT image pair are an artifact-free CT image and an artifact-affected CT image, respectively.
  • the Pre-CT image is rigidly registered to the Post-CT image for each CT image pair; and the registered Pre-CT and Post-CT image pair is aligned to the atlas image so that anatomical structures in the region of the respective implant recipient are roughly in the same spatial location and orientation.
  • the ASM method is applied to the registered Pre-CT image to generate a segmentation mesh of the SOI in the registered Pre-CT image (Mesh_pre); the segmentation mesh of the SOI in the registered Pre-CT image (Mesh_pre) is transferred to the Post-CT image so as to generate a segmentation mesh of the SOI in the Post-CT image (Mesh_post); and the segmentation mesh of the SOI in the Post-CT image (Mesh_post) is converted to segmentation masks of the SOI in the Post-CT image (Seg_post).
  • the networks are trained to learn to register the artifact-affected CT images and the atlas image with assistance of the paired artifact-free CT images.
  • the networks are trained by inputting a concatenation of the atlas image (Atlas_atlas) and each Post-CT image (Post_post) into the networks so that the first network (NET_atlas-post) generates a DDF from the atlas space to the Post-CT space (DDF_atlas-post) and the second network (NET_post-atlas) generates a DDF from the Post-CT space to the atlas space (DDF_post-atlas); warping the Pre-CT image (Pre_sSpc), the segmentation masks (Mask_sSpc), and the fiducial vertices (FidV_sSpc) in a source space to a target space by using the corresponding DDFs, to generate Pre_sSpc-tSpc, Mask_sSpc-tSpc, and FidV_sSpc-tSpc; and transferring Pre_sSpc
  • the training objective for NET_sSpc-tSpc is constructed using measurements of similarity between a target object in tSpc (O_tSpc) and a source object that is transferred to tSpc from sSpc (O_sSpc-tSpc).
  • the trained networks are characterized with a training objective that is a weighted sum of loss terms of MSPDice, Mean FRE, NCC, CycConsis, BendE, and L2.
  • MSPDice = MSPDice(Mask_post, Mask_atlas-post) + MSPDice(Mask_atlas, Mask_post-atlas), wherein MSPDice(Mask_tSpc, Mask_sSpc-tSpc) is a multiscale soft probabilistic Dice between Mask_tSpc and Mask_sSpc-tSpc that measures the similarity of the segmentation masks.
  • Mean FRE = FRE(FidV_post, FidV_atlas-post) + FRE(FidV_atlas, FidV_post-atlas), wherein FRE(FidV_tSpc, FidV_sSpc-tSpc) is a mean fiducial registration error that measures the similarity of the fiducial vertices between FidV_tSpc and FidV_sSpc-tSpc, and is calculated as the average Euclidean distance between the fiducial vertices in FidV_tSpc and the corresponding vertices in FidV_sSpc-tSpc.
  • NCC = NCC(Pre_post, Atlas_atlas-post) + NCC(Atlas_atlas, Pre_post-atlas), wherein NCC(Pre_tSpc, Pre_sSpc-tSpc) is a normalized cross-correlation between Pre_tSpc and Pre_sSpc-tSpc that measures the similarity between the warped source image and the target image.
  • the cycle-consistency term (CycConsis) has the form MSPDice(Mask_sSpc, Mask_sSpc-tSpc-sSpc) + 2 × FRE(FidV_sSpc, FidV_sSpc-tSpc-sSpc) + 0.5 × NCC(Pre_sSpc, Pre_sSpc-tSpc-sSpc).
  • BendE = BendE(DDF_atlas-post) + BendE(DDF_post-atlas), wherein BendE(DDF_sSpc-tSpc) is a bending loss in which the DDF from the source space to the target space, DDF_sSpc-tSpc, is regularized using bending energy.
  • L2 = L2(NET_atlas-post) + L2(NET_post-atlas), wherein L2(NET_sSpc-tSpc) is an L2 loss in which the learnable parameters of the registration network NET_sSpc-tSpc are regularized by an L2 term.
  • the invention relates to a non-transitory tangible computer-readable medium storing instructions which, when executed by one or more processors, cause a system to perform the above methods.
  • the invention relates to a system for segmentation of structures of interest (SOI) in a CT image post-operatively acquired from an implant user in a region of interest in which an implant is implanted.
  • the system includes trained networks for generating dense deformation fields (DDFs) in opposite directions; and a microcontroller coupled with the trained networks and configured to generate a DDF from the post-operatively acquired CT (Post-CT) image to an atlas image; and warp a segmentation mesh of the SOI in the atlas image to the Post-CT image using the DDF to generate a segmentation mesh of the SOI in the Post-CT image, wherein the segmentation mesh of the SOI in the atlas image is generated by applying an active shape model-based (ASM) method to the atlas image, and wherein the atlas image is a Pre-CT image.
  • the trained networks comprise networks having a network architecture in which NET_sSpc-tSpc, designed for generating a DDF for warping a source image S to a target image T, comprises a Global-net and a Local-net, wherein the network architecture is configured such that after receiving the concatenation of S and T, the Global-net generates an affine transformation matrix; S is warped to T by using the affine transformation to generate an image S’; the Local-net takes the concatenation of S’ and T to generate a non-rigid local DDF; and the affine transformation and the local DDF are composed to produce an output DDF.
  • the microcontroller is further configured to train the networks with a dataset comprising a plurality of CT image pairs, randomly partitioned into a training set, a validation set, and a testing set, wherein each CT image pair has a pre-implantation CT (Pre-CT) image and a post-implantation CT (Post-CT) image respectively acquired in a region of a respective implant recipient before and after an implant is implanted in the region, so that the Pre-CT image and the Post-CT image of each CT image pair are an artifact-free CT image and an artifact-affected CT image, respectively.
  • the microcontroller is further configured to apply image augmentation to the training set by rotating each image by a plurality of small random angles in the range of -25 to 25 degrees about the x-axis, y-axis, and z-axis, to create additional training images from each original image.
  • the microcontroller is further configured to rigidly register the Pre-CT image to the Post-CT image for each CT image pair; and align the registered Pre-CT and Post-CT image pair to the atlas image so that anatomical structures in the region of the respective implant recipient are roughly in the same spatial location and orientation.
  • the microcontroller is further configured to, for each CT image pair, apply the ASM method to the registered Pre-CT image to generate a segmentation mesh of the SOI in the registered Pre-CT image (Mesh_pre); transfer the segmentation mesh of the SOI in the registered Pre-CT image (Mesh_pre) to the Post-CT image, so as to generate a segmentation mesh of the SOI in the Post-CT image (Mesh_post); and convert the segmentation mesh of the SOI in the Post-CT image (Mesh_post) to segmentation masks of the SOI in the Post-CT image (Seg_post).
  • the networks are trained to learn to register the artifact-affected CT images and the atlas image with assistance of the paired artifact-free CT images.
  • the networks are trained by inputting a concatenation of the atlas image (Atlas_atlas) and each Post-CT image (Post_post) into the networks so that the first network (NET_atlas-post) generates a DDF from the atlas space to the Post-CT space (DDF_atlas-post) and the second network (NET_post-atlas) generates a DDF from the Post-CT space to the atlas space (DDF_post-atlas); warping the Pre-CT image (Pre_sSpc), the segmentation masks (Mask_sSpc), and the fiducial vertices (FidV_sSpc) in a source space to a target space by using the corresponding DDFs, to generate Pre_sSpc-tSpc, Mask_sSpc-tSpc, and FidV_sSpc-tSpc; and transferring Pre_sSpc-tSp
  • the training objective for NET_sSpc-tSpc is constructed using measurements of similarity between a target object in tSpc (O_tSpc) and a source object that is transferred to tSpc from sSpc (O_sSpc-tSpc).
  • the trained networks are characterized with a training objective that is a weighted sum of loss terms of MSPDice, Mean FRE, NCC, CycConsis, BendE, and L2.
  • MSPDice = MSPDice(Mask_post, Mask_atlas-post) + MSPDice(Mask_atlas, Mask_post-atlas), wherein MSPDice(Mask_tSpc, Mask_sSpc-tSpc) is a multiscale soft probabilistic Dice between Mask_tSpc and Mask_sSpc-tSpc that measures the similarity of the segmentation masks.
  • Mean FRE = FRE(FidV_post, FidV_atlas-post) + FRE(FidV_atlas, FidV_post-atlas), wherein FRE(FidV_tSpc, FidV_sSpc-tSpc) is a mean fiducial registration error that measures the similarity of the fiducial vertices between FidV_tSpc and FidV_sSpc-tSpc, and is calculated as the average Euclidean distance between the fiducial vertices in FidV_tSpc and the corresponding vertices in FidV_sSpc-tSpc.
  • NCC = NCC(Pre_post, Atlas_atlas-post) + NCC(Atlas_atlas, Pre_post-atlas), wherein NCC(Pre_tSpc, Pre_sSpc-tSpc) is a normalized cross-correlation between Pre_tSpc and Pre_sSpc-tSpc that measures the similarity between the warped source image and the target image.
  • BendE = BendE(DDF_atlas-post) + BendE(DDF_post-atlas), wherein BendE(DDF_sSpc-tSpc) is a bending loss in which the DDF from the source space to the target space, DDF_sSpc-tSpc, is regularized using bending energy.
  • L2 = L2(NET_atlas-post) + L2(NET_post-atlas), wherein L2(NET_sSpc-tSpc) is an L2 loss in which the learnable parameters of the registration network NET_sSpc-tSpc are regularized by an L2 term.
  • the region of interest includes ear, brain, heart, or other organs of a living subject.
  • the structures of interest comprise anatomical structures in the region of interest.
  • the anatomical structures comprise intra cochlear anatomy (ICA).
  • the implant is a cochlear implant, a deep brain stimulator, or a pacemaker.
  • FIGS. 1A-1B show schematically a pair of registered Pre-CT and Post-CT images, respectively, of an ear of a CI recipient.
  • FIG. 1C is an illustration of the intracochlear anatomy with an implanted CI electrode array. The meshes of the ST, the SV, and the MD are obtained by applying the ASM method to the Pre-CT image.
  • FIGS. 2A-2C show schematically the framework of the method according to embodiments of the invention.
  • FIG. 2A: Objects used for training the networks.
  • FIG. 2B: Training phase.
  • FIG. 2C: Inference phase.
  • FIG. 3 is an illustration of a registration network NET_sSpc-tSpc that is tasked to generate a DDF from the source space to the target space according to embodiments of the invention.
  • FIGS. 4A-4B show two example cases in which the method leads to (FIG. 4A) good and (FIG. 4B) poor results according to embodiments of the invention.
  • FIGS. 5A-5C show boxplots of (FIG. 5A) the median, (FIG. 5B) the Max, and (FIG. 5C) the STD of the P2PEs, according to embodiments of the invention. Boxplots from the left-hand side to the right-hand side are respectively for “cGAN + ASM”, “Novel”, “Novel-NoNCC”, “Novel-NoCycConsis”, “Novel-NoFRE”, “Baseline”, and “No registration”, for each of ST, SV and MD.
  • although the terms first, second, third, etc. may be used herein to describe various elements, components, regions, layers and/or sections, these elements, components, regions, layers and/or sections should not be limited by these terms. These terms are only used to distinguish one element, component, region, layer or section from another element, component, region, layer or section. Thus, a first element, component, region, layer or section discussed below could be termed a second element, component, region, layer or section without departing from the teachings of the invention.
  • relative terms, such as “lower” or “bottom” and “upper” or “top,” may be used herein to describe one element's relationship to another element as illustrated in the figures. It will be understood that relative terms are intended to encompass different orientations of the device in addition to the orientation depicted in the figures. For example, if the device in one of the figures is turned over, elements described as being on the “lower” side of other elements would then be oriented on “upper” sides of the other elements. The exemplary term “lower” can, therefore, encompass both an orientation of “lower” and “upper,” depending on the particular orientation of the figure.
  • the phrase "at least one of A, B, and C" should be construed to mean a logical (A or B or C), using a non-exclusive logical OR.
  • cGANs+ASM: conditional generative adversarial networks (cGANs) followed by an active shape model-based (ASM) method
  • One of the objectives of this invention is to provide an atlas-based method to segment the intracochlear anatomy (ICA) in the post-implantation CT (Post-CT) images of cochlear implant (CI) recipients that preserves the point-to-point correspondence between the meshes in the atlas and the segmented volumes.
  • DDFs: dense deformation fields
  • the networks are trained using loss functions based on voxel-wise labels, image content, fiducial registration error, and a cycle-consistency constraint.
  • the segmentation of the ICA in the Post-CT images is subsequently obtained by transferring the predefined segmentation meshes of the ICA in the atlas image to the Post-CT images using the corresponding DDFs generated by the trained registration networks.
  • the model can learn the underlying geometric features of the ICA even though they are obscured by the metal artifacts.
  • the end-to-end network produces results that are comparable to the current state of the art (SOTA) that relies on the two-step method that first uses conditional generative adversarial networks to synthesize artifact-free images from the Post-CT images and then uses an active shape model-based method to segment the ICA in the synthetic images, as disclosed in U.S. Patent Application Serial No. 17/266,180, which is incorporated herein in its entirety by reference.
  • the atlas-based method operably produces results in a fraction of the time needed by the SOTA and is more robust to noise and poor image quality, which is important for end-user acceptance.
  • the invention relates to a method for segmentation of structures of interest (SOI) in a CT image post-operatively acquired from an implant user in a region of interest in which an implant is implanted.
  • the region of interest includes ear, brain, heart, or other organs of a living subject
  • the structures of interest comprise anatomical structures in the region of interest
  • the anatomical structures comprise the ICA.
  • the implant is a cochlear implant, a deep brain stimulator, or a pacemaker.
  • the method comprises providing an atlas image, a dataset and networks.
  • the dataset comprises a plurality of CT image pairs, randomly partitioned into a training set, a validation set, and a testing set, wherein each CT image pair has a pre-implantation CT (Pre-CT) image and a postimplantation CT (Post-CT) image respectively acquired in a region of a respective implant recipient before and after an implant is implanted in the region, so that the Pre-CT image and the Post-CT image of each CT image pair are an artifact-free CT image and an artifact-affected CT image, respectively.
  • Pre-CT pre-implantation CT
  • Post-CT postimplantation CT
  • the Pre-CT image is rigidly registered to the Post-CT image for each CT image pair; and the registered Pre-CT and Post-CT image pair is aligned to the atlas image so that anatomical structures in the region of the respective implant recipient are roughly in the same spatial location and orientation.
  • the ASM method is applied to the registered Pre-CT image to generate a segmentation mesh of the SOI in the registered Pre-CT image ( Mesh pre ); the segmentation mesh of the SOI in the registered Pre-CT image ( Mesh pre ) is transferred to the Post-CT image so as to generate a segmentation mesh of the SOI in the Post-CT image(Mesh post ); and the segmentation mesh of the SOI in the Post-CT image (Mesh post ) is converted to segmentation masks of the SOI in the Post-CT image(Seg post ).
  • the atlas image is a Pre-CT image of the region of a subject that is not in the plurality of CT image pairs.
  • all of the images are resampled to an isotropic voxel size, and cropped with images of 3D voxels containing the structures of interest.
  • image augmentation is applied to the training set by rotating each image by a plurality of small random angles in the range of -25 to 25 degrees about the x-axis, y-axis, and z-axis, to create additional training images from each original image.
  • the networks have network architecture in which NET sSpc-tSpc, designed for generating a DDF for warping a source image S to a target image T, comprises a Global-net and a Local-net, wherein the network architecture is configured such that after receiving the concatenation of S and T, the Global-net generates an affine transformation matrix; S is warped to T by using the affine transformation to generate an image S’; the Local-net takes the concatenation of S’ and T to generate a non-rigid local DDF; and the affine transformation and the local DDF are composed to produce an output DDF.
  • the networks comprise a first network for registering the atlas image to each Post-CT image and a second network for registering each Post-CT image to the atlas image.
  • the method also comprises training the networks with the plurality of CT image pairs, so as to learn to register the artifact-affected CT images and the atlas image with assistance of the paired artifact-free CT images; inputting the CT image post-operatively acquired from the implant user to the trained networks to generate a DDF from the input Post-CT image to the atlas image; and warping a segmentation mesh of the SOI in the atlas image to the input Post-CT image using the DDF so as to generate the segmentation mesh of the SOI in the input Post-CT image, wherein the segmentation mesh of the SOI in the atlas image is generated by applying an active shape model-based (ASM) method to the atlas image.
  • ASM active shape model-based
  • said training the networks comprises inputting a concatenation of the atlas image (Atlas atlas) and each Post-CT image (Post post) into the networks so that the first network (NET atlas-post) generates a DDF from the atlas space to the Post-CT space (DDF atlas-post) and the second network (NET post-atlas) generates a DDF from the Post-CT space to the atlas space (DDF post-atlas); warping the Pre-CT image (Pre sSpc), the segmentation masks (Mask sSpc), and the fiducial vertices (FidV sSpc) in a source space to a target space by using the corresponding DDFs, to generate Pre sSpc-tSpc, Mask sSpc-tSpc, and FidV sSpc-tSpc, and transferring
  • FidV atlas and FidV post are the fiducial vertices randomly sampled from Mesh atlas and Mesh post on the fly for calculating the fiducial registration error during training.
  • the training objective for NET sSpc-tSpc is constructed using measurements of similarity between a target object in tSpc (O tSpc ) and a source object that is transferred to tSpc from sSpc ( O sSpc-tSpc ).
  • the training objective for the networks is a weighted sum of loss terms of MSPDice, Mean FRE, NCC, CycConsis, BendE, and L2.
  • MSPDice = MSPDice(Mask post, Mask atlas-post) + MSPDice(Mask atlas, Mask post-atlas), wherein MSPDice(Mask tSpc, Mask sSpc-tSpc) is a multiscale soft probabilistic Dice between Mask tSpc and Mask sSpc-tSpc that measures the similarity of the segmentation masks between Mask tSpc and Mask sSpc-tSpc.
  • Mean FRE = FRE(FidV post, FidV atlas-post) + FRE(FidV atlas, FidV post-atlas), wherein FRE(FidV tSpc, FidV sSpc-tSpc) is a mean fiducial registration error that measures the similarity of the fiducial vertices between FidV tSpc and FidV sSpc-tSpc, and is calculated as the average Euclidean distance between the fiducial vertices in FidV tSpc and the corresponding vertices in FidV sSpc-tSpc.
  • NCC = NCC(Pre post, Atlas atlas-post) + NCC(Atlas atlas, Pre post-atlas), wherein NCC(Pre tSpc, Pre sSpc-tSpc) is a normalized cross-correlation between Pre tSpc and Pre sSpc-tSpc that measures the similarity between the warped source image and the target image.
  • BendE = BendE(DDF atlas-post) + BendE(DDF post-atlas), wherein BendE(DDF sSpc-tSpc) is a bending loss for which the DDF from the source space to the target space DDF sSpc-tSpc is regularized using bending energy.
  • L2 = L2(NET atlas-post) + L2(NET post-atlas), wherein L2(NET sSpc-tSpc) is an L2 loss for which the learnable parameters of the registration network NET sSpc-tSpc are regularized by an L2 term.
  • the invention in another aspect, relates to a system for segmentation of SOI in a CT image post-operatively acquired from an implant user in a region of interest in which an implant is implanted.
  • the system includes trained networks for generating DDFs in opposite directions; and a microcontroller coupled with the trained networks and configured to generate a DDF from the post-operatively acquired CT (Post-CT) image to an atlas image; and warp a segmentation mesh of the SOI in the atlas image to the Post-CT image using the DDF to generate a segmentation mesh of the SOI in the Post-CT image, wherein the segmentation mesh of the SOI in the atlas image is generated by applying an ASM method to the atlas image.
  • Post-CT post-operatively acquired CT
  • the atlas image is a Pre-CT image.
  • the trained networks comprise networks having network architecture in which NET sSpc-tSpc, designed for generating a DDF for warping a source image S to a target image T, comprises a Global-net and a Local-net, wherein the network architecture is configured such that after receiving the concatenation of S and T, the Global-net generates an affine transformation matrix; S is warped to T by using the affine transformation to generate an image S’; the Local-net takes the concatenation of S’ and T to generate a non-rigid local DDF; and the affine transformation and the local DDF are composed to produce an output DDF.
  • the microcontroller is further configured to train the network with a dataset comprising a plurality of CT image pairs, randomly partitioned into a training set, a validation set, and a testing set, wherein each CT image pair has a pre-implantation CT (Pre-CT) image and a postimplantation CT (Post-CT) image respectively acquired in a region of a respective implant recipient before and after an implant is implanted in the region, so that the Pre-CT image and the Post-CT image of each CT image pair are an artifact-free CT image and an artifact-affected CT image, respectively.
  • the microcontroller is further configured to apply image augmentation to the training set by rotating each image by a plurality of small random angles in the range of -25 to 25 degrees about the x-axis, y-axis, and z-axis, to create additional training images from each original image.
  • the microcontroller is further configured to rigidly register the Pre-CT image to the Post-CT image for each CT image pair; and align the registered Pre-CT and Post-CT image pair to the atlas image so that anatomical structures in the region of the respective implant recipient are roughly in the same spatial location and orientation.
  • the microcontroller is further configured to, for each CT image pair, apply the ASM method to the registered Pre-CT image to generate a segmentation mesh of the SOI in the registered Pre-CT image( Mesh pre ); transfer the segmentation mesh of the SOI in the registered Pre-CT image (Mesh pre ) to the Post-CT image, so as to generate a segmentation mesh of the SOI in the Post-CT image (Mesh post ); and convert the segmentation mesh of the SOI in the Post-CT image (Mesh post ) to segmentation masks of the SOI in the Post-CT image (Seg post ).
  • the networks are trained to learn to register the artifact-affected CT images and the atlas image with assistance of the paired artifact-free CT images.
  • the networks are trained by inputting a concatenation of the atlas image (Atlas atlas) and each Post-CT image (Post post) into the networks so that the first network (NET atlas-post) generates a DDF from the atlas space to the Post-CT space (DDF atlas-post) and the second network (NET post-atlas) generates a DDF from the Post-CT space to the atlas space (DDF post-atlas); warping the Pre-CT image (Pre sSpc), the segmentation masks (Mask sSpc), and the fiducial vertices (FidV sSpc) in a source space to a target space by using the corresponding DDFs, to generate Pre sSpc-tSpc, Mask sSpc-tSpc, and FidV sSpc-tSpc, and transferring Pre sSpc-t
  • the training objective for NET sSpc-tSpc is constructed using measurements of similarity between a target object in tSpc (O tSpc ) and a source object that is transferred to tSpc from sSpc (O sSpc-tSpc ).
  • the trained networks are characterized with a training objective that is a weighted sum of loss terms of MSPDice, Mean FRE, NCC, CycConsis, BendE, and L2.
  • MSPDice = MSPDice(Mask post, Mask atlas-post) + MSPDice(Mask atlas, Mask post-atlas), wherein MSPDice(Mask tSpc, Mask sSpc-tSpc) is a multiscale soft probabilistic Dice between Mask tSpc and Mask sSpc-tSpc that measures the similarity of the segmentation masks between Mask tSpc and Mask sSpc-tSpc.
  • Mean FRE = FRE(FidV post, FidV atlas-post) + FRE(FidV atlas, FidV post-atlas), wherein FRE(FidV tSpc, FidV sSpc-tSpc) is a mean fiducial registration error that measures the similarity of the fiducial vertices between FidV tSpc and FidV sSpc-tSpc, and is calculated as the average Euclidean distance between the fiducial vertices in FidV tSpc and the corresponding vertices in FidV sSpc-tSpc.
  • NCC = NCC(Pre post, Atlas atlas-post) + NCC(Atlas atlas, Pre post-atlas), wherein NCC(Pre tSpc, Pre sSpc-tSpc) is a normalized cross-correlation between Pre tSpc and Pre sSpc-tSpc that measures the similarity between the warped source image and the target image.
  • BendE = BendE(DDF atlas-post) + BendE(DDF post-atlas), wherein BendE(DDF sSpc-tSpc) is a bending loss for which the DDF from the source space to the target space DDF sSpc-tSpc is regularized using bending energy.
  • L2 = L2(NET atlas-post) + L2(NET post-atlas), wherein L2(NET sSpc-tSpc) is an L2 loss for which the learnable parameters of the registration network NET sSpc-tSpc are regularized by an L2 term.
  • Segmentation of the ICA is important to assist audiologists in programming cochlear implants. Segmenting the anatomy in images acquired after implantation is difficult because the implant produces very strong artifacts. The method and system disclosed herein permit the segmentation of these images despite these artifacts. They are also robust to poor image quality, i.e., images affected by noise or blur.
  • the storage medium/memory may include, but is not limited to, high-speed random access medium/memory such as DRAM, SRAM, DDR RAM or other random access solid state memory devices, and non-volatile memory such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices.
  • an end-to-end atlas-based method is developed, which first generates a dense deformation field (DDF) between an artifact-free atlas image and a Post-CT image.
  • DDF dense deformation field
  • the segmentation of the intracochlear anatomy (ICA) in the Post-CT image can then be obtained by transferring the predefined segmentation meshes of the ICA in the atlas image to the Post-CT image using the DDF.
  • the inter-subject non-rigid registration between the atlas image and the Post-CT image is a difficult task because (1) considerable variation in cochlear anatomy across individuals has been documented, and (2) the artifacts in the Post-CT image change, often severely, the appearance of the anatomy, which has a significant influence on the accuracy of registration methods guided by intensity-based similarity metrics.
  • the exemplary study herein discloses a method to perform registrations between an atlas image and the Post-CT images that rely on deep networks. Following the idea of consistent image registration obtained by jointly estimating the forward and reverse transformations between two images proposed by Christensen et al., a pair of co-trained networks that generate DDFs in opposite directions is adapted.
  • One network is tasked with registering the atlas image to the Post-CT image and the other one is tasked with registering the Post-CT image to the atlas image.
  • the networks are trained using loss functions that include voxel-wise labels, image content, fiducial registration error (FRE), and cycle-consistency constraint.
  • the model can segment the ICA and preserve point-to-point correspondence between the atlas and the Post-CT meshes, even when the ICA is difficult to localize visually.
  • the dataset includes Pre-CT and Post-CT image pairs of 624 ears.
  • the atlas image is a Pre-CT image of an ear that is not in the 624 ears.
  • the Pre-CT images are acquired with several conventional scanners (GE BrightSpeed, LightSpeed Ultra; Siemens Sensation 16; and Philips Mx8000 IDT, iCT 128, and Brilliance 64) and the Post-CT images are acquired with a low-dose flat-panel volumetric scanner (Xoran Technologies xCAT® ENT).
  • the typical voxel size is 0.25 × 0.25 × 0.3 mm³ for the Pre-CT images and 0.4 × 0.4 × 0.4 mm³ for the Post-CT images.
  • the Pre-CT image is rigidly registered to the Post-CT image. The registration is accurate because the surgery, which comprises threading an electrode array through a small hole into the bony cavity, does not induce non-rigid deformation of the cochlea.
  • the registered Pre-CT and Post-CT image pairs are then aligned to the atlas image so that the ears are roughly in the same spatial location and orientation. All of the images are resampled to an isotropic voxel size of 0.2 mm.
  • FIG. 2A shows a list of images, meshes, and masks used to train the networks.
  • O xSpc or O x is used to denote an object O in the x space.
  • AtlasImg atlas or Atlas atlas is the atlas image in the atlas space.
  • PostImg post or Post post is a Post-CT image in the Post-CT space.
  • Mesh atlas is the segmentation mesh of the ICA in Atlas atlas generated by applying the active shape model-based (ASM) method to Atlas atlas.
  • PreImg post or Pre post is the paired Pre-CT image of Post post registered to the original Post-CT image.
  • Mesh post is the segmentation mesh of the ICA in Post post, which is generated by applying the ASM method to Pre post and then transferring the meshes to Post post .
  • Mask atlas (Seg atlas) and Mask post (Seg post) are segmentation masks of the ST, SV, and MD. They are generated by converting Mesh atlas and Mesh post to masks.
  • the input of the networks is the concatenation of Atlas atlas and Post post.
  • the networks include a first network (NET atlas-post ) that generates a DDF from the atlas space to the Post-CT space ( DDF atlas-post ) and a second network (NET post-atlas ) that generates a DDF from the Post-CT space to the atlas space ( DDF post-atlas ).
  • FidV atlas and FidV post are fiducial vertices randomly sampled from Mesh atlas and Mesh post on the fly for calculating fiducial registration error (FRE) during training.
  • FRE fiducial registration error
  • Pre-CT image, the segmentation masks, and the fiducial points in sSpc are warped to tSpc by using the corresponding DDFs (note that one DDF is used for the images and masks and the other for the fiducial points), and the results are denoted as Pre sSpc-tSpc , Mask sSpc-tSpc , and FidV sSpc-tSpc .
  • Pre sSpc-tSpc, Mask sSpc-tSpc, and FidV sSpc-tSpc are transferred back to sSpc using the corresponding DDF, and the results are denoted as Pre sSpc-tSpc-sSpc, Mask sSpc-tSpc-sSpc, and FidV sSpc-tSpc-sSpc, respectively.
  • the training objective for NET sSpc-tSpc can be constructed by using similarity measurements between the target object in tSpc (denoted as O tSpc ) and the source object that has been transferred to tSpc from sSpc (denoted as O sSpc-tSpc ).
  • the multiscale soft probabilistic Dice (MSPDice) between Mask tSpc and Mask sSpc-tSpc which is denoted as MSPDice(Mask tSpc, Mask sSpc-tSpc ), is used to measure the similarity of the segmentation masks.
  • the multiscale soft probabilistic Dice is less sensitive to the class imbalance in the segmentation tasks and is more appropriate for measuring label similarity in the context of image registration.
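As a rough illustration of the mask-similarity term, the sketch below computes a soft Dice score on probabilistic masks and averages it over several Gaussian smoothing scales. The squared-denominator form of the Dice score, the zero-padded separable Gaussian filter, and the scale values in `sigmas` are assumptions made for illustration; the source does not specify them.

```python
import numpy as np

def soft_dice(p, q, eps=1e-6):
    """Soft Dice between two probabilistic masks (squared-denominator form, an assumption)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float((2.0 * np.sum(p * q) + eps) / (np.sum(p * p) + np.sum(q * q) + eps))

def gaussian_smooth(volume, sigma):
    """Separable Gaussian smoothing with zero padding; sigma <= 0 returns the input."""
    volume = np.asarray(volume, dtype=float)
    if sigma <= 0:
        return volume
    radius = int(np.ceil(3 * sigma))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()

    def smooth_1d(line):
        # Pad by the kernel radius so that "valid" convolution keeps the length.
        return np.convolve(np.pad(line, radius), kernel, mode="valid")

    for axis in range(volume.ndim):
        volume = np.apply_along_axis(smooth_1d, axis, volume)
    return volume

def multiscale_soft_dice(p, q, sigmas=(0.0, 1.0, 2.0)):
    """Average soft Dice over smoothing scales; the scale values are hypothetical."""
    scores = [soft_dice(gaussian_smooth(p, s), gaussian_smooth(q, s)) for s in sigmas]
    return float(np.mean(scores))
```

With this form, two disjoint but nearby masks score close to zero at the finest scale yet receive a nonzero score at coarser scales, which gives a registration network a usable signal before the masks overlap. A loss would typically be taken as one minus this score.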
  • the similarity between FidV tSpc and FidV sSpc-tSpc is measured by the mean fiducial registration error FRE(FidV tSpc, FidV sSpc-tSpc ), which is calculated as the average Euclidean distance between the vertices in FidV tSpc and the corresponding vertices in FidV sSpc-tSpc.
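The mean FRE described above can be sketched in a few lines. The helper `sample_fiducials` is hypothetical: it assumes that fiducials are drawn by picking the same random vertex indices from two meshes with one-to-one vertex correspondence, and its default fraction of 0.3 follows the 30% figure given in the training description.

```python
import numpy as np

def sample_fiducials(mesh_a, mesh_b, fraction=0.3, rng=None):
    """Pick the same random subset of vertex indices from two corresponding meshes."""
    rng = np.random.default_rng() if rng is None else rng
    mesh_a = np.asarray(mesh_a, dtype=float)
    mesh_b = np.asarray(mesh_b, dtype=float)
    n_pick = max(1, int(round(fraction * len(mesh_a))))
    idx = rng.choice(len(mesh_a), size=n_pick, replace=False)
    return mesh_a[idx], mesh_b[idx]

def mean_fre(fid_target, fid_warped):
    """Average Euclidean distance between corresponding fiducial vertices (N x 3 arrays)."""
    fid_target = np.asarray(fid_target, dtype=float)
    fid_warped = np.asarray(fid_warped, dtype=float)
    if fid_target.shape != fid_warped.shape:
        raise ValueError("fiducial sets must have the same shape")
    return float(np.linalg.norm(fid_target - fid_warped, axis=1).mean())
```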
  • NCC normalized cross-correlation
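For reference, a global normalized cross-correlation between two images can be written as below. This is a minimal sketch: whether the disclosed networks use a global or a locally windowed NCC is not stated in this section, so the global form and the `eps` stabilizer are assumptions.

```python
import numpy as np

def ncc(image_a, image_b, eps=1e-8):
    """Global normalized cross-correlation in [-1, 1] between two same-shaped images."""
    a = np.asarray(image_a, dtype=float).ravel()
    b = np.asarray(image_b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / (np.sqrt(np.sum(a * a) * np.sum(b * b)) + eps))
```

Because NCC is invariant to linear intensity changes, it is a natural choice when the warped source and the target come from different scanners. Used as a loss, it would be negated or subtracted from one.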
  • CycConsis sSpc-tSpc measures the similarity between the original source objects in the source space and the source objects that have been transferred from the source space to the target space and then transferred back to the source space, which is calculated as MSPDice(Mask sSpc, Mask sSpc-tSpc-sSpc ) + 2 x FRE( FidV sSpc, FidV sSpc-tSpc-sSpc ) + 0.5 x NCC(Pre sSpc, Pre sSpc-tSpc-sSpc ).
  • the DDF from the source space to the target space DDF sSpc-tSpc is regularized using bending energy, which is denoted as BendE(DDF sSpc-tSpc ).
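A bending-energy regularizer penalizes the second spatial derivatives of the displacement field. The sketch below approximates it with finite differences via `numpy.gradient`. It assumes the DDF is stored as an array of shape (X, Y, Z, 3) and averages over voxels; both choices are illustrative and not taken from the source.

```python
import numpy as np

def bending_energy(ddf, spacing=1.0):
    """Mean over voxels of the summed squared second derivatives of each DDF component."""
    ddf = np.asarray(ddf, dtype=float)
    energy = np.zeros(ddf.shape[:3])
    for component in range(ddf.shape[-1]):
        first = np.gradient(ddf[..., component], spacing)
        for i in range(3):
            second = np.gradient(first[i], spacing)
            for j in range(3):
                # Summing over all (i, j) counts each mixed derivative twice, as in the usual definition.
                energy += second[j] ** 2
    return float(energy.mean())
```

An affine displacement field has zero second derivatives, so this term leaves the global affine part unpenalized and only discourages non-smooth local deformation.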
  • the learnable parameters of the registration network NET sSpc-tSpc are regularized by an L2 term, which is denoted as L2( NET sSpc-tSpc ).
  • the training objective for the networks is the weighted sum of the loss terms listed in Table 1; wherein the weights have been selected empirically by looking at training performance on a small number of epochs.
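The cycle-consistency term and the weighted objective described above can be sketched as plain arithmetic on already-computed loss values. The coefficients 2 and 0.5 come from the text; the per-term weights are given in Table 1, which is not reproduced in this section, so `EXAMPLE_WEIGHTS` holds placeholder values only.

```python
def cycle_consistency(dice_term, fre_term, ncc_term):
    """CycConsis = MSPDice + 2 x FRE + 0.5 x NCC, evaluated on round-trip objects."""
    return dice_term + 2.0 * fre_term + 0.5 * ncc_term

def total_objective(terms, weights):
    """Weighted sum of named loss terms; every weighted name must be present in terms."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)

# Placeholder weights for illustration only; the actual values are those of Table 1.
EXAMPLE_WEIGHTS = {
    "MSPDice": 1.0, "MeanFRE": 1.0, "NCC": 1.0,
    "CycConsis": 1.0, "BendE": 1.0, "L2": 1.0,
}
```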
  • NET sSpc-tSpc which is tasked with generating a DDF for warping the source image S to the target image T, is composed of a Global-net and a Local-net.
  • After receiving the concatenation of S and T, the Global-net generates an affine transformation matrix. S is warped to T by using this affine transformation and the resulting image is denoted as S’.
  • the Local-net takes the concatenation of S’ and T to generate a non-rigid local DDF.
  • the affine transformation and the local DDF are composed to produce the output DDF.
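To make the Global-net/Local-net composition concrete, the sketch below converts an affine transform into a dense displacement field and combines it with a local DDF. It assumes a 3 x 4 affine matrix acting on voxel coordinates and an additive composition of displacements; the source only states that the two are "composed", so the additive rule is one simple convention rather than the disclosed one.

```python
import numpy as np

def affine_to_ddf(affine, shape):
    """Dense displacement field (X, Y, Z, 3) induced by a 3x4 affine [A | t] on voxel coordinates."""
    affine = np.asarray(affine, dtype=float)
    axes = [np.arange(n, dtype=float) for n in shape]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    warped = grid @ affine[:, :3].T + affine[:, 3]
    return warped - grid

def compose_ddf(affine, local_ddf):
    """Output DDF as global (affine) displacement plus local displacement (assumed additive)."""
    local_ddf = np.asarray(local_ddf, dtype=float)
    return affine_to_ddf(affine, local_ddf.shape[:3]) + local_ddf
```

An identity affine yields a zero global field, so the output then reduces to the local DDF alone.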
  • the details about the Global-net and Local-net can be found in Hu et al.
  • the Global-net is a 3D convolutional neural network with three down-sampling blocks, followed by an output block that maps the extracted image information to the affine transformation parameters.
  • the Local-net is a 3D convolutional neural network based on an adapted U-network architecture with three down-sampling blocks, followed by three up-sampling blocks.
  • NET sSpc-tSpc is trained end-to-end, i.e., the Global-net and the Local-net are trained together using the same loss function, which is the weighted sum of the loss terms listed in Table 1.
  • the ICA in Post post can be segmented by warping Mesh atlas to PostImg post using the DDF generated by the trained network.
  • the resulting segmentation mesh of the ICA is denoted as Mesh atlas-post.
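Transferring a mesh with a DDF amounts to sampling the displacement at each vertex and adding it to the vertex position. The sketch below uses hand-written trilinear interpolation and assumes that vertices and displacements are both expressed in voxel units of the DDF grid, which needs at least two samples per axis. Which of the two DDFs is appropriate for moving points is left to the caller.

```python
import numpy as np

def sample_ddf(ddf, points):
    """Trilinearly interpolate a (X, Y, Z, 3) displacement field at N x 3 voxel positions."""
    ddf = np.asarray(ddf, dtype=float)
    upper = np.array(ddf.shape[:3]) - 1
    pts = np.clip(np.asarray(points, dtype=float), 0, upper)
    lo = np.minimum(np.floor(pts).astype(int), upper - 1)
    frac = pts - lo
    out = np.zeros((len(pts), ddf.shape[-1]))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                wx = frac[:, 0] if dx else 1.0 - frac[:, 0]
                wy = frac[:, 1] if dy else 1.0 - frac[:, 1]
                wz = frac[:, 2] if dz else 1.0 - frac[:, 2]
                corner = ddf[lo[:, 0] + dx, lo[:, 1] + dy, lo[:, 2] + dz]
                out += (wx * wy * wz)[:, None] * corner
    return out

def warp_mesh(vertices, ddf):
    """Move each vertex by the interpolated displacement; vertex order is unchanged."""
    vertices = np.asarray(vertices, dtype=float)
    return vertices + sample_ddf(ddf, vertices)
```

Because the output keeps the vertex order of the input, the point-to-point correspondence with the atlas mesh is preserved by construction.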
  • Mesh post is used as the ground truth for comparison.
  • Mesh atlas and Mesh post are the outputs of the ASM method, both of them have a predefined number of vertices, and the vertices of Mesh atlas and Mesh post have a one-to-one correspondence.
  • Point-to-point errors (P2PEs), computed as the Euclidean distance in millimeters between the corresponding vertices on Mesh atlas-post and Mesh post, are used to quantify the accuracy of the segmentation and registration.
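The evaluation metric can be sketched as per-vertex distances followed by summary statistics. The function names are hypothetical, vertex coordinates are assumed to be in millimeters already, and the choice of median, maximum, and standard deviation mirrors the statistics discussed for the test set.

```python
import numpy as np

def point_to_point_errors(mesh_pred, mesh_true):
    """Per-vertex Euclidean distance between two meshes with one-to-one vertex correspondence."""
    pred = np.asarray(mesh_pred, dtype=float)
    true = np.asarray(mesh_true, dtype=float)
    if pred.shape != true.shape:
        raise ValueError("meshes must have the same number of vertices")
    return np.linalg.norm(pred - true, axis=1)

def summarize_p2pe(errors):
    """Median, maximum, and standard deviation of the per-vertex errors."""
    errors = np.asarray(errors, dtype=float)
    return {
        "median": float(np.median(errors)),
        "max": float(errors.max()),
        "std": float(errors.std()),
    }
```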
  • the P2PEs between the corresponding vertices on Mesh post and the meshes generated by cGANs+ASM are calculated and serve as values that are used to compare the novel method with the state of the art (SOTA).
  • the method of Hu et al., which uses a unidirectional registration network trained with the MSPDice loss and the regularization loss, is used as a baseline for comparison.
  • the training objective also includes the FRE loss, NCC loss, and the cycle-consistency loss.
  • An ablation study is conducted to analyze how these loss terms affect the performance of the networks.
  • the 624 ears are partitioned into 465 ears for training, 66 ears for validation, and 93 ears for testing.
  • the partition is random, with the constraint that ears of the same object cannot be used in both training and testing.
  • Augmentation is applied to the training set by rotating each image by 6 random angles in the range of -25 to 25 degrees about the x-, y-, and z-axis.
  • the training images are blurred by applying a Gaussian filter with a kernel size selected randomly from {0, 0.5, 1.0, 1.5} with equal probability. This results in a training set expanded to 8835 images. Each image is clipped between its 5th and 95th intensity percentiles, and the intensity values are rescaled to the range of -1 to 1.
  • a batch size of 1 is used. At each training step, 30% of the vertices on the ICA meshes are randomly sampled and used as the fiducial points for calculating the FRE loss.
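The preprocessing numbers above can be checked with a short sketch. The training-set count assumes that the 6 rotations are applied about each of the three axes separately and that the original images are retained, which is one reading consistent with the reported total. The function name `normalize_intensity` and the handling of constant images are illustrative choices.

```python
import numpy as np

N_TRAIN_EARS = 465          # training ears reported in the partition
ROTATIONS_PER_AXIS = 6      # random angles per axis
N_AXES = 3                  # x, y, and z

# Originals plus 6 rotations about each axis: 465 * 19 = 8835, matching the reported size.
n_training_images = N_TRAIN_EARS * (1 + ROTATIONS_PER_AXIS * N_AXES)

def normalize_intensity(image, low_pct=5.0, high_pct=95.0):
    """Clip to the given intensity percentiles, then rescale linearly to [-1, 1]."""
    image = np.asarray(image, dtype=float)
    low, high = np.percentile(image, [low_pct, high_pct])
    clipped = np.clip(image, low, high)
    if high == low:
        # A constant image carries no contrast; map it to zeros.
        return np.zeros_like(clipped)
    return 2.0 * (clipped - low) / (high - low) - 1.0
```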
  • FIGS. 4A-4B show two cases for which the method leads to (FIG. 4A) good and (FIG. 4B) poor results.
  • the first row shows three orthogonal views of the original atlas image in the atlas space.
  • the second row shows the Post-CT image.
  • the third row shows the atlas image registered to the Post-CT image.
  • the fourth row shows the paired Pre-CT image of the Post-CT image.
  • the warped atlas image (third row) should be as similar as possible to the Pre-CT image (fourth row).
  • the last row shows the original segmentation mesh in the atlas image ( Mesh atlas ), the segmentation mesh in the Post-CT image generated using the method ( Mesh atlas-post ), and the ground truth mesh in the Post-CT image (Mesh post ).
  • In Mesh atlas and Mesh post, the ST, the SV, and the MD are shown in red, blue, and green, respectively.
  • Mesh atlas-post is color-coded with the P2PE at each vertex on the mesh surfaces. Both these cases illustrate the severity of the artifact introduced by the implant. In the second case, the cochlea is barely visible.
  • FIGS. 5A-5C show the boxplots of these statistics for the 93 testing ears.
  • cGAN+ASM denotes the results of the SOTA.
  • Novel denotes the results of the method according to the invention.
  • Novel-NoNCC, Novel-NoCycConsis, and Novel-NoFRE denote the results of the novel networks trained without using the NCC loss, the cycle-consistency loss, and the FRE loss, respectively.
  • Baseline denotes the results of the baseline method.
  • No registration denotes the P2PEs between the vertices on the mesh surfaces in the original atlas space and the Post-CT space.
  • Two-sided and one-sided Wilcoxon signed-rank tests between the “Novel” group and the other groups are performed. The p-values have been corrected using the Holm-Bonferroni method. The median values for each group are shown on top of the boxplots, in which red denotes that both the two-sided and the one-sided tests are significant, cyan denotes that only the two-sided test is significant, and blue denotes that the two-sided test is not significant.
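The Holm-Bonferroni correction mentioned above can be sketched as follows: raw p-values are sorted in ascending order, the k-th smallest is multiplied by (m - k + 1), and a running maximum keeps the adjusted values monotone. The function name is hypothetical, and the Wilcoxon tests themselves are not reproduced here.

```python
import numpy as np

def holm_bonferroni(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Step-down multipliers m, m-1, ..., 1 applied to the sorted p-values.
    scaled = (m - np.arange(m)) * p[order]
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted
```

A comparison is then declared significant when its adjusted p-value falls below the chosen significance level.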
  • the results show that the networks trained using all of the novel loss terms achieve a significantly lower segmentation error compared to the baseline method and the networks that are not trained using all of the loss terms.
  • the method according to the invention produces results that are similar to those obtained with the SOTA in terms of the medians of the segmentation error.
  • the Max of the segmentation error and the STD of the segmentation error for the SV and MD remain slightly superior to those obtained with the SOTA.
  • the SOTA is a two-step process: (1) generate a synthetic Pre-CT image from a Post-CT image with cGANs trained for this purpose and (2) apply an ASM method to the synthetic image.
  • Step 2 requires the very accurate registration of an atlas to the image to be segmented to initialize the ASM. This is achieved through an affine and then a non-rigid intensity-based registration in a volume-of-interest that includes the inner ear.
  • Step 1 takes about 0.3s while step 2 takes on average 75s.
  • the novel method according to the invention only requires providing a volume-of-interest that includes the inner ear to the networks and inference time is also about 0.3s. Segmentation is thus essentially instantaneous with the novel method while it takes over a minute with the SOTA. This is of importance for clinical deployment and end-user acceptance.
  • the exemplary example discloses networks capable of performing image registration between artifact-affected CT images and an artifact-free atlas image, which is a very challenging task because of the severity of the artifact introduced by the implant.
  • a point-to-point loss is introduced, which, to the best of the inventors’ knowledge, has not yet been reported.
  • the experiments have shown that this loss is critical to achieve results that are comparable to those obtained with the SOTA that relies on an ASM fitted to a preoperative image synthesized from a post-operative image.
  • ASM methods always produce plausible shapes.
  • the network also produces plausible shapes even when the images are of very poor quality (see FIG. 4B). Thanks to the point-to-point loss, the network has been able to learn the shape of the cochlea and can fit this shape to partial information in the post-operative image. More experiments are ongoing to verify this hypothesis.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Radiology & Medical Imaging (AREA)
  • Veterinary Medicine (AREA)
  • Optics & Photonics (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • High Energy & Nuclear Physics (AREA)
  • Public Health (AREA)
  • Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Apparatus For Radiation Diagnosis (AREA)

Abstract

Methods and systems for segmentation of structures of interest (SOI) in a CT image post-operatively acquired with an implant user in a region of interest in which an implant is implanted. The method includes inputting the post-operatively acquired CT (Post-CT) image to trained networks to generate a dense deformation field (DDF) from the input Post-CT image to an atlas image; and warping a segmentation mesh of the SOI in the atlas image to the input Post-CT image using the DDF so as to generate the segmentation mesh of the SOI in the input Post-CT image, wherein the segmentation mesh of the SOI in the atlas image is generated by applying an active shape model-based method to the atlas image.

Description

METHODS OF AUTOMATIC SEGMENTATION OF ANATOMY IN ARTIFACT AFFECTED CT IMAGES WITH DEEP NEURAL NETWORKS AND APPLICATIONS OF SAME
STATEMENT AS TO RIGHTS UNDER FEDERALLY-SPONSORED RESEARCH
This invention was made with government support under Grant Nos. R01DC014037 and R01DC014462 awarded by the National Institutes of Health (NIH). The government has certain rights in the invention.
CROSS-REFERENCE TO RELATED PATENT APPLICATIONS
This application claims priority to and the benefit of U.S. Provisional Patent Application Serial No. 63/179,655, filed April 26, 2021, which is incorporated herein in its entirety by reference.
This application is also a continuation-in-part application of U.S. Patent Application Serial No. 17/266,180, filed February 5, 2021, which is a national stage entry of PCT Patent Application No. PCT/US2019/045221, filed August 6, 2019, which itself claims priority to and the benefit of U.S. Provisional Patent Application Serial No. 62/714,831, filed August 6, 2018, which are incorporated herein in their entireties by reference.
FIELD OF THE INVENTION
The invention relates generally to cochlear implants, and more particularly, to atlas-based methods of automatic segmentation of intracochlear anatomy in metal artifact affected CT images of the ear with deep neural networks and applications of the same.
BACKGROUND OF THE INVENTION
The background description provided herein is for the purpose of generally presenting the context of the present invention. The subject matter discussed in the background of the invention section should not be assumed to be prior art merely as a result of its mention in the background of the invention section. Similarly, a problem mentioned in the background of the invention section or associated with the subject matter of the background of the invention section should not be assumed to have been previously recognized in the prior art. The subject matter in the background of the invention section merely represents different approaches, which in and of themselves may also be inventions. Work of the presently named inventors, to the extent it is described in the background of the invention section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present invention.
The cochlea (FIG. 1C) is a spiral-shaped structure that is part of the inner ear involved in hearing. It contains two main cavities: the scala tympani (ST) and the scala vestibuli (SV). The modiolus (MD) is a porous bone around which the cochlea is wrapped that hosts the auditory nerves. A cochlear implant (CI) is an implanted neuroprosthetic device that is designed to produce hearing sensations in a person with severe to profound deafness by electrically stimulating the auditory nerves. CIs are programmed postoperatively in a process that involves activating all or a subset of the electrodes and adjusting the stimulus level for each of these to a level that is beneficial to the recipient. The adjustment of the programming parameters is influenced by the intracochlear position of the CI electrodes, which requires the accurate localization of the CI electrodes relative to the intracochlear anatomy (ICA) in the post-implantation CT (Post-CT) images of the CI recipients. This, in turn, requires the accurate segmentation of the ICA in the Post-CT images. Segmenting the ICA in the Post-CT images is challenging due to the strong artifacts produced by the metallic CI electrodes (FIG. 1B) that can obscure these structures, often severely. For patients who have been scanned before implantation, the segmentation of the ICA can be obtained by segmenting their pre-implantation CT (Pre-CT) image (FIG. 1A) using an active shape model-based (ASM) method. The outputs of the ASM method are surface meshes of the ST, the SV, and the MD that have a predefined number of vertices. Importantly, each vertex corresponds to a specific anatomical location on the surface of the structures and the meshes are encoded with the information needed for the programming of the implant. Preserving point-to-point correspondence when registering the images is thus of critical importance in the application.
The ICA in the Post-CT image of the patients can be obtained by registering their Pre-CT image to the Post-CT image and then transferring the segmentations of the ICA in the Pre-CT image to the Post-CT image using that transformation. This approach does not extend to CI recipients for whom a Pre-CT image is unavailable, which is the case for long-term recipients who were not scanned before surgery, or for recipients for whom images cannot be retrieved.
Therefore, a heretofore unaddressed need exists in the art to address the aforementioned deficiencies and inadequacies.
SUMMARY OF THE INVENTION
In one aspect, the invention relates to a method for segmentation of structures of interest (SOI) in a computed tomography (CT) image post-operatively acquired from an implant user in a region of interest in which an implant is implanted. The method comprises providing an atlas image, a dataset and networks, wherein the dataset comprises a plurality of CT image pairs, randomly partitioned into a training set, a validation set, and a testing set, wherein each CT image pair has a pre-implantation CT (Pre-CT) image and a post-implantation CT (Post-CT) image respectively acquired in a region of a respective implant recipient before and after an implant is implanted in the region, so that the Pre-CT image and the Post-CT image of each CT image pair are an artifact-free CT image and an artifact-affected CT image, respectively; wherein the atlas image is a Pre-CT image of the region of a subject that is not in the plurality of CT image pairs; and wherein the networks comprise a first network for registering the atlas image to each Post-CT image and a second network for registering each Post-CT image to the atlas image.
The method also comprises training the networks with the plurality of CT image pairs, so as to learn to register the artifact-affected CT images and the atlas image with assistance of the paired artifact-free CT images; inputting the CT image post-operatively acquired from the implant user to the trained networks to generate a dense deformation field (DDF) from the input Post-CT image to the atlas image; and warping a segmentation mesh of the SOI in the atlas image to the input Post-CT image using the DDF so as to generate the segmentation mesh of the SOI in the input Post-CT image, wherein the segmentation mesh of the SOI in the atlas image is generated by applying an active shape model-based (ASM) method to the atlas image.
In one embodiment, said providing the dataset comprises, for each CT image pair, rigidly registering the Pre-CT image to the Post-CT image; and aligning the registered Pre-CT and Post-CT image pair to the atlas image so that anatomical structures in the region of the respective implant recipient are roughly in the same spatial location and orientation.
In one embodiment, said providing the dataset further comprises, for each CT image pair, applying the ASM method to the registered Pre-CT image to generate a segmentation mesh of the SOI in the registered Pre-CT image (Meshpre); transferring the segmentation mesh of the SOI in the registered Pre-CT image (Meshpre) to the Post-CT image, so as to generate a segmentation mesh of the SOI in the Post-CT image (Meshpost); and converting the segmentation mesh of the SOI in the Post-CT image (Meshpost) to segmentation masks of the SOI in the Post-CT image (Segpost). In one embodiment, all of the images are resampled to an isotropic voxel size, and cropped to 3D sub-images containing the structures of interest.
In one embodiment, said providing the dataset comprises applying image augmentation to the training set by rotating each image by a plurality of small random angles in the range of -25 to 25 degrees about the x-axis, y-axis, and z-axis, to create additional training images from each original image.
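The rotation-based augmentation described above can be illustrated with a minimal sketch. It only samples the random angles and builds the corresponding rotation matrices; resampling the image volume with each matrix is left out. The function names, the uniform sampling, and the composition order Rz·Ry·Rx are assumptions made for illustration and are not specified in this disclosure.

```python
import numpy as np

def random_rotation_matrix(rng, max_deg=25.0):
    """Sample one angle per axis in [-max_deg, max_deg] degrees and
    compose the three axis rotations as Rz @ Ry @ Rx (assumed order)."""
    ax, ay, az = np.deg2rad(rng.uniform(-max_deg, max_deg, size=3))
    rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(ax), -np.sin(ax)],
                   [0.0, np.sin(ax), np.cos(ax)]])
    ry = np.array([[np.cos(ay), 0.0, np.sin(ay)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(ay), 0.0, np.cos(ay)]])
    rz = np.array([[np.cos(az), -np.sin(az), 0.0],
                   [np.sin(az), np.cos(az), 0.0],
                   [0.0, 0.0, 1.0]])
    return rz @ ry @ rx

def augmentation_rotations(n_copies, seed=0):
    """Return n_copies random rotation matrices, one per augmented copy."""
    rng = np.random.default_rng(seed)
    return [random_rotation_matrix(rng) for _ in range(n_copies)]
```

Each returned matrix is a proper rotation, so the same matrix can be applied to the image grid and to the mesh vertices of a pair to keep them aligned.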
In one embodiment, the networks have network architecture in which NETsSpc-tSpc, designed for generating a DDF for warping a source image S to a target image T, comprises a Global-net and a Local-net, wherein the network architecture is configured such that after receiving the concatenation of S and T, the Global-net generates an affine transformation matrix; S is warped to T by using the affine transformation to generate an image S’; the Local-net takes the concatenation of S’ and T to generate a non-rigid local DDF; and the affine transformation and the local DDF are composed to produce an output DDF.
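The last step of this architecture, composing the affine transformation with the local DDF, can be sketched as follows. This is one possible reading under stated assumptions: displacements are in voxel units, the affine is a 3x4 matrix [A | t], and images are resampled with a pull-back convention, so that S’(x) = S(Ax + t) and the final image is S’(x + u(x)). Under these assumptions the composed displacement is d(x) = A(x + u(x)) + t - x. The function names are hypothetical.

```python
import numpy as np

def identity_grid(shape):
    """Voxel coordinates of a 3D grid, shape (D, H, W, 3)."""
    axes = [np.arange(n, dtype=np.float64) for n in shape]
    return np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)

def compose_affine_and_local(affine, local_ddf):
    """Compose a 3x4 affine [A | t] with a local DDF of shape (D, H, W, 3).

    Assumes pull-back resampling: the output displacement at voxel x is
    A(x + u(x)) + t - x, where u is the local DDF.
    """
    grid = identity_grid(local_ddf.shape[:-1])
    moved = grid + local_ddf              # x + u(x)
    a, t = affine[:, :3], affine[:, 3]
    return moved @ a.T + t - grid         # A(x + u(x)) + t - x
```

With an identity affine and a zero local DDF the composed field is zero everywhere, which is a convenient sanity check.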
In one embodiment, said training the networks comprises inputting a concatenation of the atlas image (Atlasatlas) and each Post-CT image (Postpost) into the networks so that the first network (NETatlas-post) generates a DDF from the atlas space to the Post-CT space (DDFatlas-post) and the second network (NETpost-atlas) generates a DDF from the Post-CT space to the atlas space (DDFpost-atlas); warping the Pre-CT image (PresSpc), the segmentation masks (MasksSpc), and the fiducial vertices (FidVsSpc) in a source space to a target space by using the corresponding DDFs, to generate PresSpc-tSpc, MasksSpc-tSpc, and FidVsSpc-tSpc; and transferring PresSpc-tSpc, MasksSpc-tSpc, and FidVsSpc-tSpc back to sSpc using the corresponding DDF, to generate PresSpc-tSpc-sSpc, MasksSpc-tSpc-sSpc, and FidVsSpc-tSpc-sSpc, respectively, wherein OxSpc denotes an object O in the x space, and sSpc and tSpc respectively denote the source and target spaces.
In one embodiment, FidVatlas and FidVpost are the fiducial vertices randomly sampled from Meshatlas and Meshpost on the fly for calculating the fiducial registration error during training.
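A minimal sketch of on-the-fly fiducial sampling is given below. It assumes that both meshes are stored as (N, 3) vertex arrays with the same vertex ordering, which is what point-to-point correspondence between ASM meshes provides; the function name and the sampling without replacement are illustrative assumptions.

```python
import numpy as np

def sample_fiducials(mesh_a, mesh_b, n_fid, rng):
    """Pick the same random vertex indices from two corresponding meshes.

    mesh_a, mesh_b: (N, 3) vertex arrays with identical vertex ordering.
    Returns two (n_fid, 3) arrays of corresponding fiducial vertices.
    """
    if mesh_a.shape != mesh_b.shape:
        raise ValueError("meshes must share vertex count and ordering")
    idx = rng.choice(mesh_a.shape[0], size=n_fid, replace=False)
    return mesh_a[idx], mesh_b[idx]
```

Drawing a fresh index set at every training step means that, over time, the fiducial error is evaluated over many different subsets of the mesh vertices.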
In one embodiment, the training objective for NETsSpc-tSpc is constructed using measurements of similarity between a target object in tSpc ( OtSpc ) and a source object that is transferred to tSpc from sSpc (OsSpc-tSpc).
In one embodiment, the training objective for the networks is a weighted sum of loss terms of MSPDice, Mean FRE, NCC, CycConsis, BendE, and L2.
MSPDice = MSPDice(Maskpost, Maskatlas-post) + MSPDice(Maskatlas, Maskpost-atlas), wherein MSPDice(MasktSpc, MasksSpc-tSpc) is a multiscale soft probabilistic Dice between MasktSpc and MasksSpc-tSpc that measures the similarity of the segmentation masks between MasktSpc and MasksSpc-tSpc.
Mean FRE = FRE(FidVpost, FidVatlas-post) + FRE(FidVatlas, FidVpost-atlas), wherein FRE(FidVtSpc, FidVsSpc-tSpc) is a mean fiducial registration error that measures the similarity of the fiducial vertices between FidVtSpc and FidVsSpc-tSpc, and is calculated as the average Euclidean distance between the fiducial vertices in FidVtSpc and the corresponding vertices in FidVsSpc-tSpc.
NCC = NCC(Prepost, Atlasatlas-post) + NCC(Atlasatlas, Prepost-atlas), wherein NCC(PretSpc, PresSpc-tSpc) is a normalized cross-correlation between PretSpc and PresSpc-tSpc that measures the similarity between the warped source image and the target image.
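For illustration, a normalized cross-correlation between two image volumes can be computed as in the sketch below. This is the global (whole-volume) form; whether the disclosed method uses a global or a windowed NCC, and how the value is turned into a loss (for example by negation), is not stated here and is left as an assumption.

```python
import numpy as np

def ncc(target, warped, eps=1e-8):
    """Global normalized cross-correlation of two same-shaped volumes.

    Returns a value in [-1, 1]; 1 means the volumes are identical up to
    a positive linear intensity change.
    """
    a = target - target.mean()
    b = warped - warped.mean()
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)) + eps
    return float(np.sum(a * b) / denom)
```

Because the measure is invariant to intensity scaling and offset, it tolerates differences in image contrast between the atlas and the Pre-CT images.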
CycConsis = CycConsisatlas-post + CycConsispost-atlas, wherein CycConsissSpc-tSpc is a cycle-consistency loss that measures the similarity between the original source objects in the source space and the source objects that are transferred from the source space to the target space and then transferred back to the source space, wherein CycConsissSpc-tSpc = MSPDice(MasksSpc, MasksSpc-tSpc-sSpc) + 2 x FRE(FidVsSpc, FidVsSpc-tSpc-sSpc) + 0.5 x NCC(PresSpc, PresSpc-tSpc-sSpc).
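The cycle-consistency term can be sketched for the fiducial vertices as follows. The weights 1, 2, and 0.5 come from the expression above; the callables `forward` and `backward`, which stand in for warping by the two DDFs, and the function names are hypothetical.

```python
import numpy as np

def round_trip_fre(points, forward, backward):
    """Mean distance between points and their source -> target -> source
    round trip. forward/backward are callables mapping (N, 3) -> (N, 3)."""
    cycled = backward(forward(points))
    return float(np.linalg.norm(points - cycled, axis=1).mean())

def cycle_consistency(dice_term, fre_term, ncc_term):
    """Combine the three round-trip terms with the weights 1, 2, and 0.5."""
    return dice_term + 2.0 * fre_term + 0.5 * ncc_term
```

If the two transformations are exact inverses of each other, the round-trip fiducial error is zero, which is the behavior this term encourages.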
BendE = BendE(DDFatlas-post) + BendE (DDFpost-atlas), wherein BendE(DDFsSpc-tSpc) is a bending loss for which the DDF from the source space to the target space DDFsSpc-tSpc is regularized using bending energy.
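A common way to compute a bending energy for a displacement field is to average the squared second-order spatial derivatives of each displacement component. The sketch below uses finite differences for this; it is an illustrative approximation under that assumption, not necessarily the exact regularizer used in the disclosed networks.

```python
import numpy as np

def bending_energy(ddf):
    """Finite-difference bending energy of a DDF of shape (D, H, W, 3).

    Sums the mean squared second derivatives over all axis pairs (i, j),
    so mixed derivatives are counted twice, as in the usual definition.
    """
    energy = 0.0
    for c in range(ddf.shape[-1]):
        first = np.gradient(ddf[..., c])      # derivatives along 3 axes
        for i in range(3):
            second = np.gradient(first[i])    # second derivatives
            for j in range(3):
                energy += np.mean(second[j] ** 2)
    return float(energy)
```

Affine displacement fields have zero bending energy under this definition, so the regularizer penalizes only non-affine, locally varying deformation.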
L2 = L2(NETatlas-post) + L2(NETpost-atlas), wherein L2(NETsSpc-tSpc) is an L2 loss for which the learnable parameters of the registration network NETsSpc-tSpc are regularized by an L2 term.
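The overall training objective, a weighted sum of the six loss terms listed above, can be sketched as below. The weight values are not given in this section, so the caller must supply them; the dictionary keys and the function name are hypothetical.

```python
LOSS_NAMES = ("MSPDice", "MeanFRE", "NCC", "CycConsis", "BendE", "L2")

def total_loss(terms, weights):
    """Weighted sum of the six loss terms.

    terms, weights: dicts keyed by the names in LOSS_NAMES. The weight
    values are hyperparameters chosen by the user (assumed, not sourced).
    """
    missing = [n for n in LOSS_NAMES if n not in terms or n not in weights]
    if missing:
        raise KeyError(f"missing loss terms or weights: {missing}")
    return sum(weights[n] * terms[n] for n in LOSS_NAMES)
```

Keeping the weights in a single dictionary makes it straightforward to disable one term (weight 0) when studying its contribution.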
In one embodiment, the region of interest includes ear, brain, heart, or other organs of a living subject.
In one embodiment, the structures of interest comprise anatomical structures in the region of interest.
In one embodiment, the anatomical structures comprise intra cochlear anatomy (ICA).
In one embodiment, the implant is a cochlear implant, a deep brain stimulator, or a pacemaker.
In another aspect, the invention relates to a method for segmentation of structures of interest (SOI) in a CT image post-operatively acquired from an implant user in a region of interest in which an implant is implanted. The method includes inputting the post-operatively acquired CT (Post-CT) image to trained networks to generate a dense deformation field (DDF) from the input Post-CT image to an atlas image, wherein the atlas image is a Pre-CT image; and warping a segmentation mesh of the SOI in the atlas image to the input Post-CT image using the DDF so as to generate the segmentation mesh of the SOI in the input Post-CT image, wherein the segmentation mesh of the SOI in the atlas image is generated by applying an active shape model-based (ASM) method to the atlas image.
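The mesh-warping step can be illustrated with a short sketch that moves each mesh vertex by the displacement interpolated from the DDF at that vertex. It assumes that the DDF is stored as a (D, H, W, 3) array of displacements in voxel units on the grid in which the mesh vertices are expressed, and that every grid dimension has at least two voxels; the function names and the trilinear interpolation are illustrative choices.

```python
import numpy as np

def trilinear_sample(field, pts):
    """Sample a (D, H, W, C) field at float voxel coordinates pts (N, 3)."""
    shape = np.array(field.shape[:3])
    pts = np.clip(np.asarray(pts, dtype=float), 0, shape - 1)
    lo = np.minimum(np.floor(pts).astype(int), shape - 2)  # lower corner
    frac = pts - lo                                        # offsets in [0, 1]
    out = np.zeros((pts.shape[0], field.shape[3]))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                corner = lo + np.array([dx, dy, dz])
                # weight is frac on the upper corner, 1 - frac on the lower
                w = np.prod(np.where([dx, dy, dz], frac, 1.0 - frac), axis=1)
                out += w[:, None] * field[corner[:, 0], corner[:, 1], corner[:, 2]]
    return out

def warp_vertices(vertices, ddf):
    """Move each vertex by the displacement interpolated at its position."""
    return vertices + trilinear_sample(ddf, vertices)
```

Because only vertex positions change and the vertex ordering is untouched, the warped mesh keeps the point-to-point correspondence of the atlas mesh.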
In one embodiment, the networks have network architecture in which NETsSpc-tSpc, designed for generating a DDF for warping a source image S to a target image T, comprises a Global-net and a Local-net, wherein the network architecture is configured such that after receiving the concatenation of S and T, the Global-net generates an affine transformation matrix; S is warped to T by using the affine transformation to generate an image S’; the Local-net takes the concatenation of S’ and T to generate a non-rigid local DDF; and the affine transformation and the local DDF are composed to produce an output DDF.
In one embodiment, the networks are trained with a dataset comprising a plurality of CT image pairs, randomly partitioned into a training set, a validation set, and a testing set, wherein each CT image pair has a pre-implantation CT (Pre-CT) image and a post-implantation CT (Post-CT) image respectively acquired in a region of a respective implant recipient before and after an implant is implanted in the region, so that the Pre-CT image and the Post-CT image of each CT image pair are an artifact-free CT image and an artifact-affected CT image, respectively.
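A random partition of the image pairs into training, validation, and testing sets can be sketched as below. The split fractions are hypothetical placeholders, since no proportions are given in this section.

```python
import numpy as np

def partition_pairs(n_pairs, frac_train=0.7, frac_val=0.1, seed=0):
    """Randomly split pair indices 0..n_pairs-1 into three disjoint sets.

    frac_train and frac_val are assumed values; the remainder is the
    testing set.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_pairs)
    n_train = int(round(frac_train * n_pairs))
    n_val = int(round(frac_val * n_pairs))
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]
    return train, val, test
```

Fixing the seed makes the partition reproducible, so the same testing pairs are held out across training runs.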
In one embodiment, the Pre-CT image is rigidly registered to the Post-CT image for each CT image pair; and the registered Pre-CT and Post-CT image pair is aligned to the atlas image so that anatomical structures in the region of the respective implant recipient are roughly in the same spatial location and orientation.
In one embodiment, for each CT image pair, the ASM method is applied to the registered Pre-CT image to generate a segmentation mesh of the SOI in the registered Pre-CT image (Meshpre); the segmentation mesh of the SOI in the registered Pre-CT image (Meshpre) is transferred to the Post-CT image so as to generate a segmentation mesh of the SOI in the Post-CT image (Meshpost); and the segmentation mesh of the SOI in the Post-CT image (Meshpost) is converted to segmentation masks of the SOI in the Post-CT image (Segpost).
In one embodiment, the networks are trained to learn to register the artifact-affected CT images and the atlas image with assistance of the paired artifact-free CT images.
In one embodiment, the networks are trained by inputting a concatenation of the atlas image (Atlasatlas) and each Post-CT image (Postpost) into the networks so that the first network (NETatlas-post) generates a DDF from the atlas space to the Post-CT space (DDFatlas-post) and the second network (NETpost-atlas) generates a DDF from the Post-CT space to the atlas space (DDFpost-atlas); warping the Pre-CT image (PresSpc), the segmentation masks (MasksSpc), and the fiducial vertices (FidVsSpc) in a source space to a target space by using the corresponding DDFs, to generate PresSpc-tSpc, MasksSpc-tSpc, and FidVsSpc-tSpc; and transferring PresSpc-tSpc, MasksSpc-tSpc, and FidVsSpc-tSpc back to sSpc using the corresponding DDF, to generate PresSpc-tSpc-sSpc, MasksSpc-tSpc-sSpc, and FidVsSpc-tSpc-sSpc, respectively, wherein OxSpc denotes an object O in the x space, and sSpc and tSpc respectively denote the source and target spaces.
In one embodiment, the training objective for NETsSpc-tSpc is constructed using measurements of similarity between a target object in tSpc ( OtSpc ) and a source object that is transferred to tSpc from sSpc ( OsSpc-tSpc ).
In one embodiment, the trained networks are characterized with a training objective that is a weighted sum of loss terms of MSPDice, Mean FRE, NCC, CycConsis, BendE, and L2.
MSPDice = MSPDice(Maskpost, Maskatlas-post) + MSPDice(Maskatlas, Maskpost-atlas), wherein MSPDice(MasktSpc, MasksSpc-tSpc) is a multiscale soft probabilistic Dice between MasktSpc and MasksSpc-tSpc that measures the similarity of the segmentation masks between MasktSpc and MasksSpc-tSpc.
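One way to realize a multiscale soft Dice is to evaluate a soft Dice score on progressively coarser versions of the two masks and average the results. In the sketch below, the coarser versions are produced by block averaging; the choice of scales, the use of average pooling, and the function names are assumptions made for illustration, since the exact multiscale scheme is not specified in this section.

```python
import numpy as np

def soft_dice(p, q, eps=1e-6):
    """Soft Dice of two probability maps: 2*sum(p*q) / (sum(p) + sum(q))."""
    inter = np.sum(p * q)
    return float((2.0 * inter + eps) / (np.sum(p) + np.sum(q) + eps))

def avg_pool(vol, k):
    """Downsample a 3D volume by averaging non-overlapping k x k x k blocks."""
    d, h, w = (s // k * k for s in vol.shape)
    v = vol[:d, :h, :w].reshape(d // k, k, h // k, k, w // k, k)
    return v.mean(axis=(1, 3, 5))

def multiscale_soft_dice(p, q, factors=(1, 2, 4)):
    """Average the soft Dice over several pooling factors (assumed scales)."""
    scores = [soft_dice(avg_pool(p, k), avg_pool(q, k)) for k in factors]
    return float(np.mean(scores))
```

The coarse scales give a non-zero overlap signal even when the fine-scale masks do not yet overlap, which is the usual motivation for a multiscale formulation. Note that with this form of soft Dice, two identical maps score exactly 1 only when their values are binary.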
Mean FRE = FRE(FidVpost, FidVatlas-post) + FRE(FidVatlas, FidVpost-atlas), wherein FRE(FidVtSpc, FidVsSpc-tSpc) is a mean fiducial registration error that measures the similarity of the fiducial vertices between FidVtSpc and FidVsSpc-tSpc, and is calculated as the average Euclidean distance between the fiducial vertices in FidVtSpc and the corresponding vertices in FidVsSpc-tSpc.
NCC = NCC(Prepost, Atlasatlas-post) + NCC(Atlasatlas, Prepost-atlas), wherein NCC(PretSpc, PresSpc-tSpc) is a normalized cross-correlation between PretSpc and PresSpc-tSpc that measures the similarity between the warped source image and the target image.
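The mean fiducial registration error defined above is the average Euclidean distance between corresponding vertices. A minimal sketch, assuming both vertex sets are (N, 3) arrays in the same order (the function name is hypothetical):

```python
import numpy as np

def mean_fre(fid_target, fid_warped):
    """Average Euclidean distance between corresponding fiducial vertices."""
    if fid_target.shape != fid_warped.shape:
        raise ValueError("fiducial sets must have the same shape")
    return float(np.linalg.norm(fid_target - fid_warped, axis=1).mean())
```

For example, if every warped vertex is offset from its target by (3, 4, 0), the mean FRE is 5 in the units of the vertex coordinates.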
CycConsis = CycConsisatlas-post + CycConsispost-atlas, wherein CycConsissSpc-tSpc is a cycle-consistency loss that measures the similarity between the original source objects in the source space and the source objects that are transferred from the source space to the target space and then transferred back to the source space, wherein CycConsissSpc-tSpc = MSPDice(MasksSpc, MasksSpc-tSpc-sSpc) + 2 x FRE(FidVsSpc, FidVsSpc-tSpc-sSpc) + 0.5 x NCC(PresSpc, PresSpc-tSpc-sSpc).
BendE = BendE(DDFatlas-post ) + BendE(DDFpost-atlas ), wherein BendE (DDFsSpc-tSpc) is a bending loss for which the DDF from the source space to the target space DDFsSpc-tSpc is regularized using bending energy.
L2 = L2(NETatlas-post) + L2(NETpost-atlas), wherein L2(NETsSpc-tSpc) is an L2 loss for which the learnable parameters of the registration network NETsSpc-tSpc are regularized by an L2 term.
In yet another aspect, the invention relates to a non-transitory tangible computer-readable medium storing instructions which, when executed by one or more processors, cause a system to perform the above methods.
In a further aspect, the invention relates to a system for segmentation of structures of interest (SOI) in a CT image post-operatively acquired from an implant user in a region of interest in which an implant is implanted. The system includes trained networks for generating dense deformation fields (DDFs) in opposite directions; and a microcontroller coupled with the trained networks and configured to generate a DDF from the post-operatively acquired CT (Post-CT) image to an atlas image; and warp a segmentation mesh of the SOI in the atlas image to the Post-CT image using the DDF to generate a segmentation mesh of the SOI in the Post-CT image, wherein the segmentation mesh of the SOI in the atlas image is generated by applying an active shape model-based (ASM) method to the atlas image, wherein the atlas image is a Pre-CT image.
In one embodiment, the trained networks comprise networks having network architecture in which NETsSpc-tSpc, designed for generating a DDF for warping a source image S to a target image T, comprises a Global-net and a Local-net, wherein the network architecture is configured such that after receiving the concatenation of S and T, the Global-net generates an affine transformation matrix; S is warped to T by using the affine transformation to generate an image S’; the Local-net takes the concatenation of S’ and T to generate a non-rigid local DDF; and the affine transformation and the local DDF are composed to produce an output DDF.
In one embodiment, the microcontroller is further configured to train the networks with a dataset comprising a plurality of CT image pairs, randomly partitioned into a training set, a validation set, and a testing set, wherein each CT image pair has a pre-implantation CT (Pre-CT) image and a post-implantation CT (Post-CT) image respectively acquired in a region of a respective implant recipient before and after an implant is implanted in the region, so that the Pre-CT image and the Post-CT image of each CT image pair are an artifact-free CT image and an artifact-affected CT image, respectively.
In one embodiment, the microcontroller is further configured to apply image augmentation to the training set by rotating each image by a plurality of small random angles in the range of -25 to 25 degrees about the x-axis, y-axis, and z-axis, to create additional training images from each original image.
In one embodiment, the microcontroller is further configured to rigidly register the Pre-CT image to the Post-CT image for each CT image pair; and align the registered Pre-CT and Post-CT image pair to the atlas image so that anatomical structures in the region of the respective implant recipient are roughly in the same spatial location and orientation.
In one embodiment, the microcontroller is further configured to, for each CT image pair, apply the ASM method to the registered Pre-CT image to generate a segmentation mesh of the SOI in the registered Pre-CT image ( Meshpre ); transfer the segmentation mesh of the SOI in the registered Pre-CT image (Meshpre) to the Post-CT image, so as to generate a segmentation mesh of the SOI in the Post-CT image (Meshpost ); and convert the segmentation mesh of the SOI in the Post-CT image ( Meshpost ) to segmentation masks of the SOI in the Post-CT image (Segpost ).
In one embodiment, the networks are trained to learn to register the artifact-affected CT images and the atlas image with assistance of the paired artifact-free CT images.
In one embodiment, the networks are trained by inputting a concatenation of the atlas image (Atlasatlas) and each Post-CT image (Postpost) into the networks so that the first network (NETatlas-post) generates a DDF from the atlas space to the Post-CT space (DDFatlas-post) and the second network (NETpost-atlas) generates a DDF from the Post-CT space to the atlas space (DDFpost-atlas); warping the Pre-CT image (PresSpc), the segmentation masks (MasksSpc), and the fiducial vertices (FidVsSpc) in a source space to a target space by using the corresponding DDFs, to generate PresSpc-tSpc, MasksSpc-tSpc, and FidVsSpc-tSpc; and transferring PresSpc-tSpc, MasksSpc-tSpc, and FidVsSpc-tSpc back to sSpc using the corresponding DDF, to generate PresSpc-tSpc-sSpc, MasksSpc-tSpc-sSpc, and FidVsSpc-tSpc-sSpc, respectively, wherein OxSpc denotes an object O in the x space, and sSpc and tSpc respectively denote the source and target spaces.
In one embodiment, the training objective for NETsSpc-tSpc is constructed using measurements of similarity between a target object in tSpc ( OtSpc ) and a source object that is transferred to tSpc from sSpc ( OsSpc-tSpc ).
In one embodiment, the trained networks are characterized with a training objective that is a weighted sum of loss terms of MSPDice, Mean FRE, NCC, CycConsis, BendE, and L2.
MSPDice = MSPDice(Maskpost, Maskatlas-post) + MSPDice(Maskatlas, Maskpost-atlas), wherein MSPDice(MasktSpc, MasksSpc-tSpc) is a multiscale soft probabilistic Dice between MasktSpc and MasksSpc-tSpc that measures the similarity of the segmentation masks between MasktSpc and MasksSpc-tSpc.
Mean FRE = FRE(FidVpost, FidVatlas-post) + FRE(FidVatlas, FidVpost-atlas), wherein FRE(FidVtSpc, FidVsSpc-tSpc) is a mean fiducial registration error that measures the similarity of the fiducial vertices between FidVtSpc and FidVsSpc-tSpc, and is calculated as the average Euclidean distance between the fiducial vertices in FidVtSpc and the corresponding vertices in FidVsSpc-tSpc.
NCC = NCC(Prepost, Atlasatlas-post) + NCC(Atlasatlas, Prepost-atlas), wherein NCC(PretSpc, PresSpc-tSpc) is a normalized cross-correlation between PretSpc and PresSpc-tSpc that measures the similarity between the warped source image and the target image.
CycConsis = CycConsisatlas-post + CycConsispost-atlas, wherein CycConsissSpc-tSpc is a cycle-consistency loss that measures the similarity between the original source objects in the source space and the source objects that are transferred from the source space to the target space and then transferred back to the source space, wherein CycConsissSpc-tSpc = MSPDice(MasksSpc, MasksSpc-tSpc-sSpc) + 2 x FRE(FidVsSpc, FidVsSpc-tSpc-sSpc) + 0.5 x NCC(PresSpc, PresSpc-tSpc-sSpc).
BendE = BendE(DDFatlas-post) + BendE(DDFpost-atlas), wherein BendE(DDFsSpc-tSpc) is a bending loss for which the DDF from the source space to the target space DDFsSpc-tSpc is regularized using bending energy.
L2 = L2(NETatlas-post) + L2(NETpost-atlas), wherein L2(NETsSpc-tSpc) is an L2 loss for which the learnable parameters of the registration network NETsSpc-tSpc are regularized by an L2 term.
In one embodiment, the region of interest includes ear, brain, heart, or other organs of a living subject.
In one embodiment, the structures of interest comprise anatomical structures in the region of interest.
In one embodiment, the anatomical structures comprise intra cochlear anatomy (ICA).
In one embodiment, the implant is a cochlear implant, a deep brain stimulator, or a pacemaker.
These and other aspects of the present invention will become apparent from the following description of the preferred embodiments, taken in conjunction with the following drawings, although variations and modifications therein may be effected without departing from the spirit and scope of the novel concepts of the disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings illustrate one or more embodiments of the invention and, together with the written description, serve to explain the principles of the invention. The same reference numbers may be used throughout the drawings to refer to the same or like elements in the embodiments.
FIGS. 1A-1B show schematically a pair of registered Pre-CT and Post-CT images, respectively, of an ear of a CI recipient. FIG. 1C is an illustration of the intracochlear anatomy with an implanted CI electrode array. The meshes of the ST, the SV, and the MD are obtained by applying the ASM method to the Pre-CT image.
FIGS. 2A-2C show schematically the framework of the method according to embodiments of the invention. FIG. 2A: Objects used for training the networks. FIG. 2B: Training phase. FIG. 2C: Inference phase.
FIG. 3 is an illustration of a registration network NETsSpc-tSpc that is tasked to generate a DDF from the source space to the target space according to embodiments of the invention.
FIGS. 4A-4B show two example cases in which the method leads to (FIG. 4A) good and (FIG. 4B) poor results according to embodiments of the invention.
FIGS. 5A-5C show boxplots of (FIG. 5A) the median, (FIG. 5B) the Max, and (FIG. 5C) the STD of the P2PEs, according to embodiments of the invention. Boxplots from the left-hand side to the right-hand side are respectively for “cGAN + ASM”, “Novel”, “Novel-NoNCC”, “Novel-NoCycConsis”, “Novel-NoFRE”, “Baseline”, and “No registration”, for each of ST, SV and MD.
DETAILED DESCRIPTION OF THE INVENTION
The invention will now be described more fully hereinafter with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this invention will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. Like reference numerals refer to like elements throughout.
The terms used in this specification generally have their ordinary meanings in the art, within the context of the invention, and in the specific context where each term is used. Certain terms that are used to describe the invention are discussed below, or elsewhere in the specification, to provide additional guidance to the practitioner regarding the description of the invention. For convenience, certain terms may be highlighted, for example using italics and/or quotation marks. The use of highlighting has no influence on the scope and meaning of a term; the scope and meaning of a term is the same, in the same context, whether or not it is highlighted. It will be appreciated that the same thing can be said in more than one way. Consequently, alternative language and synonyms may be used for any one or more of the terms discussed herein, nor is any special significance to be placed upon whether or not a term is elaborated or discussed herein. Synonyms for certain terms are provided. A recital of one or more synonyms does not exclude the use of other synonyms. The use of examples anywhere in this specification including examples of any terms discussed herein is illustrative only, and in no way limits the scope and meaning of the invention or of any exemplified term. Likewise, the invention is not limited to various embodiments given in this specification.
It will be understood that, as used in the description herein and throughout the claims that follow, the meaning of "a", "an", and "the" includes plural reference unless the context clearly dictates otherwise. Also, it will be understood that when an element is referred to as being "on" another element, it can be directly on the other element or intervening elements may be present therebetween. In contrast, when an element is referred to as being "directly on" another element, there are no intervening elements present. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, third etc. may be used herein to describe various elements, components, regions, layers and/or sections, these elements, components, regions, layers and/or sections should not be limited by these terms. These terms are only used to distinguish one element, component, region, layer or section from another element, component, region, layer or section. Thus, a first element, component, region, layer or section discussed below could be termed a second element, component, region, layer or section without departing from the teachings of the invention.
Furthermore, relative terms, such as "lower" or "bottom" and "upper" or "top," may be used herein to describe one element's relationship to another element as illustrated in the figures. It will be understood that relative terms are intended to encompass different orientations of the device in addition to the orientation depicted in the figures. For example, if the device in one of the figures is turned over, elements described as being on the "lower" side of other elements would then be oriented on "upper" sides of the other elements. The exemplary term "lower" can, therefore, encompass both an orientation of "lower" and "upper," depending on the particular orientation of the figure. Similarly, if the device in one of the figures is turned over, elements described as "below" or "beneath" other elements would then be oriented "above" the other elements. The exemplary terms "below" or "beneath" can, therefore, encompass both an orientation of above and below.
It will be further understood that the terms "comprises" and/or "comprising," or "includes" and/or "including" or "has" and/or "having", or "carry" and/or "carrying," or "contain" and/or "containing," or "involve" and/or "involving", and the like are to be open-ended, i.e., to mean including but not limited to. When used in this invention, they specify the presence of stated features, regions, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, regions, integers, steps, operations, elements, components, and/or groups thereof.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present invention, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As used herein, "around", "about" or "approximately" shall generally mean within 20 percent, preferably within 10 percent, and more preferably within 5 percent of a given value or range. Numerical quantities given herein are approximate, meaning that the term "around", "about" or "approximately" can be inferred if not expressly stated.
As used herein, the terms "comprise" or "comprising", "include" or "including", "carry" or "carrying", "has/have" or "having", "contain" or "containing", "involve" or "involving" and the like are to be understood to be open-ended, i.e., to mean including but not limited to.
As used herein, the phrase "at least one of A, B, and C" should be construed to mean a logical (A or B or C), using a non-exclusive logical OR.
It should be understood that one or more steps within a method may be executed in different order (or concurrently) without altering the principles of the invention.
The description below is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. The broad teachings of the invention can be implemented in a variety of forms. Therefore, while this invention includes particular examples, the true scope of the invention should not be so limited since other modifications will become apparent upon a study of the drawings, the specification, and the following claims. For purposes of clarity, the same reference numbers will be used in the drawings to identify similar elements.
In view of the aforementioned deficiencies and inadequacies, a two-step method, which is referred to as “cGANs+ASM”, is disclosed in U.S. Patent Application Serial No. 17/266,180, which is incorporated herein in its entirety by reference. The method first uses conditional generative adversarial networks (cGANs) to synthesize artifact-free Pre-CT images from the Post-CT images and then uses the ASM method to segment the ICA in the synthetic images. To the best of the inventors’ knowledge, cGANs+ASM is the most accurate automatic method for ICA segmentation in Post-CT images.
One of the objectives of this invention is to provide an atlas-based method to segment the intracochlear anatomy (ICA) in the post-implantation CT (Post-CT) images of cochlear implant (CI) recipients that preserves the point-to-point correspondence between the meshes in the atlas and the segmented volumes. To solve this problem, which is challenging because of the strong artifacts produced by the implant, a pair of co-trained deep networks that generate dense deformation fields (DDFs) in opposite directions is used. One network is tasked with registering an atlas image to the Post-CT images and the other network is tasked with registering the Post-CT images to the atlas image. The networks are trained using loss functions based on voxel-wise labels, image content, fiducial registration error, and a cycle-consistency constraint. The segmentation of the ICA in the Post-CT images is subsequently obtained by transferring the predefined segmentation meshes of the ICA in the atlas image to the Post-CT images using the corresponding DDFs generated by the trained registration networks. The model can learn the underlying geometric features of the ICA even though they are obscured by the metal artifacts. It is shown that the end-to-end network produces results that are comparable to the current state of the art (SOTA), which relies on the two-step method that first uses conditional generative adversarial networks to synthesize artifact-free images from the Post-CT images and then uses an active shape model-based method to segment the ICA in the synthetic images, as disclosed in U.S. Patent Application Serial No. 17/266,180, which is incorporated herein in its entirety by reference. Among other things, the atlas-based method produces results in a fraction of the time needed by the SOTA and is more robust to noise and poor image quality, which is important for end-user acceptance.
In one aspect, the invention relates to a method for segmentation of structures of interest (SOI) in a CT image post-operatively acquired from an implant user in a region of interest in which an implant is implanted. In some embodiments, the region of interest includes ear, brain, heart, or other organs of a living subject, the structures of interest comprise anatomical structures in the region of interest, and the anatomical structures comprise the ICA. In some embodiments, the implant is a cochlear implant, a deep brain stimulator, or a pacemaker.
In some embodiments, the method comprises providing an atlas image, a dataset and networks.
The dataset comprises a plurality of CT image pairs, randomly partitioned into a training set, a validation set, and a testing set, wherein each CT image pair has a pre-implantation CT (Pre-CT) image and a post-implantation CT (Post-CT) image respectively acquired in a region of a respective implant recipient before and after an implant is implanted in the region, so that the Pre-CT image and the Post-CT image of each CT image pair are an artifact-free CT image and an artifact-affected CT image, respectively.
In some embodiments, the Pre-CT image is rigidly registered to the Post-CT image for each CT image pair; and the registered Pre-CT and Post-CT image pair is aligned to the atlas image so that anatomical structures in the region of the respective implant recipient are roughly in the same spatial location and orientation.
In some embodiments, for each CT image pair, the ASM method is applied to the registered Pre-CT image to generate a segmentation mesh of the SOI in the registered Pre-CT image (Meshpre); the segmentation mesh of the SOI in the registered Pre-CT image (Meshpre) is transferred to the Post-CT image so as to generate a segmentation mesh of the SOI in the Post-CT image (Meshpost); and the segmentation mesh of the SOI in the Post-CT image (Meshpost) is converted to segmentation masks of the SOI in the Post-CT image (Segpost).
In some embodiments, the atlas image is a Pre-CT image of the region of a subject that is not in the plurality of CT image pairs.
In some embodiments, all of the images are resampled to an isotropic voxel size and cropped to 3D sub-volumes containing the structures of interest.
In some embodiments, image augmentation is applied to the training set by rotating each image by a plurality of small random angles in the range of -25 to 25 degrees about the x-axis, y-axis, and z-axis, to create additional training images from each original image.
In some embodiments, as shown in FIG. 3, the networks have network architecture in which NETsSpc-tSpc, designed for generating a DDF for warping a source image S to a target image T, comprises a Global-net and a Local-net, wherein the network architecture is configured such that after receiving the concatenation of S and T, the Global-net generates an affine transformation matrix; S is warped to T by using the affine transformation to generate an image S’; the Local-net takes the concatenation of S’ and T to generate a non-rigid local DDF; and the affine transformation and the local DDF are composed to produce an output DDF.
In some embodiments, the networks comprise a first network for registering the atlas image to each Post-CT image and a second network for registering each Post-CT image to the atlas image.
The method also comprises training the networks with the plurality of CT image pairs, so as to learn to register the artifact-affected CT images and the atlas image with assistance of the paired artifact-free CT images; inputting the CT image post-operatively acquired from the implant user to the trained networks to generate a DDF from the input Post-CT image to the atlas image; and warping a segmentation mesh of the SOI in the atlas image to the input Post-CT image using the DDF so as to generate the segmentation mesh of the SOI in the input Post-CT image, wherein the segmentation mesh of the SOI in the atlas image is generated by applying an active shape model-based (ASM) method to the atlas image.
In some embodiments, as shown in FIG. 2B, said training the networks comprises inputting a concatenation of the atlas image (Atlasatlas) and each Post-CT image (Postpost) into the networks so that the first network (NETatlas-post) generates a DDF from the atlas space to the Post-CT space (DDFatlas-post) and the second network (NETpost-atlas) generates a DDF from the Post-CT space to the atlas space (DDFpost-atlas); warping the Pre-CT image (PresSpc), the segmentation masks (MasksSpc), and the fiducial vertices (FidVsSpc) in a source space to a target space by using the corresponding DDFs, to generate PresSpc-tSpc, MasksSpc-tSpc, and FidVsSpc-tSpc; and transferring PresSpc-tSpc, MasksSpc-tSpc, and FidVsSpc-tSpc back to sSpc using the corresponding DDF, to generate PresSpc-tSpc-sSpc, MasksSpc-tSpc-sSpc, and FidVsSpc-tSpc-sSpc, respectively, wherein OxSpc denotes an object O in the x space, and sSpc and tSpc respectively denote the source and target spaces.
In some embodiments, FidVatlas and FidVpost are the fiducial vertices randomly sampled from Meshatlas and Meshpost on the fly for calculating the fiducial registration error during training.
In some embodiments, the training objective for NETsSpc-tSpc is constructed using measurements of similarity between a target object in tSpc (OtSpc) and a source object that is transferred to tSpc from sSpc ( OsSpc-tSpc).
In some embodiments, the training objective for the networks is a weighted sum of loss terms of MSPDice, Mean FRE, NCC, CycConsis, BendE, and L2.
MSPDice = MSPDice(Maskpost, Maskatlas-post) + MSPDice(Maskatlas, Maskpost-atlas), wherein MSPDice(MasktSpc, MasksSpc-tSpc) is a multiscale soft probabilistic Dice between MasktSpc and MasksSpc-tSpc that measures the similarity of the segmentation masks between MasktSpc and MasksSpc-tSpc.
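As an illustration of the mask-similarity term, the following is a minimal sketch of a multiscale soft Dice. It is not the implementation of the specification: the scale operator (average pooling by integer factors), the factors themselves, and the function names `soft_dice`, `avg_pool`, and `multiscale_soft_dice` are all assumptions made for the example. The function returns a similarity; a loss would typically be formed as one minus this value, which is likewise an assumption.

```python
import numpy as np

def soft_dice(p, q, eps=1e-6):
    """Soft Dice overlap between two probabilistic masks with values in [0, 1]."""
    inter = np.sum(p * q)
    return (2.0 * inter + eps) / (np.sum(p) + np.sum(q) + eps)

def avg_pool(vol, k):
    """Average-pool a cubic volume by an integer factor k (assumed scale operator)."""
    n = (vol.shape[0] // k) * k
    v = vol[:n, :n, :n]
    m = n // k
    return v.reshape(m, k, m, k, m, k).mean(axis=(1, 3, 5))

def multiscale_soft_dice(p, q, factors=(1, 2, 4)):
    """Mean soft Dice over several resolutions; the factors are illustrative."""
    scores = [soft_dice(avg_pool(p, k), avg_pool(q, k)) for k in factors]
    return float(np.mean(scores))

# Toy example: a cube mask compared with itself and with a shifted copy.
target = np.zeros((16, 16, 16))
target[4:12, 4:12, 4:12] = 1.0
warped = np.roll(target, 2, axis=0)  # stands in for a slightly misregistered mask

score_same = multiscale_soft_dice(target, target)
score_shift = multiscale_soft_dice(target, warped)
```

In this toy case the shifted mask scores lower than the identical mask, which is the behavior a registration loss relies on.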
Mean FRE = FRE(FidVpost, FidVatlas-post) + FRE(FidVatlas, FidVpost-atlas), wherein FRE(FidVtSpc, FidVsSpc-tSpc) is a mean fiducial registration error that measures the similarity of the fiducial vertices between FidVtSpc and FidVsSpc-tSpc, and is calculated as the average Euclidean distance between the fiducial vertices in FidVtSpc and the corresponding vertices in FidVsSpc-tSpc.
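The mean fiducial registration error defined above can be sketched in a few lines. The function name `mean_fre` and the toy vertex arrays are hypothetical; only the definition (average Euclidean distance between corresponding vertices) comes from the text.

```python
import numpy as np

def mean_fre(fid_target, fid_warped):
    """Average Euclidean distance between corresponding vertices (N x 3 arrays)."""
    return float(np.mean(np.linalg.norm(fid_target - fid_warped, axis=1)))

# Toy fiducials: every warped vertex is offset by (0.3, 0.4, 0), i.e. 0.5 units.
fid_t = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
fid_w = fid_t + np.array([0.3, 0.4, 0.0])
fre = mean_fre(fid_t, fid_w)
```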
NCC = NCC(Prepost, Atlasatlas-post) + NCC(Atlasatlas, Prepost-atlas), wherein NCC(PretSpc, PresSpc-tSpc) is a normalized cross-correlation between PretSpc and PresSpc-tSpc that measures the similarity between the warped source image and the target image.
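For illustration, a global normalized cross-correlation between two volumes can be computed as below. This sketch assumes a single global window; the specification does not state here whether a global or a local windowed NCC is used, and the name `ncc` is hypothetical. Because NCC is a similarity, a training loss would usually use its negative, which is also an assumption.

```python
import numpy as np

def ncc(a, b, eps=1e-8):
    """Global normalized cross-correlation of two equally shaped volumes."""
    a0 = a - a.mean()
    b0 = b - b.mean()
    denom = np.sqrt(np.sum(a0 ** 2) * np.sum(b0 ** 2)) + eps
    return float(np.sum(a0 * b0) / denom)

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8, 8))

ncc_same = ncc(img, 2.0 * img + 5.0)  # invariant to linear intensity changes
ncc_neg = ncc(img, -img)              # inverted contrast gives the minimum
```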
CycConsis = CycConsisatlas-post + CycConsispost-atlas, wherein CycConsissSpc-tSpc is a cycle-consistency loss that measures the similarity between the original source objects in the source space and the source objects that are transferred from the source space to the target space and then transferred back to the source space, wherein CycConsissSpc-tSpc = MSPDice(MasksSpc, MasksSpc-tSpc-sSpc) + 2 x FRE(FidVsSpc, FidVsSpc-tSpc-sSpc) + 0.5 x NCC(PresSpc, PresSpc-tSpc-sSpc).
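The round trip behind the cycle-consistency term can be sketched for the fiducial vertices as follows. The displacement functions `fwd` and `bwd` are hypothetical stand-ins for the two network outputs, and the Dice and NCC inputs to `cyc_consis` are placeholder numbers assumed to be already expressed as losses; only the weights 1, 2, and 0.5 are taken from the text.

```python
import numpy as np

def cycle_points(points, ddf_fwd, ddf_bwd):
    """Move points source -> target -> source with two displacement functions."""
    in_target = points + ddf_fwd(points)
    return in_target + ddf_bwd(in_target)

def cyc_consis(dice_loss, fre, ncc_loss):
    """Weighted sum using the weights stated in the text: 1, 2 and 0.5."""
    return dice_loss + 2.0 * fre + 0.5 * ncc_loss

# Hypothetical fields: a translation and a slightly imperfect inverse.
fwd = lambda p: np.tile([1.0, 0.0, 0.0], (len(p), 1))
bwd = lambda p: np.tile([-0.9, 0.0, 0.0], (len(p), 1))

fid_s = np.array([[0.0, 0.0, 0.0], [2.0, 1.0, 0.0]])
fid_cycle = cycle_points(fid_s, fwd, bwd)

# Residual after the round trip; zero only if bwd exactly inverts fwd.
fre_cycle = float(np.mean(np.linalg.norm(fid_s - fid_cycle, axis=1)))
loss = cyc_consis(dice_loss=0.05, fre=fre_cycle, ncc_loss=0.2)
```

A perfectly inverse pair of fields would drive the fiducial part of this term to zero, which is why the term regularizes the two networks toward mutually consistent transformations.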
BendE = BendE(DDFatlas-post) + BendE(DDFpost-atlas), wherein BendE(DDFsSpc-tSpc) is a bending loss for which the DDF from the source space to the target space DDFsSpc-tSpc is regularized using bending energy.
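A bending-energy regularizer penalizes the second spatial derivatives of the deformation field. The sketch below approximates it with finite differences via `numpy.gradient`; the array layout (3, X, Y, Z), the unit spacing, and the name `bending_energy` are assumptions for the example and not details given in the specification.

```python
import numpy as np

def bending_energy(ddf, spacing=1.0):
    """Mean squared second derivative of a displacement field of shape (3, X, Y, Z)."""
    energy = 0.0
    for comp in ddf:                          # one displacement component at a time
        for first in np.gradient(comp, spacing):
            for second in np.gradient(first, spacing):
                energy += np.mean(second ** 2)
    return float(energy)

x, y, z = np.meshgrid(np.arange(8.0), np.arange(8.0), np.arange(8.0), indexing="ij")

# An affine (linear) field has no curvature, so its bending energy vanishes.
affine_field = np.stack([0.1 * x + 0.2 * y, 0.3 * z, -0.1 * x])
# A quadratic field bends, so it is penalized.
quadratic_field = np.stack([0.05 * x ** 2, np.zeros_like(x), np.zeros_like(x)])

e_affine = bending_energy(affine_field)
e_quad = bending_energy(quadratic_field)
```

Because affine deformations incur no penalty, this regularizer constrains only the non-rigid part of the output DDF.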
L2 = L2(NETatlas-post) + L2(NETpost-atlas), wherein L2(NETsSpc-tSpc) is an L2 loss for which the learnable parameters of the registration network NETsSpc-tSpc are regularized by an L2 term.
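Collecting the terms defined above, the overall objective can be written compactly as follows. The weights are written symbolically as λ because their numerical values are not stated in this passage; the arrow subscripts are a notational assumption standing for "transferred from the source space s to the target space t (and back)".

```latex
\begin{aligned}
\mathcal{L}_{\text{total}} &= \lambda_{1}\,\text{MSPDice} + \lambda_{2}\,\text{Mean FRE} + \lambda_{3}\,\text{NCC}
 + \lambda_{4}\,\text{CycConsis} + \lambda_{5}\,\text{BendE} + \lambda_{6}\,\text{L2}, \\
\text{CycConsis}_{s \to t} &= \text{MSPDice}\big(\text{Mask}_{s}, \text{Mask}_{s \to t \to s}\big)
 + 2\,\text{FRE}\big(\text{FidV}_{s}, \text{FidV}_{s \to t \to s}\big)
 + 0.5\,\text{NCC}\big(\text{Pre}_{s}, \text{Pre}_{s \to t \to s}\big).
\end{aligned}
```

Each of the six terms is itself the sum of an atlas-to-post and a post-to-atlas contribution, as defined in the preceding paragraphs.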
In another aspect, the invention relates to a system for segmentation of SOI in a CT image post-operatively acquired from an implant user in a region of interest in which an implant is implanted.
The system includes trained networks for generating DDFs in opposite directions; and a microcontroller coupled with the trained networks and configured to generate a DDF from the post-operatively acquired CT (Post-CT) image to an atlas image; and warp a segmentation mesh of the SOI in the atlas image to the Post-CT image using the DDF to generate a segmentation mesh of the SOI in the Post-CT image, wherein the segmentation mesh of the SOI in the atlas image is generated by applying an ASM method to the atlas image.
The atlas image is a Pre-CT image. In some embodiments, the trained networks comprise networks having network architecture in which NETsSpc-tSpc, designed for generating a DDF for warping a source image S to a target image T, comprises a Global-net and a Local-net, wherein the network architecture is configured such that after receiving the concatenation of S and T, the Global-net generates an affine transformation matrix; S is warped to T by using the affine transformation to generate an image S’; the Local-net takes the concatenation of S’ and T to generate a non-rigid local DDF; and the affine transformation and the local DDF are composed to produce an output DDF.
In some embodiments, the microcontroller is further configured to train the networks with a dataset comprising a plurality of CT image pairs, randomly partitioned into a training set, a validation set, and a testing set, wherein each CT image pair has a pre-implantation CT (Pre-CT) image and a post-implantation CT (Post-CT) image respectively acquired in a region of a respective implant recipient before and after an implant is implanted in the region, so that the Pre-CT image and the Post-CT image of each CT image pair are an artifact-free CT image and an artifact-affected CT image, respectively.
In some embodiments, the microcontroller is further configured to apply image augmentation to the training set by rotating each image by a plurality of small random angles in the range of -25 to 25 degrees about the x-axis, y-axis, and z-axis, to create additional training images from each original image.
In some embodiments, the microcontroller is further configured to rigidly register the Pre-CT image to the Post-CT image for each CT image pair; and align the registered Pre-CT and Post-CT image pair to the atlas image so that anatomical structures in the region of the respective implant recipient are roughly in the same spatial location and orientation.
In some embodiments, the microcontroller is further configured to, for each CT image pair, apply the ASM method to the registered Pre-CT image to generate a segmentation mesh of the SOI in the registered Pre-CT image (Meshpre); transfer the segmentation mesh of the SOI in the registered Pre-CT image (Meshpre) to the Post-CT image, so as to generate a segmentation mesh of the SOI in the Post-CT image (Meshpost); and convert the segmentation mesh of the SOI in the Post-CT image (Meshpost) to segmentation masks of the SOI in the Post-CT image (Segpost).
In some embodiments, the networks are trained to learn to register the artifact-affected CT images and the atlas image with assistance of the paired artifact-free CT images.
In some embodiments, the networks are trained by inputting a concatenation of the atlas image (Atlasatlas) and each Post-CT image (Postpost) into the networks so that the first network (NETatlas-post) generates a DDF from the atlas space to the Post-CT space (DDFatlas-post) and the second network (NETpost-atlas) generates a DDF from the Post-CT space to the atlas space (DDFpost-atlas); warping the Pre-CT image (PresSpc), the segmentation masks (MasksSpc), and the fiducial vertices (FidVsSpc) in a source space to a target space by using the corresponding DDFs, to generate PresSpc-tSpc, MasksSpc-tSpc, and FidVsSpc-tSpc; and transferring PresSpc-tSpc, MasksSpc-tSpc, and FidVsSpc-tSpc back to sSpc using the corresponding DDF, to generate PresSpc-tSpc-sSpc, MasksSpc-tSpc-sSpc, and FidVsSpc-tSpc-sSpc, respectively, wherein OxSpc denotes an object O in the x space, and sSpc and tSpc respectively denote the source and target spaces.
In some embodiments, the training objective for NETsSpc-tSpc is constructed using measurements of similarity between a target object in tSpc (OtSpc) and a source object that is transferred to tSpc from sSpc (OsSpc-tSpc).
In some embodiments, the trained networks are characterized with a training objective that is a weighted sum of loss terms of MSPDice, Mean FRE, NCC, CycConsis, BendE, and L2.
MSPDice = MSPDice(Maskpost, Maskatlas-post) + MSPDice(Maskatlas, Maskpost-atlas), wherein MSPDice(MasktSpc, MasksSpc-tSpc) is a multiscale soft probabilistic Dice between MasktSpc and MasksSpc-tSpc that measures the similarity of the segmentation masks between MasktSpc and MasksSpc-tSpc.
Mean FRE = FRE(FidVpost, FidVatlas-post) + FRE(FidVatlas, FidVpost-atlas), wherein FRE(FidVtSpc, FidVsSpc-tSpc) is a mean fiducial registration error that measures the similarity of the fiducial vertices between FidVtSpc and FidVsSpc-tSpc, and is calculated as the average Euclidean distance between the fiducial vertices in FidVtSpc and the corresponding vertices in FidVsSpc-tSpc.
NCC = NCC(Prepost, Atlasatlas-post) + NCC(Atlasatlas, Prepost-atlas), wherein NCC(PretSpc, PresSpc-tSpc) is a normalized cross-correlation between PretSpc and PresSpc-tSpc that measures the similarity between the warped source image and the target image.
CycConsis = CycConsisatlas-post + CycConsispost-atlas, wherein CycConsissSpc-tSpc is a cycle-consistency loss that measures the similarity between the original source objects in the source space and the source objects that are transferred from the source space to the target space and then transferred back to the source space, wherein CycConsissSpc-tSpc = MSPDice(MasksSpc, MasksSpc-tSpc-sSpc) + 2 x FRE(FidVsSpc, FidVsSpc-tSpc-sSpc) + 0.5 x NCC(PresSpc, PresSpc-tSpc-sSpc).
BendE = BendE(DDFatlas-post) + BendE(DDFpost-atlas), wherein BendE(DDFsSpc-tSpc) is a bending loss for which the DDF from the source space to the target space DDFsSpc-tSpc is regularized using bending energy.
L2 = L2(NETatlas-post) + L2(NETpost-atlas), wherein L2(NETsSpc-tSpc) is an L2 loss for which the learnable parameters of the registration network NETsSpc-tSpc are regularized by an L2 term.
Segmentation of the ICA is important to assist audiologists in programming cochlear implants. Segmenting the anatomy in images acquired after implantation is difficult because the implant produces very strong artifacts. The method and system disclosed herein permit the segmentation of these images despite these artifacts. They are also robust to poor image quality, i.e., images affected by noise or blur.
It should be noted that all or a part of the steps according to the embodiments of the present invention may be implemented by hardware or by a program instructing relevant hardware. Yet another aspect of the invention provides a non-transitory computer readable storage medium/memory which stores computer executable instructions or program codes. The computer executable instructions or program codes enable a system to complete various operations in the above disclosed method for segmentation of structures of interest (e.g., ICA) in a CT image post-operatively acquired from an implant user in a region of interest in which an implant is implanted. The storage medium/memory may include, but is not limited to, high-speed random access medium/memory such as DRAM, SRAM, DDR RAM or other random access solid state memory devices, and non-volatile memory such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices.
Without intent to limit the scope of the invention, examples and their related results according to the embodiments of the present invention are given below. Note that titles or subtitles may be used in the examples for convenience of a reader, which in no way should limit the scope of the invention. Moreover, certain theories are proposed and disclosed herein; however, in no way they, whether they are right or wrong, should limit the scope of the invention so long as the invention is practiced according to the invention without regard for any particular theory or scheme of action.
EXAMPLE
ATLAS-BASED SEGMENTATION OF INTRACOCHLEAR ANATOMY IN METAL ARTIFACT AFFECTED CT IMAGES OF THE EAR WITH CO-TRAINED DEEP NEURAL NETWORKS
In this example, an end-to-end atlas-based method is developed, which first generates a dense deformation field (DDF) between an artifact-free atlas image and a Post-CT image. The segmentation of the intracochlear anatomy (ICA) in the Post-CT image can then be obtained by transferring the predefined segmentation meshes of the ICA in the atlas image to the Post-CT image using the DDF. Practically, the inter-subject non-rigid registration between the atlas image and the Post-CT image is a difficult task because (1) considerable variation in cochlear anatomy across individuals has been documented, and (2) the artifacts in the Post-CT image change, often severely, the appearance of the anatomy, which has a significant influence on the accuracy of registration methods guided by intensity-based similarity metrics. To overcome these challenges, the exemplary study herein discloses a method that relies on deep networks to perform registrations between an atlas image and the Post-CT images. Following the idea of consistent image registration obtained by jointly estimating the forward and reverse transformations between two images proposed by Christensen et al., a pair of co-trained networks that generate DDFs in opposite directions is used. One network is tasked with registering the atlas image to the Post-CT image and the other one is tasked with registering the Post-CT image to the atlas image. The networks are trained using loss functions that include voxel-wise labels, image content, fiducial registration error (FRE), and a cycle-consistency constraint. The model can segment the ICA and preserve point-to-point correspondence between the atlas and the Post-CT meshes, even when the ICA is difficult to localize visually.
Method
Data: The dataset includes Pre-CT and Post-CT image pairs of 624 ears. The atlas image is a Pre-CT image of an ear that is not among the 624 ears. The Pre-CT images are acquired with several conventional scanners (GE BrightSpeed, LightSpeed Ultra; Siemens Sensation 16; and Philips Mx8000 IDT, iCT 128, and Brilliance 64) and the Post-CT images are acquired with a low-dose flat-panel volumetric scanner (Xoran Technologies xCAT® ENT).
The typical voxel size is 0.25 x 0.25 x 0.3 mm³ for the Pre-CT images and 0.4 x 0.4 x 0.4 mm³ for the Post-CT images. For each ear, the Pre-CT image is rigidly registered to the Post-CT image. The registration is accurate because the surgery, which comprises threading an electrode array through a small hole into the bony cavity, does not induce non-rigid deformation of the cochlea. The registered Pre-CT and Post-CT image pairs are then aligned to the atlas image so that the ears are roughly in the same spatial location and orientation. All of the images are resampled to an isotropic voxel size of 0.2 mm. Images of 64 x 64 x 64 voxels that contain the cochleae are cropped from the full-sized images, and the networks are trained to process such cropped images.
Learning to Register the Artifact-affected Images and the Atlas Image with Assistance of the Paired Artifact-free Images: FIG. 2A shows a list of images, meshes, and masks used to train the networks. For simplicity, OxSpc or Ox is used to denote an object O in the x space. For example, AtlasImgatlas or Atlasatlas is the atlas image in the atlas space. Similarly, PostImgpost or Postpost is a Post-CT image in the Post-CT space. Meshatlas is the segmentation mesh of the ICA in Atlasatlas generated by applying the active shape model-based (ASM) method to Atlasatlas. PreImgpost or Prepost is the paired Pre-CT image of Postpost registered to the original Post-CT image. Meshpost is the segmentation mesh of the ICA in Postpost, which is generated by applying the ASM method to Prepost and then transferring the meshes to Postpost. Maskatlas (Segatlas) and Maskpost (Segpost) are segmentation masks of the ST, SV, and MD. They are generated by converting Meshatlas and Meshpost to masks.
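The cropping step of the data preparation described in this section can be sketched as follows. The helper `crop_centered`, the example volume size, and the example center are hypothetical; only the 64-voxel patch size and the 0.2 mm isotropic spacing come from the text. The sketch assumes every dimension of the volume is at least 64 voxels.

```python
import numpy as np

def crop_centered(volume, center, size=64):
    """Crop a size^3 block around `center` (voxel indices), clamped to the volume."""
    starts = [int(min(max(c - size // 2, 0), dim - size))
              for c, dim in zip(center, volume.shape)]
    return volume[tuple(slice(s, s + size) for s in starts)]

# Hypothetical resampled volume and cochlea location, in voxels.
volume = np.zeros((120, 100, 90), dtype=np.float32)
patch = crop_centered(volume, center=(60, 10, 80))

# Physical edge length of the cropped cube at 0.2 mm isotropic spacing.
extent_mm = 64 * 0.2
```

At this spacing the cropped cube covers 12.8 mm per side.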
As shown in FIG. 2B, the input of the networks is the concatenation of Atlasatlas and Postpost. The networks include a first network (NETatlas-post) that generates a DDF from the atlas space to the Post-CT space (DDFatlas-post) and a second network (NETpost-atlas) that generates a DDF from the Post-CT space to the atlas space (DDFpost-atlas). FidVatlas and FidVpost are fiducial vertices randomly sampled from Meshatlas and Meshpost on the fly for calculating fiducial registration error (FRE) during training.
Let sSpc be the source space and tSpc be the target space. The Pre-CT image, the segmentation masks, and the fiducial points in sSpc are warped to tSpc by using the corresponding DDFs (note that one DDF is used for the images and masks and the other for the fiducial points), and the results are denoted as PresSpc-tSpc, MasksSpc-tSpc, and FidVsSpc-tSpc. Then, PresSpc-tSpc, MasksSpc-tSpc, and FidVsSpc-tSpc are transferred back to sSpc using the corresponding DDF, and the results are denoted as PresSpc-tSpc-sSpc, MasksSpc-tSpc-sSpc, and FidVsSpc-tSpc-sSpc, respectively. The training objective for NETsSpc-tSpc can be constructed by using similarity measurements between the target object in tSpc (denoted as OtSpc) and the source object that has been transferred to tSpc from sSpc (denoted as OsSpc-tSpc). Specifically, the multiscale soft probabilistic Dice (MSPDice) between MasktSpc and MasksSpc-tSpc, which is denoted as MSPDice(MasktSpc, MasksSpc-tSpc), is used to measure the similarity of the segmentation masks. The multiscale soft probabilistic Dice is less sensitive to the class imbalance in the segmentation tasks and is more appropriate for measuring label similarity in the context of image registration. The similarity between FidVtSpc and FidVsSpc-tSpc is measured by the mean fiducial registration error FRE(FidVtSpc, FidVsSpc-tSpc), which is calculated as the average Euclidean distance between the vertices in FidVtSpc and the corresponding vertices in FidVsSpc-tSpc. The Post-CT images cannot be used for calculating an intensity-based loss because of the artifacts; thus the normalized cross-correlation (NCC) between PretSpc and PresSpc-tSpc, which is denoted as NCC(PretSpc, PresSpc-tSpc), is used to measure the similarity between the warped source image and the target image. A cycle-consistency loss is used for regularizing the transformations. It imposes inverse consistency between the objects in the two spaces and has been shown to reduce folding problems.
The cycle-consistency loss CycConsissSpc-tSpc measures the similarity between the original source objects in the source space and the source objects that have been transferred from the source space to the target space and then transferred back to the source space, which is calculated as MSPDice(MasksSpc, MasksSpc-tSpc-sSpc) + 2 x FRE(FidVsSpc, FidVsSpc-tSpc-sSpc) + 0.5 x NCC(PresSpc, PresSpc-tSpc-sSpc). Furthermore, the DDF from the source space to the target space DDFsSpc-tSpc is regularized using bending energy, which is denoted as BendE(DDFsSpc-tSpc). The learnable parameters of the registration network NETsSpc-tSpc (except for the biases) are regularized by an L2 term, which is denoted as L2(NETsSpc-tSpc). To summarize, the training objective for the networks is the weighted sum of the loss terms listed in Table 1, wherein the weights have been selected empirically by looking at training performance over a small number of epochs.
Table 1. Loss terms of the training objective (MSPDice, Mean FRE, NCC, CycConsis, BendE, and L2) and their weights.
Network Architecture: The registration networks in the model are adapted from the network architecture proposed by Hu et al. and Ghavami et al. As shown in FIG. 3, NETsSpc-tSpc, which is tasked with generating a DDF for warping the source image S to the target image T, is composed of a Global-net and a Local-net. After receiving the concatenation of S and T, the Global-net generates an affine transformation matrix. S is warped to T by using this affine transformation and the resulting image is denoted as S’. Then, the Local-net takes the concatenation of S’ and T to generate a non-rigid local DDF. The affine transformation and the local DDF are composed to produce the output DDF. The details about the Global-net and Local-net can be found in Hu et al. The Global-net is a 3D convolutional neural network with three down-sampling blocks, followed by an output block that maps the extracted image information to the affine transformation parameters. The Local-net is a 3D convolutional neural network based on an adapted U-network architecture with three down-sampling blocks, followed by three up-sampling blocks. NETsSpc-tSpc is trained end-to-end, i.e., the Global-net and the Local-net are trained together using the same loss function, which is the weighted sum of the loss terms listed in Table 1.
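The composition of the affine transformation with the local DDF can be illustrated on a set of grid coordinates. The sketch below assumes a resampling convention in which the output displacement at x is A(x + u_local(x)) - x, where A is the affine map and u_local the local displacement; the specification does not spell out the convention, and `compose_ddf` and the toy inputs are hypothetical.

```python
import numpy as np

def compose_ddf(affine, local_ddf, grid):
    """Compose a 3x4 affine matrix with a local displacement field.

    grid and local_ddf are (N, 3) arrays of coordinates and displacements.
    Returns the displacement of the map x -> A(x + u_local(x)) - x.
    """
    moved = grid + local_ddf                          # apply the local DDF first
    mapped = moved @ affine[:, :3].T + affine[:, 3]   # then the affine map
    return mapped - grid

grid = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
affine = np.hstack([np.eye(3), [[0.5], [0.0], [0.0]]])   # pure translation along x
local = np.array([[0.0, 0.1, 0.0], [0.0, 0.1, 0.0]])     # small shift along y

ddf = compose_ddf(affine, local, grid)
```

With a pure translation as the affine part, the composed field is simply the sum of the translation and the local displacement.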
Evaluation: As shown in FIG. 2C, at the inference phase, given a new Post-CT image Postpost, the ICA in Postpost can be segmented by warping Meshatlas to PostImgpost using the DDF generated by the trained network. The resulting segmentation mesh of the ICA is denoted as Meshatlas-post. Meshpost is used as the ground truth for comparison. As Meshatlas and Meshpost are the outputs of the ASM method, both of them have a predefined number of vertices, and the vertices of Meshatlas and Meshpost have a one-to-one correspondence. There are 3344, 3132, and 2852 vertices on the ST, SV, and MD mesh surfaces, respectively, for a total of 9328 vertices. The point-to-point error (P2PE), computed as the Euclidean distance in millimeters between the corresponding vertices on Meshatlas-post and Meshpost, is used to quantify the accuracy of the segmentation and registration. The P2PEs between the corresponding vertices on Meshpost and the meshes generated by cGANs+ASM are also calculated and are used to compare the novel method with the state of the art (SOTA). The method proposed in Hu et al., which uses a unidirectional registration network trained with the MSPDice loss and the regularization loss, is used as a baseline for comparison. In addition to the MSPDice loss and the regularization loss, the training objective of the novel method also includes the FRE loss, the NCC loss, and the cycle-consistency loss. An ablation study is conducted to analyze how these loss terms affect the performance of the networks.
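The point-to-point error statistics used in the evaluation can be sketched as below. The function `p2pe_stats` and the synthetic meshes are hypothetical; the vertex counts are those stated above, and the one-to-one vertex correspondence is what makes a row-wise distance meaningful.

```python
import numpy as np

def p2pe_stats(mesh_pred, mesh_gt):
    """Max, median and standard deviation of per-vertex Euclidean distances (mm)."""
    d = np.linalg.norm(mesh_pred - mesh_gt, axis=1)
    return {"max": float(d.max()), "median": float(np.median(d)), "std": float(d.std())}

n_vertices = 3344 + 3132 + 2852   # ST + SV + MD

# Synthetic meshes: the prediction is the ground truth shifted by 0.1 mm along z.
rng = np.random.default_rng(0)
mesh_gt = rng.uniform(0.0, 12.8, size=(n_vertices, 3))
mesh_pred = mesh_gt + np.array([0.0, 0.0, 0.1])

stats = p2pe_stats(mesh_pred, mesh_gt)
```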
Experiments
The 624 ears are partitioned into 465 ears for training, 66 ears for validation, and 93 ears for testing. The partition is random, with the constraint that ears of the same subject cannot be used in both training and testing. Augmentation is applied to the training set by rotating each image by 6 random angles in the range of -25 to 25 degrees about the x-, y-, and z-axes. The training images are blurred by applying a Gaussian filter with a kernel size selected randomly from {0, 0.5, 1.0, 1.5} with equal probability. This results in a training set expanded to 8835 images. Each image is clipped between its 5th and 95th intensity percentiles, and the intensity values are rescaled to the range of -1 to 1. A batch size of 1 is used. At each training step, 30% of the vertices on the ICA meshes are randomly sampled and used as the fiducial points for calculating the FRE loss.
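The bookkeeping of this setup can be checked with a short sketch. The reading that each training image yields itself plus 6 rotations about each of the 3 axes is an interpretation that happens to reproduce the stated total of 8835; the function `normalize_intensity`, the random test image, and the total vertex count of 9328 (taken from the evaluation description) are assumptions for the example.

```python
import numpy as np

n_train, n_val, n_test = 465, 66, 93
# Assumed reading: original image + 6 rotated copies for each of 3 axes.
n_augmented = n_train * (1 + 6 * 3)

def normalize_intensity(img):
    """Clip to the 5th-95th percentile range and rescale linearly to [-1, 1]."""
    lo, hi = np.percentile(img, [5, 95])
    clipped = np.clip(img, lo, hi)
    return 2.0 * (clipped - lo) / (hi - lo) - 1.0

rng = np.random.default_rng(0)
img = rng.normal(size=(16, 16, 16))
norm = normalize_intensity(img)

# Sample 30% of the mesh vertices, without replacement, as fiducials for one step.
n_vertices = 9328
n_fid = int(0.3 * n_vertices)
fid_idx = rng.choice(n_vertices, size=n_fid, replace=False)
```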
Results
FIGS. 4A-4B show two cases for which the method leads to (FIG. 4A) good and (FIG. 4B) poor results. For each case, the first row shows three orthogonal views of the original atlas image in the atlas space. The second row shows the Post-CT image. The third row shows the atlas image registered to the Post-CT image. The fourth row shows the paired Pre-CT image of the Post-CT image. The warped atlas image (third row) should be as similar as possible to the Pre-CT image (fourth row). The last row shows the original segmentation mesh in the atlas image (Meshatlas), the segmentation mesh in the Post-CT image generated using the method (Meshatlas-post), and the ground truth mesh in the Post-CT image (Meshpost). For Meshatlas and Meshpost, the ST, the SV, and the MD are shown in red, blue, and green, respectively. Meshatlas-post is color-coded with the P2PE at each vertex on the mesh surfaces. Both of these cases illustrate the severity of the artifact introduced by the implant. In the second case, the cochlea is barely visible.
For each testing ear, the P2PEs of the vertices on the mesh surfaces of the ST, the SV, and the MD are calculated respectively. The maximum (Max), median, and standard deviation (STD) of the P2PEs are calculated. FIGS. 5A-5C show the boxplots of these statistics for the 93 testing ears. “cGAN+ASM” denotes the results of the SOTA. “Novel” denotes the results of the method according to the invention. “Novel-NoNCC”, “Novel-NoCycConsis”, and “Novel-NoFRE” denote the results of the novel networks trained without using the NCC loss, the cycle-consistency loss, and the FRE loss, respectively. “Baseline” denotes the results of the baseline method. “No registration” denotes the P2PEs between the vertices on the mesh surfaces in the original atlas space and the Post-CT space. Two-sided and one-sided Wilcoxon signed-rank tests between the “Novel” group and the other groups are performed. The p-values have been corrected using the Holm-Bonferroni method. The median values for each group are shown on top of the boxplots, in which red denotes that both the two-sided and the one-sided tests are significant, cyan denotes that only the two-sided test is significant, and blue denotes that the two-sided test is not significant. The results show that the networks trained using all of the novel loss terms achieve a significantly lower segmentation error compared to the baseline method and the networks that are not trained using all of the loss terms. The method according to the invention produces results that are similar to those obtained with the SOTA in terms of the medians of the segmentation error. The Max of the segmentation error and the STD of the segmentation error for the SV and MD remain slightly superior to those obtained with the SOTA.
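The Holm-Bonferroni correction mentioned above can be sketched in plain NumPy. The function name `holm_bonferroni` and the raw p-values are illustrative only and are not results from the study.

```python
import numpy as np

def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k), k = 0, 1, ...;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * p[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]   # made-up p-values for four comparisons
adj = holm_bonferroni(raw)
```

A comparison is then declared significant when its adjusted p-value falls below the chosen significance level.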
As mentioned earlier, the SOTA is a two-step process: (1) generate a synthetic Pre-CT image from a Post-CT image with cGANs trained for this purpose and (2) apply an ASM method to the synthetic image. Step 2 requires the very accurate registration of an atlas to the image to be segmented to initialize the ASM. This is achieved through an affine and then a non-rigid intensity-based registration in a volume-of-interest that includes the inner ear. Step 1 takes about 0.3s while step 2 takes on average 75s. The novel method according to the invention only requires providing a volume-of-interest that includes the inner ear to the networks and inference time is also about 0.3s. Segmentation is thus essentially instantaneous with the novel method while it takes over a minute with the SOTA. This is of importance for clinical deployment and end-user acceptance.
In sum, the exemplary example discloses networks capable of performing image registration between artifact-affected CT images and an artifact-free atlas image, which is a very challenging task because of the severity of the artifact introduced by the implant. Because maintaining point-to-point correspondence between meshes in the atlas and meshes in the segmented Post-CT images is needed, a point-to-point loss is introduced, which, to the best of the inventors’ knowledge, has not yet been reported. The experiments have shown that this loss is critical to achieve results that are comparable to those obtained with the SOTA that relies on an ASM fitted to a preoperative image synthesized from a post-operative image. By design, ASM methods always produce plausible shapes. It is observed that with the point-to-point loss, the network also produces plausible shapes even when the images are of very poor quality (see FIG. 4B). It is hypothesized that, thanks to the point-to-point loss, the network has been able to learn the shape of the cochlea and can fit this shape to partial information in the post-operative image. More experiments are ongoing to verify this hypothesis.
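As an illustration of the point-to-point loss discussed above, the following minimal sketch samples fiducial vertices and computes a mean fiducial registration error as the average Euclidean distance between corresponding vertices. The function names, the toy meshes, and the number of sampled fiducials are hypothetical; an actual training implementation would operate on differentiable tensors rather than Python lists.

```python
import math
import random


def sample_fiducial_indices(num_vertices, num_fiducials, rng):
    """Pick vertex indices; the same indices are used on both meshes to keep correspondence."""
    return rng.sample(range(num_vertices), num_fiducials)


def mean_fre(target_vertices, warped_vertices, indices):
    """Average Euclidean distance between corresponding fiducial vertices."""
    total = sum(math.dist(target_vertices[i], warped_vertices[i]) for i in indices)
    return total / len(indices)


# Toy meshes: the warped atlas mesh is the Post-CT mesh shifted by 0.3 along x.
mesh_post = [(float(i), float(i % 3), 0.0) for i in range(10)]
mesh_atlas_post = [(x + 0.3, y, z) for (x, y, z) in mesh_post]

rng = random.Random(0)  # fixed seed only so that the example is reproducible
idx = sample_fiducial_indices(len(mesh_post), 4, rng)
print(mean_fre(mesh_post, mesh_atlas_post, idx))
```

Because every vertex in the toy example is displaced by the same offset, the error is the same whichever vertices are sampled, which makes the sketch easy to check by hand.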
The foregoing description of the exemplary embodiments of the invention has been presented only for the purposes of illustration and description and is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching.
The embodiments were chosen and described in order to explain the principles of the invention and their practical application so as to enable others skilled in the art to utilize the invention and various embodiments and with various modifications as are suited to the particular use contemplated. Alternative embodiments will become apparent to those skilled in the art to which the invention pertains without departing from its spirit and scope. Accordingly, the scope of the invention is defined by the appended claims rather than the foregoing description and the exemplary embodiments described therein.
Some references, which may include patents, patent applications and various publications, are cited and discussed in the description of this disclosure. The citation and/or discussion of such references is provided merely to clarify the description of the present disclosure and is not an admission that any such reference is “prior art” to the disclosure described herein. All references cited and discussed in this specification are incorporated herein by reference in their entireties and to the same extent as if each reference was individually incorporated by reference.
LIST OF REFERENCES
[1] What is a Cochlear Implant, https://www.fda.gov/medical-devices/cochlear-implants/what-cochlear-implant, last accessed 2020/11/17.
[2] Image-guided Cochlear Implant Programming (IGCIP), https://clinicaltrials.gov/ct2/show/NCT03306082, last accessed 2020/11/17.
[3] Noble, J. H. et al.: Automatic segmentation of intracochlear anatomy in conventional CT. IEEE Transactions on Biomedical Engineering 58(9), 2625-2632 (2011).
[4] Wang, J. et al.: Metal artifact reduction for the segmentation of the intra cochlear anatomy in CT images of the ear with 3D-conditional GANs. Medical Image Analysis 58, 101553 (2019).
[5] Wang, J. et al.: Conditional generative adversarial networks for metal artifact reduction in CT images of the ear. In: Frangi, A. et al. (eds) Medical Image Computing and Computer Assisted Intervention - MICCAI 2018. Lecture Notes in Computer Science, vol. 11070, pp. 1-3. Springer, Cham (2018).
[6] Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv:1411.1784 (2014).
[7] Isola, P. et al.: Image-to-image translation with conditional adversarial networks. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1125-1134 (2017).
[8] Pelosi, S. et al.: Analysis of intersubject variations in intracochlear and middle ear surface anatomy for cochlear implantation. Otology & Neurotology 34(9), 1675-1680 (2013).
[9] Christensen, G. E. and Johnson, H. J.: Consistent image registration. IEEE Transactions on Medical Imaging 20(7), 568-582 (2001).
[10] Milletari, F., Navab, N., Ahmadi, S.: V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV), pp. 565-571 (2016).
[11] Hu, Y. et al.: Weakly-supervised convolutional neural networks for multimodal image registration. Medical Image Analysis 49, 1-13 (2018).
[12] Kim, B. et al.: Unsupervised deformable image registration using cycle-consistent CNN. In: Shen, D. et al. (eds) Medical Image Computing and Computer Assisted Intervention - MICCAI 2019. Lecture Notes in Computer Science, vol. 11769, pp. 166-174. Springer, Cham (2019).
[13] Rueckert, D. et al.: Nonrigid registration using free-form deformations: Application to breast MR images. IEEE Transactions on Medical Imaging 18(8), 712-721 (1999).
[14] Hu, Y. et al.: Label-driven weakly-supervised learning for multimodal deformable image registration. In: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI), pp. 1070-1074 (2018).
[15] Ghavami, N. et al.: Automatic slice segmentation of intraoperative transrectal ultrasound images using convolutional neural networks. In: Fei, B., Webster III, R. J. (eds) Proceedings Volume 10576, Medical Imaging 2018: Image-Guided Procedures, Robotic Interventions, and Modeling, 1057603 (2018).
[16] Holm, S.: A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6(2), 65-70 (1979).

Claims

What is claimed is:
1. A method for segmentation of structures of interest (SOI) in a computed tomography (CT) image post-operatively acquired from an implant user in a region of interest in which an implant is implanted, comprising: providing an atlas image, a dataset and networks, wherein the dataset comprises a plurality of CT image pairs, randomly partitioned into a training set, a validation set, and a testing set, wherein each CT image pair has a pre-implantation CT (Pre-CT) image and a post-implantation CT (Post-CT) image respectively acquired in a region of a respective implant recipient before and after an implant is implanted in the region, so that the Pre-CT image and the Post-CT image of each CT image pair are an artifact-free CT image and an artifact-affected CT image, respectively; wherein the atlas image is a Pre-CT image of the region of a subject that is not in the plurality of CT image pairs; and wherein the networks comprise a first network for registering the atlas image to each Post-CT image and a second network for registering each Post-CT image to the atlas image; training the networks with the plurality of CT image pairs, so as to learn to register the artifact-affected CT images and the atlas image with assistance of the paired artifact-free CT images; inputting the CT image post-operatively acquired from the implant user to the trained networks to generate a dense deformation field (DDF) from the input Post-CT image to the atlas image; and warping a segmentation mesh of the SOI in the atlas image to the input Post-CT image using the DDF so as to generate the segmentation mesh of the SOI in the input Post-CT image, wherein the segmentation mesh of the SOI in the atlas image is generated by applying an active shape model-based (ASM) method to the atlas image.
2. The method of claim 1, wherein said providing the dataset comprises, for each CT image pair: rigidly registering the Pre-CT image to the Post-CT image; and aligning the registered Pre-CT and Post-CT image pair to the atlas image so that anatomical structures in the region of the respective implant recipient are roughly in the same spatial location and orientation.
3. The method of claim 2, wherein said providing the dataset further comprises, for each CT image pair: applying the ASM method to the registered Pre-CT image to generate a segmentation mesh of the SOI in the registered Pre-CT image ( Meshpre); transferring the segmentation mesh of the SOI in the registered Pre-CT image ( Meshpre) to the Post-CT image, so as to generate a segmentation mesh of the SOI in the Post-CT image ( Meshpost); and converting the segmentation mesh of the SOI in the Post-CT image (Meshpost ) to segmentation masks of the SOI in the Post-CT image (Segpost).
4. The method of claim 1, wherein all of the images are resampled to an isotropic voxel size, and cropped with images of 3D voxels containing the structures of interest.
5. The method of claim 1, wherein said providing the dataset comprises applying image augmentation to the training set by rotating each image by a plurality of small random angles in the range of -25 and 25 degrees about the x-axis, y-axis, and z-axis, to create additional training images from each original image.
6. The method of claim 1, wherein the networks have network architecture in which NETsSpc-tSpc, designed for generating a DDF for warping a source image S to a target image T, comprises a Global-net and a Local-net, wherein the network architecture is configured such that after receiving the concatenation of S and T, the Global-net generates an affine transformation matrix; S is warped to T by using the affine transformation to generate an image S’; the Local-net takes the concatenation of S’ and T to generate a non-rigid local DDF; and the affine transformation and the local DDF are composed to produce an output DDF.
7. The method of claim 6, wherein said training the networks comprises: inputting a concatenation of the atlas image (Atlasatlas) and each Post-CT image (Postpost) into the networks so that the first network (NETatlas-post) generates a DDF from the atlas space to the Post-CT space (DDFatlas-post) and the second network (NETpost-atlas) generates a DDF from the Post-CT space to the atlas space (DDFpost-atlas); warping the Pre-CT image (PresSpc), the segmentation masks (MasksSpc), and the fiducial vertices (FidVsSpc) in a source space to a target space by using the corresponding DDFs, to generate PresSpc-tSpc, MasksSpc-tSpc, and FidVsSpc-tSpc; and transferring PresSpc-tSpc, MasksSpc-tSpc, and FidVsSpc-tSpc back to sSpc using the corresponding DDF, to generate PresSpc-tSpc-sSpc, MasksSpc-tSpc-sSpc, and FidVsSpc-tSpc-sSpc, respectively, wherein OxSpc denotes an object O in the x space, sSpc and tSpc respectively denote the source and target spaces.
8. The method of claim 7, wherein FidVatlas and FidVpost are the fiducial vertices randomly sampled from Meshatlas and Meshpost on the fly for calculating the fiducial registration error during training.
9. The method of claim 7, wherein the training objective for NETsSpc-tSpc is constructed using measurements of similarity between a target object in tSpc (OtSpc) and a source object that is transferred to tSpc from sSpc (OsSpc-tSpc).
10. The method of claim 9, wherein the training objective for the networks is a weighted sum of loss terms of MSPDice, Mean FRE, NCC, CycConsis, BendE, and L2, wherein:
MSPDice = MSPDice(Maskpost, Maskatlas-post) + MSPDice(Maskatlas, Maskpost-atlas), wherein MSPDice(MasktSpc, MasksSpc-tSpc) is a multiscale soft probabilistic Dice between MasktSpc and MasksSpc-tSpc that measures the similarity of the segmentation masks between MasktSpc and MasksSpc-tSpc;
Mean FRE = FRE(FidVpost, FidVatlas-post) + FRE(FidVatlas, FidVpost-atlas), wherein FRE(FidVtSpc, FidVsSpc-tSpc) is a mean fiducial registration error that measures the similarity of the fiducial vertices between FidVtSpc and FidVsSpc-tSpc, and is calculated as the average Euclidean distance between the fiducial vertices in FidVtSpc and the corresponding vertices in FidVsSpc-tSpc;
NCC = NCC(Prepost, Atlasatlas-post) + NCC(Atlasatlas, Prepost-atlas), wherein NCC(PretSpc, PresSpc-tSpc) is a normalized cross-correlation between PretSpc and PresSpc-tSpc that measures the similarity between the warped source image and the target image;
CycConsis = CycConsisatlas-post + CycConsispost-atlas, wherein CycConsissSpc-tSpc is a cycle-consistency loss that measures the similarity between the original source objects in the source space and the source objects that are transferred from the source space to the target space and then transferred back to the source space, wherein CycConsissSpc-tSpc = MSPDice(MasksSpc, MasksSpc-tSpc-sSpc) + 2 x FRE(FidVsSpc, FidVsSpc-tSpc-sSpc) + 0.5 x NCC(PresSpc, PresSpc-tSpc-sSpc);
BendE = BendE(DDFatlas-post) + BendE(DDFpost-atlas), wherein BendE(DDFsSpc-tSpc) is a bending loss for which the DDF from the source space to the target space DDFsSpc-tSpc is regularized using bending energy; and
L2 = L2(NETatlas-post) + L2(NETpost-atlas), wherein L2(NETsSpc-tSpc) is an L2 loss for which the learnable parameters of the registration network NETsSpc-tSpc are regularized by an L2 term.
11. The method of claim 1, wherein the region of interest includes ear, brain, heart, or other organs of a living subject, wherein the structures of interest comprise anatomical structures in the region of interest.
12. The method of claim 11, wherein the anatomical structures comprise intracochlear anatomy (ICA).
13. The method of claim 1, wherein the implant is a cochlear implant, a deep brain stimulator, or a pacemaker.
14. A method for segmentation of structures of interest (SOI) in a computed tomography (CT) image post-operatively acquired with an implant user in a region of interest in which an implant is implanted, comprising: inputting the post-operatively acquired CT (Post-CT) image to trained networks to generate a dense deformation field (DDF) from the input Post-CT image to an atlas image, wherein the atlas image is a Pre-CT image; and warping a segmentation mesh of the SOI in the atlas image to the input Post-CT image using the DDF so as to generate the segmentation mesh of the SOI in the input Post-CT image, wherein the segmentation mesh of the SOI in the atlas image is generated by applying an active shape model-based (ASM) method to the atlas image.
15. The method of claim 14, wherein the networks have network architecture in which NETsSpc-tSpc, designed for generating a DDF for warping a source image S to a target image T, comprises a Global-net and a Local-net, wherein the network architecture is configured such that after receiving the concatenation of S and T, the Global-net generates an affine transformation matrix; S is warped to T by using the affine transformation to generate an image S’; the Local-net takes the concatenation of S’ and T to generate a non-rigid local DDF; and the affine transformation and the local DDF are composed to produce an output DDF.
16. The method of claim 15, wherein the networks are trained with a dataset comprising a plurality of CT image pairs, randomly partitioned into a training set, a validation set, and a testing set, wherein each CT image pair has a pre-implantation CT (Pre-CT) image and a post-implantation CT (Post-CT) image respectively acquired in a region of a respective implant recipient before and after an implant is implanted in the region, so that the Pre-CT image and the Post-CT image of each CT image pair are an artifact-free CT image and an artifact-affected CT image, respectively.
17. The method of claim 16, wherein the Pre-CT image is rigidly registered to the Post-CT image for each CT image pair; and the registered Pre-CT and Post-CT image pair is aligned to the atlas image so that anatomical structures in the region of the respective implant recipient are roughly in the same spatial location and orientation.
18. The method of claim 17, wherein for each CT image pair, the ASM method is applied to the registered Pre-CT image to generate a segmentation mesh of the SOI in the registered Pre-CT image ( Meshpre ); the segmentation mesh of the SOI in the registered Pre-CT image (Meshpre) is transferred to the Post-CT image so as to generate a segmentation mesh of the SOI in the Post-CT image (Meshpost ); and the segmentation mesh of the SOI in the Post-CT image (Meshpost) is converted to segmentation masks of the SOI in the Post-CT image (Segpost ).
19. The method of claim 18, wherein the networks are trained to learn to register the artifact-affected CT images and the atlas image with assistance of the paired artifact-free CT images.
20. The method of claim 19, wherein the networks are trained by: inputting a concatenation of the atlas image (Atlasatlas) and each Post-CT image (Postpost) into the networks so that the first network (NETatlas-post) generates a DDF from the atlas space to the Post-CT space (DDFatlas-post) and the second network (NETpost-atlas) generates a DDF from the Post-CT space to the atlas space (DDFpost-atlas); warping the Pre-CT image (PresSpc), the segmentation masks (MasksSpc), and the fiducial vertices (FidVsSpc) in a source space to a target space by using the corresponding DDFs, to generate PresSpc-tSpc, MasksSpc-tSpc, and FidVsSpc-tSpc; and transferring PresSpc-tSpc, MasksSpc-tSpc, and FidVsSpc-tSpc back to sSpc using the corresponding DDF, to generate PresSpc-tSpc-sSpc, MasksSpc-tSpc-sSpc, and FidVsSpc-tSpc-sSpc, respectively, wherein OxSpc denotes an object O in the x space, sSpc and tSpc respectively denote the source and target spaces.
21. The method of claim 20, wherein the training objective for NETsSpc-tSpc is constructed using measurements of similarity between a target object in tSpc (OtSpc) and a source object that is transferred to tSpc from sSpc (OsSpc-tSpc).
22. The method of claim 21, wherein the trained networks are characterized with a training objective that is a weighted sum of loss terms of MSPDice, Mean FRE, NCC, CycConsis, BendE, and L2, wherein:
MSPDice = MSPDice(Maskpost, Maskatlas-post) + MSPDice(Maskatlas, Maskpost-atlas), wherein MSPDice(MasktSpc, MasksSpc-tSpc) is a multiscale soft probabilistic Dice between MasktSpc and MasksSpc-tSpc that measures the similarity of the segmentation masks between MasktSpc and MasksSpc-tSpc;
Mean FRE = FRE(FidVpost, FidVatlas-post) + FRE(FidVatlas, FidVpost-atlas), wherein FRE(FidVtSpc, FidVsSpc-tSpc) is a mean fiducial registration error that measures the similarity of the fiducial vertices between FidVtSpc and FidVsSpc-tSpc, and is calculated as the average Euclidean distance between the fiducial vertices in FidVtSpc and the corresponding vertices in FidVsSpc-tSpc;
NCC = NCC(Prepost, Atlasatlas-post) + NCC(Atlasatlas, Prepost-atlas), wherein NCC(PretSpc, PresSpc-tSpc) is a normalized cross-correlation between PretSpc and PresSpc-tSpc that measures the similarity between the warped source image and the target image;
CycConsis = CycConsisatlas-post + CycConsispost-atlas, wherein CycConsissSpc-tSpc is a cycle-consistency loss that measures the similarity between the original source objects in the source space and the source objects that are transferred from the source space to the target space and then transferred back to the source space, wherein CycConsissSpc-tSpc = MSPDice(MasksSpc, MasksSpc-tSpc-sSpc) + 2 x FRE(FidVsSpc, FidVsSpc-tSpc-sSpc) + 0.5 x NCC(PresSpc, PresSpc-tSpc-sSpc);
BendE = BendE(DDFatlas-post) + BendE(DDFpost-atlas), wherein BendE(DDFsSpc-tSpc) is a bending loss for which the DDF from the source space to the target space DDFsSpc-tSpc is regularized using bending energy; and
L2 = L2(NETatlas-post) + L2(NETpost-atlas), wherein L2(NETsSpc-tSpc) is an L2 loss for which the learnable parameters of the registration network NETsSpc-tSpc are regularized by an L2 term.
23. A non-transitory tangible computer-readable medium storing instructions which, when executed by one or more processors, cause a system to perform the method of any one of claims 1-22.
24. A system for segmentation of structures of interest (SOI) in a computed tomography (CT) image post-operatively acquired from an implant user in a region of interest in which an implant is implanted, comprising: trained networks for generating dense deformation fields (DDFs) in opposite directions; and a microcontroller coupled with the trained networks and configured to generate a DDF from the post-operatively acquired CT (Post-CT) image to an atlas image; and warp a segmentation mesh of the SOI in the atlas image to the Post-CT image using the DDF to generate a segmentation mesh of the SOI in the Post-CT image, wherein the segmentation mesh of the SOI in the atlas image is generated by applying an active shape model-based (ASM) method to the atlas image, wherein the atlas image is a Pre-CT image.
25. The system of claim 24, wherein the trained networks comprise networks having network architecture in which NETsSpc-tSpc, designed for generating a DDF for warping a source image S to a target image T, comprises a Global-net and a Local-net, wherein the network architecture is configured such that after receiving the concatenation of S and T, the Global-net generates an affine transformation matrix; S is warped to T by using the affine transformation to generate an image S’; the Local-net takes the concatenation of S’ and T to generate a non-rigid local DDF; and the affine transformation and the local DDF are composed to produce an output DDF.
26. The system of claim 25, wherein the microcontroller is further configured to train the network with a dataset comprising a plurality of CT image pairs, randomly partitioned into a training set, a validation set, and a testing set, wherein each CT image pair has a pre-implantation CT (Pre-CT) image and a post-implantation CT (Post-CT) image respectively acquired in a region of a respective implant recipient before and after an implant is implanted in the region, so that the Pre-CT image and the Post-CT image of each CT image pair are an artifact-free CT image and an artifact-affected CT image, respectively.
27. The system of claim 26, wherein the microcontroller is further configured to apply image augmentation to the training set by rotating each image by a plurality of small random angles in the range of -25 and 25 degrees about the x-axis, y-axis, and z-axis, to create additional training images from each original image.
28. The system of claim 27, wherein the microcontroller is further configured to: rigidly register the Pre-CT image to the Post-CT image for each CT image pair; and align the registered Pre-CT and Post-CT image pair to the atlas image so that anatomical structures in the region of the respective implant recipient are roughly in the same spatial location and orientation.
29. The system of claim 28, wherein the microcontroller is further configured to, for each CT image pair: apply the ASM method to the registered Pre-CT image to generate a segmentation mesh of the SOI in the registered Pre-CT image (Meshpre); transfer the segmentation mesh of the SOI in the registered Pre-CT image (Meshpre) to the Post-CT image, so as to generate a segmentation mesh of the SOI in the Post-CT image (Meshpost); and convert the segmentation mesh of the SOI in the Post-CT image (Meshpost) to segmentation masks of the SOI in the Post-CT image (Segpost).
30. The system of claim 29, wherein the networks are trained to learn to register the artifact-affected CT images and the atlas image with assistance of the paired artifact-free CT images.
31. The system of claim 30, wherein the networks are trained by: inputting a concatenation of the atlas image (Atlasatlas) and each Post-CT image (Postpost) into the networks so that the first network (NETatlas-post) generates a DDF from the atlas space to the Post-CT space (DDFatlas-post) and the second network (NETpost-atlas) generates a DDF from the Post-CT space to the atlas space (DDFpost-atlas); warping the Pre-CT image (PresSpc), the segmentation masks (MasksSpc), and the fiducial vertices (FidVsSpc) in a source space to a target space by using the corresponding DDFs, to generate PresSpc-tSpc, MasksSpc-tSpc, and FidVsSpc-tSpc; and transferring PresSpc-tSpc, MasksSpc-tSpc, and FidVsSpc-tSpc back to sSpc using the corresponding DDF, to generate PresSpc-tSpc-sSpc, MasksSpc-tSpc-sSpc, and FidVsSpc-tSpc-sSpc, respectively, wherein OxSpc denotes an object O in the x space, sSpc and tSpc respectively denote the source and target spaces.
32. The system of claim 31, wherein the training objective for NETsSpc-tSpc is constructed using measurements of similarity between a target object in tSpc (OtSpc) and a source object that is transferred to tSpc from sSpc (OsSpc-tSpc).
33. The system of claim 32, wherein the trained networks are characterized with a training objective that is a weighted sum of loss terms of MSPDice, Mean FRE, NCC, CycConsis, BendE, and L2, wherein:
MSPDice = MSPDice(Maskpost, Maskatlas-post) + MSPDice(Maskatlas, Maskpost-atlas), wherein MSPDice(MasktSpc, MasksSpc-tSpc) is a multiscale soft probabilistic Dice between MasktSpc and MasksSpc-tSpc that measures the similarity of the segmentation masks between MasktSpc and MasksSpc-tSpc;
Mean FRE = FRE(FidVpost, FidVatlas-post) + FRE(FidVatlas, FidVpost-atlas), wherein FRE(FidVtSpc, FidVsSpc-tSpc) is a mean fiducial registration error that measures the similarity of the fiducial vertices between FidVtSpc and FidVsSpc-tSpc, and is calculated as the average Euclidean distance between the fiducial vertices in FidVtSpc and the corresponding vertices in FidVsSpc-tSpc;
NCC = NCC(Prepost, Atlasatlas-post) + NCC(Atlasatlas, Prepost-atlas), wherein NCC(PretSpc, PresSpc-tSpc) is a normalized cross-correlation between PretSpc and PresSpc-tSpc that measures the similarity between the warped source image and the target image;
CycConsis = CycConsisatlas-post + CycConsispost-atlas, wherein CycConsissSpc-tSpc is a cycle-consistency loss that measures the similarity between the original source objects in the source space and the source objects that are transferred from the source space to the target space and then transferred back to the source space, wherein CycConsissSpc-tSpc = MSPDice(MasksSpc, MasksSpc-tSpc-sSpc) + 2 x FRE(FidVsSpc, FidVsSpc-tSpc-sSpc) + 0.5 x NCC(PresSpc, PresSpc-tSpc-sSpc);
BendE = BendE(DDFatlas-post) + BendE(DDFpost-atlas), wherein BendE(DDFsSpc-tSpc) is a bending loss for which the DDF from the source space to the target space DDFsSpc-tSpc is regularized using bending energy; and
L2 = L2(NETatlas-post) + L2(NETpost-atlas), wherein L2(NETsSpc-tSpc) is an L2 loss for which the learnable parameters of the registration network NETsSpc-tSpc are regularized by an L2 term.
34. The system of claim 24, wherein the region of interest includes ear, brain, heart, or other organs of a living subject, wherein the structures of interest comprise anatomical structures in the region of interest.
35. The system of claim 34, wherein the anatomical structures comprise intracochlear anatomy (ICA).
36. The system of claim 24, wherein the implant is a cochlear implant, a deep brain stimulator, or a pacemaker.
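By way of illustration only, and not as part of the claims, the cycle-consistency term recited above (MSPDice + 2 x FRE + 0.5 x NCC between the original source objects and the objects transferred to the target space and back) can be sketched as follows. Several points are assumptions made for this sketch: a single-scale soft Dice stands in for the multiscale soft probabilistic Dice, the Dice and NCC similarities are turned into losses as one minus the similarity, images and masks are flat Python lists, and all function names are hypothetical.

```python
import math


def soft_dice(a, b, eps=1e-6):
    """Single-scale soft Dice overlap between two probabilistic masks (flat lists in [0, 1])."""
    inter = sum(x * y for x, y in zip(a, b))
    return (2.0 * inter + eps) / (sum(a) + sum(b) + eps)


def ncc(a, b, eps=1e-12):
    """Normalized cross-correlation between two images given as flat intensity lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b + eps)


def mean_fre(p, q):
    """Average Euclidean distance between corresponding fiducial vertices."""
    return sum(math.dist(u, v) for u, v in zip(p, q)) / len(p)


def cyc_consis(mask_s, mask_sts, fid_s, fid_sts, pre_s, pre_sts):
    """Cycle-consistency term with the weights 1, 2, and 0.5 recited in the claims."""
    # Assumption: similarities enter as (1 - similarity) so that lower values are better.
    dice_term = 1.0 - soft_dice(mask_s, mask_sts)
    ncc_term = 1.0 - ncc(pre_s, pre_sts)
    return dice_term + 2.0 * mean_fre(fid_s, fid_sts) + 0.5 * ncc_term


# Toy source-space objects and a perfect round trip (source -> target -> source).
mask = [0.0, 1.0, 1.0, 0.0]
image = [1.0, 2.0, 3.0, 4.0]
fids = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
print(cyc_consis(mask, mask, fids, fids, image, image))

# An imperfect round trip: each fiducial vertex returns 1.0 away from where it started.
fids_back = [(x + 1.0, y, z) for (x, y, z) in fids]
print(cyc_consis(mask, mask, fids, fids_back, image, image))
```

With a perfect round trip the term is essentially zero, and a displacement of the returned fiducial vertices raises it by twice the mean displacement, which reflects the relative weight given to the fiducial term.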
PCT/US2022/026262 2018-08-06 2022-04-26 Methods of automatic segmentation of anatomy in artifact affected ct images with deep neural networks and applications of same WO2022232084A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/288,058 US20240202914A1 (en) 2018-08-06 2022-04-26 Methods of automatic segmentation of anatomy in artifact affected ct images with deep neural networks and applications of same

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163179655P 2021-04-26 2021-04-26
US63/179,655 2021-04-26

Publications (1)

Publication Number Publication Date
WO2022232084A1 (en) 2022-11-03

Family

ID=83848787

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/026262 WO2022232084A1 (en) 2018-08-06 2022-04-26 Methods of automatic segmentation of anatomy in artifact affected ct images with deep neural networks and applications of same

Country Status (1)

Country Link
WO (1) WO2022232084A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050251029A1 (en) * 2004-04-21 2005-11-10 Ali Khamene Radiation therapy treatment plan
US20170177967A1 (en) * 2013-02-07 2017-06-22 Vanderbilt University Methods for automatic segmentation of inner ear anatomy in post-implantation ct and applications of same
US10699410B2 (en) * 2017-08-17 2020-06-30 Siemens Healthcare GmbH Automatic change detection in medical images
WO2020206135A1 (en) * 2019-04-02 2020-10-08 The Methodist Hospital System Image-based methods for estimating a patient-specific reference bone model for a patient with a craniomaxillofacial defect and related systems

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ONOFREY JOHN A., STAIB LAWRENCE H., PAPADEMETRIS XENOPHON: "Learning intervention-induced deformations for non-rigid MR-CT registration and electrode localization in epilepsy patients", NEUROIMAGE: CLINICAL, vol. 10, 1 January 2016 (2016-01-01), pages 291 - 301, XP055980662, ISSN: 2213-1582, DOI: 10.1016/j.nicl.2015.12.001 *

Similar Documents

Publication Publication Date Title
Wang et al. Conditional generative adversarial networks for metal artifact reduction in CT images of the ear
US11763502B2 (en) Deep-learning-based method for metal reduction in CT images and applications of same
Kida et al. Visual enhancement of cone‐beam CT by use of CycleGAN
Wang et al. Metal artifact reduction for the segmentation of the intra cochlear anatomy in CT images of the ear with 3D-conditional GANs
Noble et al. Automatic segmentation of intracochlear anatomy in conventional CT
Nikan et al. PWD-3DNet: a deep learning-based fully-automated segmentation of multiple structures on temporal bone CT scans
Noble et al. Statistical shape model segmentation and frequency mapping of cochlear implant stimulation targets in CT
CN109598722B (en) Image analysis method based on recurrent neural network
Zhou et al. DuDoUFNet: dual-domain under-to-fully-complete progressive restoration network for simultaneous metal artifact reduction and low-dose CT reconstruction
Kjer et al. Patient-specific estimation of detailed cochlear shape from clinical CT images
Lv et al. Automatic segmentation of temporal bone structures from clinical conventional CT using a CNN approach
Zhang et al. Two-level training of a 3D U-Net for accurate segmentation of the intra-cochlear anatomy in head CTs with limited ground truth training data
Kida et al. Cone-beam CT to planning CT synthesis using generative adversarial networks
US20240202914A1 (en) Methods of automatic segmentation of anatomy in artifact affected ct images with deep neural networks and applications of same
Han et al. Joint synthesis and registration network for deformable MR-CBCT image registration for neurosurgical guidance
Wang et al. A co-registration approach for electrocorticogram electrode localization using post-implantation MRI and CT of the head
Lu et al. Facial nerve image enhancement from CBCT using supervised learning technique
JP2018535746A (en) Tissue classification method, computer program, and magnetic resonance imaging system
EP2953543B1 (en) Automatic segmentation of intra-cochlear anatomy in post-implantation ct of unilateral cochlear implant recipients
Sismono et al. 3D-localisation of cochlear implant electrode contacts in relation to anatomical structures from in vivo cone-beam computed tomography
WO2022232084A1 (en) Methods of automatic segmentation of anatomy in artifact affected ct images with deep neural networks and applications of same
Demarcy Segmentation and study of anatomical variability of the cochlea from medical images
EP3102110B1 (en) Methods for automatic segmentation of inner ear anatomy in post-implantation ct and applications of same
Wang et al. Atlas-based segmentation of intracochlear anatomy in metal artifact affected CT images of the ear with co-trained deep neural networks
US10102441B2 (en) Methods for automatic segmentation of inner ear anatomy in post-implantation CT and applications of same

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 22796519; Country of ref document: EP; Kind code of ref document: A1

WWE Wipo information: entry into national phase
Ref document number: 18288058; Country of ref document: US

NENP Non-entry into the national phase
Ref country code: DE

122 Ep: pct application non-entry in european phase
Ref document number: 22796519; Country of ref document: EP; Kind code of ref document: A1