Abstract
Objective: Validation of medical image segmentation algorithms remains an open question, given the variability of individual pathologies and the corresponding clinical accuracy requirements. In this paper, we propose a validation metric that can distinguish between over- and under-segmentation and can be adapted to different clinical applications.
Materials and methods: The proposed validation metric represents a trade-off between sensitivity and specificity. Its advantage is that it differentiates between over- and under-segmentation, which is an important feature when validating large sets of segmentation results, since human inspection is exhausting and time-consuming. Although the metric is oriented toward accuracy measurement, it is also closely related to the robustness of a method.
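To make the sensitivity/specificity vocabulary concrete, the following minimal Python sketch computes both quantities for a binary segmentation against a binary reference mask, and names the dominant error type by comparing false positives (voxels wrongly included, i.e. over-segmentation) with false negatives (voxels missed, i.e. under-segmentation). This is not the metric proposed in the paper, whose definition is not given in this abstract; the function names, the flat-list-of-booleans mask representation, and the simple FP-versus-FN rule are illustrative assumptions.

```python
from typing import Sequence


def confusion_counts(seg: Sequence[bool], ref: Sequence[bool]) -> tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) voxel counts of a segmentation against a reference."""
    if len(seg) != len(ref):
        raise ValueError("masks must contain the same number of voxels")
    tp = sum(s and r for s, r in zip(seg, ref))      # correctly included voxels
    fp = sum(s and not r for s, r in zip(seg, ref))  # wrongly included (over-segmented)
    fn = sum(r and not s for s, r in zip(seg, ref))  # wrongly excluded (under-segmented)
    tn = len(seg) - tp - fp - fn                     # correctly excluded background
    return tp, fp, fn, tn


def sensitivity_specificity(seg: Sequence[bool], ref: Sequence[bool]) -> tuple[float, float]:
    """Assumes the reference contains both object and background voxels."""
    tp, fp, fn, tn = confusion_counts(seg, ref)
    sensitivity = tp / (tp + fn)  # share of reference object voxels recovered
    specificity = tn / (tn + fp)  # share of background voxels correctly excluded
    return sensitivity, specificity


def dominant_error(seg: Sequence[bool], ref: Sequence[bool]) -> str:
    """Hypothetical rule: name the error type with more misclassified voxels."""
    _, fp, fn, _ = confusion_counts(seg, ref)
    if fp > fn:
        return "over-segmentation"
    if fn > fp:
        return "under-segmentation"
    return "balanced"  # equal FP and FN counts (including a perfect match)


ref = [False] * 4 + [True] * 4 + [False] * 4
seg = [False] * 3 + [True] * 5 + [False] * 4  # one extra voxel on the left edge
print(sensitivity_specificity(seg, ref))  # (1.0, 0.875)
print(dominant_error(seg, ref))           # over-segmentation
```

A single over-segmented voxel leaves sensitivity at 1.0 and lowers only specificity, which is why a measure that weighs the two against each other can carry information about the direction of the error.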
Results: The features of the metrics are analyzed together with their medical impact. A set of numerical simulations compares the proposed metric with commonly used discrepancy measures. The metric is illustrated with a clinical case study that assesses the accuracy of a calvarial tumor segmentation algorithm validated on six patients.
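Why a direction-aware metric is useful can be shown with a toy simulation. The self-contained Python sketch below is not one of the simulations reported in the paper; the 1D geometry and the voxel counts are arbitrary choices made for illustration. A reference object of 100 voxels is embedded in 300 voxels; one hypothetical result adds 25 false-positive voxels (over-segmentation), another misses 20 reference voxels (under-segmentation). The Dice coefficient, 2TP/(2TP + FP + FN), and the Jaccard index, TP/(TP + FP + FN), come out identical for both results (8/9 and 0.8), whereas sensitivity and specificity reveal which way each segmentation erred.

```python
def counts(seg, ref):
    """(TP, FP, FN, TN) voxel counts for two equal-length boolean masks."""
    tp = sum(s and r for s, r in zip(seg, ref))
    fp = sum(s and not r for s, r in zip(seg, ref))
    fn = sum(r and not s for s, r in zip(seg, ref))
    tn = len(ref) - tp - fp - fn
    return tp, fp, fn, tn


def measures(seg, ref):
    """Standard overlap and classification measures of seg against ref."""
    tp, fp, fn, tn = counts(seg, ref)
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "jaccard": tp / (tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def interval_mask(n, start, stop):
    """1D binary mask of length n that is True on the half-open range [start, stop)."""
    return [start <= i < stop for i in range(n)]


N = 300
ref = interval_mask(N, 100, 200)    # 100 reference voxels
over = interval_mask(N, 100, 225)   # reference plus 25 false positives
under = interval_mask(N, 100, 180)  # 20 reference voxels missed

for name, seg in (("over", over), ("under", under)):
    print(name, {k: round(v, 3) for k, v in measures(seg, ref).items()})
# over {'dice': 0.889, 'jaccard': 0.8, 'sensitivity': 1.0, 'specificity': 0.875}
# under {'dice': 0.889, 'jaccard': 0.8, 'sensitivity': 0.8, 'specificity': 1.0}
```

Because the Jaccard index is a monotone function of the Dice coefficient (J = D / (2 - D)), neither overlap score can separate the two cases, while the sensitivity/specificity pair does.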
Cite this article
Popovic, A., de la Fuente, M., Engelhardt, M. et al. Statistical validation metric for accuracy assessment in medical image segmentation. Int J CARS 2, 169–181 (2007). https://doi.org/10.1007/s11548-007-0125-1