Abstract
Non-invasive image-based machine learning models have been used to classify subtypes of non-small cell lung cancer (NSCLC). However, classification performance is limited by dataset size, because insufficient data cannot fully represent the characteristics of the tumor lesions. In this work, elastic deformation is proposed as a data augmentation method to artificially enlarge a dataset of 3158 images from NSCLC patients with two subtypes (squamous cell carcinoma and large cell carcinoma). Elastic deformation expands the dataset by generating new images in which the tumor lesions undergo an elastic shape transformation. To evaluate the proposed method, two classification models were trained on the original and augmented datasets, respectively. Training on the augmented dataset significantly increased the classification metrics, including the area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, specificity, and F1-score, thereby improving NSCLC subtype classification performance. These results suggest that elastic deformation could be an effective data augmentation method for NSCLC tumor lesion images, and that classification models built with the help of elastic deformation have the potential to support clinical lung cancer diagnosis and treatment design.
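The abstract does not give implementation details, so the following is only a minimal sketch of one common way to realize elastic deformation: a random per-pixel displacement field is smoothed with a Gaussian kernel, scaled, and used to resample the image with bilinear interpolation. The function names, the parameters `alpha` (displacement strength) and `sigma` (smoothness), and the synthetic disc standing in for a lesion are all assumptions made for illustration, not the authors' code or settings.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel, truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def smooth(field, sigma):
    """Separable Gaussian blur; edge padding keeps the output the same shape."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(field, radius, mode="edge")
    blur = lambda v: np.convolve(v, kernel, mode="valid")
    columns_done = np.apply_along_axis(blur, 0, padded)
    return np.apply_along_axis(blur, 1, columns_done)


def elastic_deform(image, alpha=20.0, sigma=4.0, rng=None):
    """Warp a 2-D image with a smoothed random displacement field.

    alpha and sigma are illustrative defaults, not values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=float)
    h, w = image.shape

    # Random displacements in [-1, 1], smoothed so neighbouring pixels move together.
    dx = smooth(rng.uniform(-1.0, 1.0, (h, w)), sigma) * alpha
    dy = smooth(rng.uniform(-1.0, 1.0, (h, w)), sigma) * alpha

    # Source coordinates for every output pixel, clamped to the image bounds.
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(yy + dy, 0, h - 1)
    src_x = np.clip(xx + dx, 0, w - 1)

    # Bilinear interpolation between the four surrounding pixels.
    y0 = np.floor(src_y).astype(int)
    x0 = np.floor(src_x).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = src_y - y0
    wx = src_x - x0
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


# Synthetic stand-in for a lesion: a filled disc on a 64 x 64 background.
rows, cols = np.mgrid[0:64, 0:64]
lesion = ((rows - 32) ** 2 + (cols - 32) ** 2 < 12 ** 2).astype(float)
warped = elastic_deform(lesion, alpha=20.0, sigma=4.0, rng=np.random.default_rng(0))
```

Calling `elastic_deform` several times with different random generators yields several distinct variants of the same lesion, which is how a dataset could be enlarged under this sketch. In practice, `scipy.ndimage.gaussian_filter` and `scipy.ndimage.map_coordinates` could replace the hand-written smoothing and interpolation steps.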
Funding
This work was partially supported by the National Natural Science Foundation of China (Nos. 61871251, 62027901, 61601019, 61871022), the 111 Project (No. B13003), the Fundamental Research Funds for Central Universities, and the Beijing Natural Science Foundation (7202102).
Cite this article
Gao, Y., Song, F., Zhang, P. et al. Improving the Subtype Classification of Non-small Cell Lung Cancer by Elastic Deformation Based Machine Learning. J Digit Imaging 34, 605–617 (2021). https://doi.org/10.1007/s10278-021-00455-0
DOI: https://doi.org/10.1007/s10278-021-00455-0