Abstract
Prediction models for data-starved medical applications lag behind general machine learning solutions, despite their potential to improve early interventions. This is largely because optimization approaches typically assume a balanced distribution of events, whereas medical data are often imbalanced across classes. In individual-based risk prediction models, small sample sizes and a high number of features further exacerbate the curse of dimensionality. In this paper, we propose a data augmentation system that gradually creates synthetic minority samples under a control coefficient, which improves the quality of the generated data over time and consequently boosts prediction model performance. The system incrementally adjusts to the data distribution, avoiding overfitting. We evaluate our approach using four synthetic oversampling techniques on real asthma patient data. Our results show that the system enhances classifiers’ overall performance across all four techniques. Specifically, applying the incremental data augmentation approach to three oversampling methods increased sensitivity by 4.01% to 7.79% in deep transfer learning-based classifiers.
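The abstract does not spell out the algorithm, so the following is only a rough sketch of what incremental oversampling with a control coefficient could look like, not the authors' method. It assumes that the coefficient (here the hypothetical parameter `alpha`) is the fraction of the remaining minority-class deficit that is filled in each round, and that each round interpolates SMOTE-style between points of the current pool (real samples plus earlier synthetic ones). The function names `smote_like` and `incremental_oversample` are illustrative and do not come from the paper.

```python
import random


def smote_like(points, n_new, k=3, rng=None):
    """Create n_new synthetic points by interpolating between a randomly
    chosen point and one of its k nearest neighbours (SMOTE-style)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(points)
        # k nearest neighbours of base by squared Euclidean distance,
        # excluding the base object itself.
        neighbours = sorted(
            (p for p in points if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation position in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, other)))
    return synthetic


def incremental_oversample(minority, n_majority, alpha=0.5,
                           max_rounds=20, k=3, seed=0):
    """Grow the minority class in rounds instead of in one shot.

    ASSUMPTION: alpha (the "control coefficient" of this sketch) is the
    share of the remaining deficit generated per round. The paper may
    define its coefficient differently.
    """
    rng = random.Random(seed)
    pool = list(minority)  # real samples first, synthetic ones appended
    for _ in range(max_rounds):
        deficit = n_majority - len(pool)
        if deficit <= 0:
            break
        n_new = max(1, int(alpha * deficit))  # always make progress
        pool.extend(smote_like(pool, n_new, k=k, rng=rng))
    return pool


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    balanced = incremental_oversample(minority, n_majority=20, alpha=0.5)
    print(len(balanced))
```

A smaller `alpha` spreads the generation over more rounds, which is where a quality check or classifier retraining step could be placed between rounds. Such a step is omitted here because the abstract gives no details about it.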
Acknowledgement
This study received support from United Arab Emirates University under UAEU NFRP grant (Grant No. G00004281) and Seattle University.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bae, W.D. et al. (2024). Incremental SMOTE with Control Coefficient for Classifiers in Data Starved Medical Applications. In: Wrembel, R., Chiusano, S., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2024. Lecture Notes in Computer Science, vol 14912. Springer, Cham. https://doi.org/10.1007/978-3-031-68323-7_9
DOI: https://doi.org/10.1007/978-3-031-68323-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-68322-0
Online ISBN: 978-3-031-68323-7
eBook Packages: Computer Science, Computer Science (R0)