Incremental SMOTE with Control Coefficient for Classifiers in Data Starved Medical Applications

Wan D. Bae¹²,
Shayma Alkobaisi¹³,
Siddheshwari Bankar¹²,
Sartaj Bhuvaji¹²,
Jay Singhvi¹²,
Madhuroopa Irukulla¹² &
…
William McDonnell¹²

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14912)

Included in the following conference series:

International Conference on Big Data Analytics and Knowledge Discovery

Abstract

Prediction models for data-starved medical applications lag behind general machine learning solutions, despite their potential to improve early interventions. This is largely because optimization approaches assume they are applied to a balanced distribution of events, whereas medical data often has an imbalanced class distribution. The curse of dimensionality is further exacerbated by small sample sizes and a high number of features in individual-based risk prediction models. In this paper, we propose a data augmentation system that gradually creates synthetic minority samples under a control coefficient, improving the quality of the generated data over time and consequently boosting prediction model performance. The system adjusts incrementally to the data distribution to avoid overfitting. We evaluate our approach using four synthetic oversampling techniques on real asthma patient data. Our results show that the system enhances classifiers' overall performance across all four techniques. Specifically, applying the incremental data augmentation approach to three oversampling methods increased the sensitivity of deep transfer learning-based classifiers by 4.01% to 7.79%.
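The abstract describes the method only at a high level. As a rough illustration, here is a minimal sketch that pairs the classic SMOTE interpolation step (Chawla et al., 2002) with a hypothetical incremental loop. The control coefficient `alpha`, the per-round schedule (fill a fraction `alpha` of the remaining class gap each round), the choice to seed interpolation only from real minority samples, and the optional `evaluate` gate that stops when a validation score drops are all assumptions made for illustration, not the authors' published algorithm. It uses only the Python standard library.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(points: List[Vector], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i], excluding itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: euclidean(points[i], points[j]))
    return others[:k]


def smote(minority: List[Vector], n_new: int, k: int = 5,
          rng: Optional[random.Random] = None) -> List[Vector]:
    """Classic SMOTE step: place each synthetic sample at a random point on the
    segment between a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(minority, i, k))
        lam = rng.random()  # interpolation factor drawn from [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


def incremental_smote(minority: List[Vector], n_majority: int, alpha: float = 0.25,
                      max_rounds: int = 10, k: int = 5,
                      evaluate: Optional[Callable[[List[Vector]], float]] = None,
                      seed: int = 0) -> List[Vector]:
    """Hypothetical incremental loop. `alpha` (an assumed control coefficient)
    sets the fraction of the remaining class gap filled per round; if `evaluate`
    is given, a round is kept only when its score does not drop."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    rng = random.Random(seed)
    augmented = [list(x) for x in minority]
    best = evaluate(augmented) if evaluate is not None else None
    for _ in range(max_rounds):
        gap = n_majority - len(augmented)
        if gap <= 0:
            break  # classes are balanced
        n_new = math.ceil(alpha * gap)  # at least 1, never more than gap
        # Assumption: seeds come from the real minority samples only, so new
        # points are never interpolated from earlier synthetic points.
        candidate = augmented + smote(minority, n_new, k, rng)
        if evaluate is not None:
            score = evaluate(candidate)
            if score < best:
                break  # stop before added samples hurt the validation score
            best = score
        augmented = candidate
    return augmented


if __name__ == "__main__":
    minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
    balanced = incremental_smote(minority, n_majority=20, alpha=0.25)
    print(len(balanced))  # 20
```

With `alpha = 0.25`, four minority samples and a majority class of 20, the rounds add 4, 3, 3, 2, 1, 1, 1, 1 samples, so the gap closes gradually instead of in one jump. In a real pipeline `evaluate` would wrap a classifier's validation metric (for example, sensitivity, TP / (TP + FN)); here it is a hypothetical hook.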

Acknowledgement

This study received support from United Arab Emirates University under UAEU NFRP grant (Grant No. G00004281) and Seattle University.

Author information

Authors and Affiliations

Computer Science, Seattle University, Seattle, WA, USA
Wan D. Bae, Siddheshwari Bankar, Sartaj Bhuvaji, Jay Singhvi, Madhuroopa Irukulla & William McDonnell
College of Information Technology, United Arab Emirates University, Al Ain, UAE
Shayma Alkobaisi

Corresponding author

Correspondence to Shayma Alkobaisi.

Editor information

Editors and Affiliations

Poznan University of Technology, Poznan, Poland
Robert Wrembel
Polytechnic University of Turin, Turin, Italy
Silvia Chiusano
Johannes Kepler University Linz, Linz, Austria
Gabriele Kotsis
Vienna University of Technology, Vienna, Austria
A Min Tjoa
Johannes Kepler University Linz, Linz, Austria
Ismail Khalil

About this paper

Cite this paper

Bae, W.D. et al. (2024). Incremental SMOTE with Control Coefficient for Classifiers in Data Starved Medical Applications. In: Wrembel, R., Chiusano, S., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2024. Lecture Notes in Computer Science, vol 14912. Springer, Cham. https://doi.org/10.1007/978-3-031-68323-7_9

DOI: https://doi.org/10.1007/978-3-031-68323-7_9
Published: 18 August 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-68322-0
Online ISBN: 978-3-031-68323-7
eBook Packages: Computer Science, Computer Science (R0)