Incremental SMOTE with Control Coefficient for Classifiers in Data Starved Medical Applications

  • Conference paper
Big Data Analytics and Knowledge Discovery (DaWaK 2024)

Abstract

Prediction models for data-starved medical applications lag behind general machine learning solutions, despite their potential to improve early interventions. This is largely due to the assumption that optimization approaches are applied to a balanced distribution of events, yet medical data often has an imbalanced distribution within classes. The curse of dimensionality is further exacerbated by small samples and a high number of features in individual-based risk prediction models. In this paper, we propose a data augmentation system that gradually creates synthetic minority samples with a control coefficient, which improves the quality of generated data over time and consequently boosts prediction model performance. This system incrementally adjusts to the data distribution, avoiding overfitting. We evaluate our approach using four synthetic oversampling techniques on real asthma patient data. Our results show that this system enhances classifiers’ overall performance across all four techniques. Specifically, applying the incremental data augmentation approach to three oversampling methods increased sensitivity by 4.01% to 7.79% in deep transfer learning-based classifiers.
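The abstract does not define the control coefficient or the update rule, so the following is only a minimal sketch of the general idea of incremental oversampling, not the authors' method. It assumes a hypothetical coefficient `alpha` that sets the fraction of the remaining class gap filled in each round, uses the classic SMOTE interpolation step (Chawla et al., 2002), and lets earlier synthetic samples serve as seeds in later rounds. The function names `smote_batch` and `incremental_smote` are illustrative.

```python
import numpy as np


def smote_batch(X_min, n_new, k, rng):
    """Classic SMOTE step: new = x + u * (neighbor - x), with u ~ U(0, 1)."""
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)  # seed samples
    pick = neighbors[base, rng.integers(0, k, size=n_new)]  # one neighbor each
    u = rng.random((n_new, 1))
    return X_min[base] + u * (X_min[pick] - X_min[base])


def incremental_smote(X_min, n_majority, alpha=0.5, k=5, max_rounds=20, seed=0):
    """Grow the minority class in rounds instead of in one shot.

    ASSUMPTION: alpha is a hypothetical control coefficient in (0, 1];
    each round fills ceil(alpha * gap) of the remaining class gap.
    """
    rng = np.random.default_rng(seed)
    X_aug = np.asarray(X_min, dtype=float)
    for _ in range(max_rounds):
        gap = n_majority - len(X_aug)
        if gap <= 0:
            break
        n_new = min(gap, max(1, int(np.ceil(alpha * gap))))
        # ASSUMPTION: synthetic samples from earlier rounds can act as seeds.
        X_aug = np.vstack([X_aug, smote_batch(X_aug, n_new, k, rng)])
    return X_aug
```

With `alpha=1.0` the loop reduces to ordinary one-shot SMOTE; smaller values spread generation over more rounds, which is where a quality check or classifier feedback could be inserted between rounds.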

Acknowledgement

This study received support from United Arab Emirates University under UAEU NFRP grant (Grant No. G00004281) and Seattle University.

Author information

Corresponding author

Correspondence to Shayma Alkobaisi.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bae, W.D. et al. (2024). Incremental SMOTE with Control Coefficient for Classifiers in Data Starved Medical Applications. In: Wrembel, R., Chiusano, S., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2024. Lecture Notes in Computer Science, vol 14912. Springer, Cham. https://doi.org/10.1007/978-3-031-68323-7_9

  • DOI: https://doi.org/10.1007/978-3-031-68323-7_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-68322-0

  • Online ISBN: 978-3-031-68323-7

  • eBook Packages: Computer Science, Computer Science (R0)
