Optimized Machine Learning for the Early Detection of Polycystic Ovary Syndrome in Women
Figure 1. Schematic representation of the proposed framework.
Figure 2. A grid showing the relationship between PCOS and its features.
Figure 3. The correlation matrix between input and output attributes.
Figure 4. Boxplot analysis of the features.
Figure 5. The steps to obtain the blended symptomatic dataset.
Figure 6. Implementation steps of WaO algorithm for EL model.
Figure 7. Learning curve for base models with the training dataset.
Figure 8. ROC comparison for base learning models.
Figure 9. The convergence plot for CSO and WaO.
Figure 10. Confusion matrix of optimized EL models for PCOS detection.
Figure 11. ROC comparison for optimized EL models.
Figure 12. Accuracy comparison of different learning models.
Figure 13. Feature importance with RF model.
Figure 14. Feature importance with XGB model using SHAP.
Abstract
1. Introduction
- Through the amalgamation of two different datasets, an entirely new symptomatic PCOS dataset, with initial symptoms and fundamental health indicators, was created.
- Learning algorithms, such as KNN, LR, SVM, DT, RF, XGB, and DL, tuned with RSO, were implemented to identify PCOS. The efficiency of the designed models was evaluated and validated based on accuracy, precision, F1 score, cross-validation comparisons, and other metrics.
- Ensemble models were created with stacking techniques to further improve performance; a comparative analysis of different meta-level classifiers was conducted to identify the best-performing model.
- A metaheuristic WaO algorithm was proposed to design the optimally tuned EL classifier (WaOEL), with the DL network as the meta-classifier for symptomatic PCOS prediction.
- Other optimization techniques, including RSO and CSO, were also employed to design EL models. A rigorous performance comparison of the proposed WaOEL was made with CSOEL and RSOEL based on the convergence plot, fitness value, and prediction metrics.
- Feature importance analysis was carried out using the SHAP framework to interpret the predictions obtained using the designed models.
2. Proposed Framework
3. Materials and Methods
3.1. Dataset Description
- Initially, feature selection was performed, and relevant attributes from various biomarkers and basic clinical tests were extracted from the raw datasets. The attributes related to hypertension, diabetes, obesity, and cardiometabolic disease were considered, while the attributes obtained from complex clinical tests were disregarded.
- The CVD dataset contains 1026 sample records for 13 attributes: age, sex, resting blood pressure, chest pain type, cholesterol, colored vessels in fluoroscopy, resting electrocardiogram, fasting blood sugar, maximum heart rate, angina, ST segment, thalassemia, and the target value. The recorded samples considered for further analysis included only women of reproductive age. The attributes selected for further analysis were cholesterol, fasting blood sugar, resting blood pressure, maximum heart rate, resting ECG, and chest pain.
- The PCOS dataset contains 542 records with 44 parameters, including patient number, age, weight, height, BMI, blood type, pulse rate, cycle length, marital status, random glucose, pregnancy, number of abortions, I beta-HCG, II beta-HCG, FSH, LH, FSH/LH, waist-to-hip ratio, TSH, AMH, PRL, vit D3, PRG, RBS, weight gain, hair growth, skin darkening, pimples, follicle number, average follicle size, endometrium, target PCOS, and a few others. Features that require specific clinical tests were removed, and only the basic noticeable parameters, such as BMI, glucose, cycle length, waist-to-hip ratio, weight gain, hair growth, and the PCOS label, were retained for further analysis.
- The new symptomatic dataset, after blending relevant symptoms, contains a total of 932 samples with 13 attributes, namely resting blood pressure (trestbps), maximum heart rate (thalach), cholesterol (chol), chest pain (cp), resting electrocardiogram (restecg), fasting blood sugar (fbs), random glucose (glucose), body mass index (BMI), weight gain, waist-to-hip ratio, cycle length, hair growth, and the result PCOS (Y/N). The dataset samples are shown in Table 2.
- The blended dataset contains both numerical and categorical data types and records a woman’s health parameters related to hypertension, heart health, obesity, diabetes, and noticeable hormonal effects. PCOS, the target parameter, represents the patient’s PCOS diagnosis in relation to the attributes mentioned above. A value of zero denotes that the patient is not diagnosed with PCOS, and a value of one denotes that the patient suffers from PCOS.
- The obtained dataset is suitable for exploratory data analysis (EDA) to understand the relationships among various health indicators, potential biomarkers, and the PCOS outcome. Figure 3 shows the correlation matrix between the input features and the target output. Negative coefficients denote inverse relationships between an input and the output, i.e., an increase in one is associated with a decrease in the other, and vice versa.
- To develop an expert data-driven optimized model, boxplot analysis was used to examine the data for outliers. Figure 4 shows that most of the features have even distributions with clear medians; however, features such as waist-to-hip ratio, weight gain (Y/N), and cycle length (days) have several outliers. The outliers in the numerical attributes of the dataset were trimmed, resulting in 900 samples with 12 features.
- Data normalization was carried out using the MinMax scaler for feature scaling. Normalization gave better results than the raw data by bringing all features to a uniform range. The steps to obtain the blended symptomatic dataset are depicted in Figure 5.
- In order to evaluate the performance of the designed models, the dataset was split into training and testing datasets, with a ratio of 80:20.
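The normalization and 80:20 split described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the helper names `min_max_scale` and `train_test_split`, the fixed seed, and the placeholder row indices are assumptions made for the example. The BMI values are the ones shown in the sample rows of the blended dataset.

```python
import random

def min_max_scale(column):
    """Rescale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in column]

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows with a fixed seed and split them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# BMI values from the sample rows of the blended dataset.
bmi = [23.3, 24.9, 28.3, 32.0, 21.6, 31.2, 29.2, 18.2, 29.7, 29.4]
scaled_bmi = min_max_scale(bmi)

# 900 placeholder row indices standing in for the trimmed dataset (assumption).
train_rows, test_rows = train_test_split(list(range(900)))
print(len(train_rows), len(test_rows))
```

In practice it is common to compute the minimum and maximum on the training split only and reuse them for the test split, so that no test information leaks into the scaling; the source does not state which order was used.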
3.2. Learning Models
3.2.1. K-Nearest Neighbor
3.2.2. Logistic Regression
3.2.3. Support Vector Machine
3.2.4. Decision Tree
3.2.5. Random Forest
- The algorithm initiates by randomly choosing ‘k’ features from the total ‘m’ features. The root node is determined using the best-split technique for the selected ‘k’ features.
- Child nodes are created until a specified depth is reached.
- Through the selection of different subsets of features, ‘n’ randomly constructed trees are obtained, which collectively form the RF.
- The bootstrap aggregating (bagging) approach facilitates tree learning in the RF training process.
- Bagging is carried out by repeatedly drawing a random sample, with replacement, from the training set and fitting a tree to each such sample.
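The steps above can be sketched as a toy bagging ensemble. To keep the example short, each "tree" is a depth-1 decision stump rather than a full tree, and the two-feature data set (BMI, cycle length) is hypothetical; both are simplifications for illustration, not the RF configuration used in this work.

```python
import random
from collections import Counter

def fit_stump(X, y, feature_ids):
    """Fit a depth-1 tree: choose the split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                errors = sum((left if row[f] <= t else right) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, left, right)
    return best[1:]  # (feature, threshold, label_if_below, label_if_above)

def predict_stump(stump, row):
    f, t, left, right = stump
    return left if row[f] <= t else right

def fit_bagged_stumps(X, y, n_trees=25, k_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and k of the m features."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # draw n rows with replacement
        feats = rng.sample(range(m), k_features)     # random subset of k features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]

# Hypothetical rows: [BMI, cycle length]; label 1 = PCOS, 0 = no PCOS.
X = [[19, 5], [21, 5], [22, 4], [24, 5], [29, 3], [31, 2], [33, 2], [35, 3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_bagged_stumps(X, y)
print(predict_forest(forest, [18, 5]), predict_forest(forest, [36, 2]))
```

Because every stump sees a different bootstrap sample and feature subset, individual stumps can disagree; the vote across them is what stabilizes the prediction.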
3.2.6. Extreme Gradient Boost Algorithm
3.2.7. Deep Learning Network
3.2.8. Ensemble Learning
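As a rough illustration of stacking with a neural-network meta-classifier, the sketch below assumes scikit-learn is available and uses a synthetic data set in place of the blended PCOS data. The hyperparameter values are placeholders, and the XGB and DL base learners are left out to avoid an extra dependency; this is not the authors' exact configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a 900-sample, 12-feature data set (assumption).
X, y = make_classification(n_samples=900, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Level-0 (base) learners; hyperparameters are illustrative placeholders.
base_learners = [
    ("knn", KNeighborsClassifier(n_neighbors=10)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("svm", SVC(probability=True, random_state=0)),
    ("dt", DecisionTreeClassifier(max_depth=6, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
]

# Level-1 (meta) classifier: a small multilayer perceptron.
meta = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)

model = make_pipeline(
    MinMaxScaler(),
    StackingClassifier(estimators=base_learners, final_estimator=meta, cv=5),
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

`StackingClassifier` trains the meta-classifier on out-of-fold predictions of the base learners (here with 5-fold cross-validation), which keeps the meta-level from simply memorizing base-learner outputs on the training rows.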
3.3. Walrus Optimization (WaO) Algorithm
- Stage 1: Foraging
- Stage 2: Migrating
- Stage 3: Fleeing or fighting
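The three stages can be sketched as a small population-based search. The update rules below are a simplified assumption modeled loosely on published walrus-inspired optimizers, not the authors' implementation; in particular, the shrinking-radius local search in stage 3 and the sphere test function are illustrative choices. The default population size (25) and iteration count (50) follow the WaO parameter table.

```python
import random

def walrus_optimize(fitness, dim, lower, upper, n_walruses=25, n_iter=50, seed=0):
    """Minimize `fitness` over [lower, upper]^dim with a three-stage update."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_walruses)]
    fit = [fitness(x) for x in pop]
    history = []

    def try_move(i, candidate):
        # Greedy acceptance: keep the candidate only if it improves walrus i.
        candidate = [max(lower, min(upper, v)) for v in candidate]
        f = fitness(candidate)
        if f < fit[i]:
            pop[i], fit[i] = candidate, f

    for t in range(1, n_iter + 1):
        for i in range(n_walruses):
            # Stage 1 (foraging): move toward the best walrus found so far.
            best = pop[min(range(n_walruses), key=fit.__getitem__)]
            try_move(i, [x + rng.random() * (b - rng.choice((1, 2)) * x)
                         for x, b in zip(pop[i], best)])

            # Stage 2 (migrating): move relative to a randomly chosen other walrus.
            k = rng.choice([j for j in range(n_walruses) if j != i])
            if fit[k] < fit[i]:
                step = [x + rng.random() * (xk - rng.choice((1, 2)) * x)
                        for x, xk in zip(pop[i], pop[k])]
            else:
                step = [x + rng.random() * (x - xk) for x, xk in zip(pop[i], pop[k])]
            try_move(i, step)

            # Stage 3 (fleeing or fighting): local search in a radius that shrinks with t.
            radius = (upper - lower) / (2 * t)
            try_move(i, [x + rng.uniform(-radius, radius) for x in pop[i]])
        history.append(min(fit))

    best_i = min(range(n_walruses), key=fit.__getitem__)
    return pop[best_i], fit[best_i], history

def sphere(x):
    """Toy objective; a real run would use a validation error of the EL model."""
    return sum(v * v for v in x)

best_x, best_f, history = walrus_optimize(sphere, dim=3, lower=-5.0, upper=5.0)
print(best_f)
```

Since a move is accepted only when it improves a walrus, the best fitness recorded in `history` never increases, which is the shape one would expect to see in a convergence plot.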
4. Results
4.1. Simulation Setup and Evaluation Metric
4.2. PCOS Prediction Using Learning Models
4.3. PCOS Prediction with EL Model
4.4. Optimization of the EL Model
4.5. PCOS Prediction with Optimized EL Model
5. Discussions
- a. Feature importance with RF
- b. Feature importance for XGB with SHAP
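SHAP attributes a prediction to individual features using Shapley values. The sketch below computes exact Shapley values for a tiny hand-written scoring function by averaging each feature's marginal contribution over all feature orderings. The function `toy_risk_score`, its coefficients, and the baseline values are hypothetical; the SHAP library uses faster, model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features that are not yet revealed keep their baseline value; each feature's
    contribution is averaged over every order in which features can be revealed.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = model(current)
        for j in order:
            current[j] = x[j]
            new = model(current)
            phi[j] += new - prev
            prev = new
    return [p / len(orderings) for p in phi]

def toy_risk_score(features):
    """Hypothetical score with one interaction term (not a fitted model)."""
    bmi, cycle_length, hair_growth = features
    return (0.04 * bmi - 0.05 * cycle_length + 0.3 * hair_growth
            + 0.01 * bmi * hair_growth)

x = [31.2, 7, 1]          # instance to explain
baseline = [24.0, 5, 0]   # reference instance (assumption)
phi = shapley_values(toy_risk_score, x, baseline)
for name, value in zip(["BMI", "cycle length", "hair growth"], phi):
    print(f"{name}: {value:+.4f}")
```

The contributions sum to the difference between the explained prediction and the baseline prediction, which is the additivity property that makes SHAP plots readable as a decomposition of a single output.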
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Barrera, F.J.; Brown, E.D.L.; Rojo, A.; Obeso, J.; Plata, H.; Lincango, E.P.; Terry, N.; Rodriguez-Gutierrez, R.; Hall, J.E.; Shekhar, S. Application of machine learning and artificial intelligence in the diagnosis and classification of polycystic ovarian syndrome: A systematic review. Front. Endocrinol. 2023, 14, 1106625.
2. Escobar-Morreale, H.F. Polycystic ovary syndrome: Definition, aetiology, diagnosis and treatment. Nat. Rev. Endocrinol. 2018, 14, 270–284.
3. Aggarwal, M.; Yadav, P.; Badhe, S.; Deolekar, P. A cross sectional study on prevalence of PCOS and risk factors associated with it among medical students. Indian J. Obstet. Gynecol. Res. 2019, 6, 522–526.
4. Sadeghi, H.M.; Adeli, I.; Calina, D.; Docea, A.O.; Mousavi, T.; Daniali, M.; Nikfar, S.; Tsatsakis, A.; Abdollahi, M. Polycystic ovary syndrome: A comprehensive review of pathogenesis, management, and drug repurposing. Int. J. Mol. Sci. 2022, 23, 583.
5. Wang, E.T.; Calderon-Margalit, R.; Cedars, M.I.; Daviglus, M.L.; Merkin, S.S.; Schreiner, P.J.; Sternfeld, B.; Wellons, M.; Schwartz, S.M.; Lewis, C.E. Polycystic ovary syndrome and risk for long-term diabetes and dyslipidemia. Obstet. Gynecol. 2011, 117, 6–13.
6. Bulsara, J.; Patel, P.; Soni, A.; Acharya, S. A review: Brief insight into Polycystic Ovarian syndrome. Endocr. Metab. Sci. 2021, 3, 100085.
7. Deswal, R.; Narwal, V.; Dang, A.; Pundir, C.S. The prevalence of polycystic ovary syndrome: A brief systematic review. J. Hum. Reprod. Sci. 2020, 13, 261–271.
8. Orio, F.; Muscogiuri, G.; Nese, C.; Palomba, S.; Savastano, S.; Tafuri, D.; Colarieti, G.; La Sala, G.; Colao, A.; Yildiz, B.O. Obesity, type 2 diabetes mellitus and cardiovascular disease risk: An uptodate in the management of polycystic ovary syndrome. Eur. J. Obstet. Gynecol. Reprod. Biol. 2016, 207, 214–219.
9. Yadav, S.; Delau, O.; Bonner, A.J.; Markovic, D.; Patterson, W.; Ottey, S.; Buyalos, R.P.; Azziz, R. Direct economic burden of mental health disorders associated with polycystic ovary syndrome: Systematic review and meta-analysis. eLife 2023, 12, e85338.
10. Neubronner, S.A.; Indran, I.R.; Chan, Y.H.; Thu, A.W.P.; Yong, E.-L. Effect of body mass index (BMI) on phenotypic features of polycystic ovary syndrome (PCOS) in Singapore women: A prospective cross-sectional study. BMC Women’s Health 2021, 21, 135.
11. Zhu, T.; Cui, J.; Goodarzi, M.O. Polycystic ovary syndrome and risk of type 2 diabetes, coronary heart disease, and stroke. Diabetes 2021, 70, 627–637.
12. Belsti, Y.; Enticott, J.; Azumah, R.; Tay, C.T.; Moran, L.; Ma, R.C.; Joham, A.E.; Laven, J.; Teede, H.; Mousa, A. Diagnostic accuracy of oral glucose tolerance tests, fasting plasma glucose and haemoglobin A1c for type 2 diabetes in women with polycystic ovary syndrome: A systematic review and meta-analysis. Diabetes Metab. Syndr. Clin. Res. Rev. 2024, 18, 102970.
13. Henney, A.E.; Gillespie, C.S.; Lai, J.Y.; Schofield, P.; Riley, D.R.; Caleyachetty, R.; Barber, T.M.; Miras, A.D.; Dobbie, L.J.; Hughes, D.M. Risk of type 2 diabetes, MASLD and cardiovascular disease in people living with polycystic ovary syndrome. J. Clin. Endocrinol. Metab. 2024, dgae481.
14. Al-Jawadi, Z.A. The Role of Diabetes on Polycystic Ovary Syndrome (PCOS). Int. Innov. J. Appl. Sci. 2024, 1, 1–5.
15. Wekker, V.; Van Dammen, L.; Koning, A.; Heida, K.; Painter, R.; Limpens, J.; Laven, J.; Roeters van Lennep, J.; Roseboom, T.; Hoek, A. Long-term cardiometabolic disease risk in women with PCOS: A systematic review and meta-analysis. Hum. Reprod. Update 2020, 26, 942–960.
16. Wal, A.; Dash, B.; Jaiswal, V.; Gupta, D.; Mishra, A.K. Role of inflammation, oxidative stress, and angiogenesis in polycystic ovary syndrome (PCOS): Current perspectives. In Targeting Angiogenesis, Inflammation, and Oxidative Stress in Chronic Diseases; Academic Press: Cambridge, MA, USA, 2024; pp. 459–485.
17. Torchen, L.C. Cardiometabolic risk in PCOS: More than a reproductive disorder. Curr. Diabetes Rep. 2017, 17, 137.
18. Aksun, S.; Sonu, N.; Aygun, S.; Karakulak, U.; Mumusoglu, S.; Yildiz, B. Alterations of cardiometabolic risk profile in polycystic ovary syndrome: 13 years follow-up in an unselected population. J. Endocrinol. Investig. 2024, 47, 1129–1137.
19. Riestenberg, C.; Jagasia, A.; Markovic, D.; Buyalos, R.P.; Azziz, R. Health care-related economic burden of polycystic ovary syndrome in the United States: Pregnancy-related and long-term health consequences. J. Clin. Endocrinol. Metab. 2022, 107, 575–585.
20. Guan, M.; Li, R.; Wang, B.; He, T.; Luo, L.; Zhao, J.; Lei, J. Healthcare professionals’ perspectives on the challenges with managing polycystic ovary syndrome: A systematic review and meta-synthesis. Patient Educ. Couns. 2024, 123, 108197.
21. Rehman, R.; Alam, F.; Khan, R. Situation analysis of polycystic ovary syndrome in Central and East Asia. In Polycystic Ovary Syndrome; Elsevier: Amsterdam, The Netherlands, 2024; pp. 191–199.
22. Franks, S. Polycystic ovary syndrome. N. Engl. J. Med. 1995, 333, 853–861.
23. McCartney, C.R.; Marshall, J.C. Polycystic ovary syndrome. N. Engl. J. Med. 2016, 375, 54–64.
24. Azziz, R.; Carmina, E.; Chen, Z.; Dunaif, A.; Laven, J.S.; Legro, R.S.; Lizneva, D.; Natterson-Horowitz, B.; Teede, H.J.; Yildiz, B.O. Polycystic ovary syndrome. Nat. Rev. Dis. Primers 2016, 2, 16057.
25. Palomba, S.; Santagni, S.; Falbo, A.; La Sala, G.B. Complications and challenges associated with polycystic ovary syndrome: Current perspectives. Int. J. Women’s Health 2015, 7, 745–763.
26. Kim, A.E.; Lee, I.T.; Ottey, S.; Dokras, A. Lack of adequate counseling about pregnancy complications in patients with polycystic ovary syndrome: A cross-sectional survey study. F&S Rep. 2024, 5, 312–319.
27. Stankiewicz, M.; Norman, R. Diagnosis and management of polycystic ovary syndrome: A practical guide. Drugs 2006, 66, 903–912.
28. Dewailly, D. Diagnostic criteria for PCOS: Is there a need for a rethink? Best Pract. Res. Clin. Obstet. Gynaecol. 2016, 37, 5–11.
29. Chen, M.; Hofestädt, R. A medical bioinformatics approach for metabolic disorders: Biomedical data prediction, modeling, and systematic analysis. J. Biomed. Inform. 2006, 39, 147–159.
30. Zhang, S.; Xiang, X.; Liu, L.; Yang, H.; Cen, D.; Tang, G. Bioinformatics analysis of hub genes and potential therapeutic agents associated with gastric cancer. Cancer Manag. Res. 2021, 13, 8929–8951.
31. Wang, B. Big Data Analytics in Bioinformatics and Healthcare; IGI Global: Hershey, PA, USA, 2014.
32. Olorunsogo, T.O.; Balogun, O.D.; Ayo-Farai, O.; Ogundairo, O.; Maduka, C.P.; Okongwu, C.C.; Onwumere, C. Bioinformatics and personalized medicine in the US: A comprehensive review: Scrutinizing the advancements in genomics and their potential to revolutionize healthcare delivery. World J. Adv. Res. Rev. 2024, 21, 335–351.
33. Lu, H.; Uddin, S. Unsupervised machine learning for disease prediction: A comparative performance analysis using multiple datasets. Health Technol. 2024, 14, 141–154.
34. Park, D.J.; Park, M.W.; Lee, H.; Kim, Y.-J.; Kim, Y.; Park, Y.H. Development of machine learning model for diagnostic disease prediction based on laboratory tests. Sci. Rep. 2021, 11, 7567.
35. Krittanawong, C.; Virk, H.U.H.; Bangalore, S.; Wang, Z.; Johnson, K.W.; Pinotti, R.; Zhang, H.; Kaplin, S.; Narasimhan, B.; Kitai, T. Machine learning prediction in cardiovascular diseases: A meta-analysis. Sci. Rep. 2020, 10, 16057.
36. Yu, Z.; Wang, K.; Wan, Z.; Xie, S.; Lv, Z. Popular deep learning algorithms for disease prediction: A review. Clust. Comput. 2023, 26, 1231–1251.
37. Ali, F.; El-Sappagh, S.; Islam, S.R.; Kwak, D.; Ali, A.; Imran, M.; Kwak, K.-S. A smart healthcare monitoring system for heart disease prediction based on ensemble deep learning and feature fusion. Inf. Fusion 2020, 63, 208–222.
38. Woldaregay, A.Z.; Årsand, E.; Walderhaug, S.; Albers, D.; Mamykina, L.; Botsis, T.; Hartvigsen, G. Data-driven modeling and prediction of blood glucose dynamics: Machine learning applications in type 1 diabetes. Artif. Intell. Med. 2019, 98, 109–134.
39. Liu, Y.-Q.; Chang, T.-W.; Lee, L.-C.; Chen, C.-Y.; Hsu, P.-S.; Tsan, Y.-T.; Yang, C.-T.; Chu, W.-M. Use of Machine Learning to Predict the Incidence of Type 2 Diabetes Among Relatively Healthy Adults: A 10-Year Longitudinal Study in Taiwan. Diagnostics 2024, 15, 72.
40. Ahmad, R.; Maghrabi, L.A.; Khaja, I.A.; Maghrabi, L.A.; Ahmad, M. SMOTE-Based Automated PCOS Prediction Using Lightweight Deep Learning Models. Diagnostics 2024, 14, 2225.
41. Wang, H.; Zhang, X.; Xia, Y.; Wu, X. A differential privacy-preserving deep learning caching framework for heterogeneous communication network systems. Int. J. Intell. Syst. 2022, 37, 11142–11166.
42. Tiwari, S.; Kane, L.; Koundal, D.; Jain, A.; Alhudhaif, A.; Polat, K.; Zaguia, A.; Alenezi, F.; Althubiti, S.A. SPOSDS: A smart Polycystic Ovary Syndrome diagnostic system using machine learning. Expert Syst. Appl. 2022, 203, 117592.
43. Elmannai, H.; El-Rashidy, N.; Mashal, I.; Alohali, M.A.; Farag, S.; El-Sappagh, S.; Saleh, H. Polycystic Ovary Syndrome Detection Machine Learning Model Based on Optimized Feature Selection and Explainable Artificial Intelligence. Diagnostics 2023, 13, 1506.
44. Prajna, K.B.; Iyer, B.V.; Bhuvan, C.; Thambanda, K.M.; Kanasu, H.R. Implementation of Various Machine Learning Algorithms to Predict Polycystic Ovary Syndrome. In Proceedings of the 2023 4th International Conference for Emerging Technology (INCET), Belgaum, India, 26–28 May 2023; pp. 1–6.
45. Khanna, V.V.; Chadaga, K.; Sampathila, N.; Prabhu, S.; Bhandage, V.; Hegde, G.K. A Distinctive Explainable Machine Learning Framework for Detection of Polycystic Ovary Syndrome. Appl. Syst. Innov. 2023, 6, 32.
46. Nasim, S.; Almutairi, M.S.; Munir, K.; Raza, A.; Younas, F. A Novel Approach for Polycystic Ovary Syndrome Prediction Using Machine Learning in Bioinformatics. IEEE Access 2022, 10, 97610–97624.
47. Thakre, V. PCOcare: PCOS Detection and Prediction using Machine Learning Algorithms. Biosci. Biotechnol. Res. Commun. 2020, 13, 240–244.
48. Ahmed, S.; Rahman, M.S.; Jahan, I.; Kaiser, M.S.; Hosen, A.S.M.S.; Ghimire, D.; Kim, S.-H. A Review on the Detection Techniques of Polycystic Ovary Syndrome Using Machine Learning. IEEE Access 2023, 11, 86522–86543.
49. Prabha, A.; Yadav, J.; Rani, A.; Singh, V. Intelligent estimation of blood glucose level using wristband PPG signal and physiological parameters. Biomed. Signal Process. Control 2022, 78, 103876.
50. Lim, J.; Li, J.; Feng, X.; Feng, L.; Xiao, X.; Zhou, M.; Yang, H.; Xu, Z. Predicting TCM patterns in PCOS patients: An exploration of feature selection methods and multi-label machine learning models. Heliyon 2024, 10, e35283.
51. Wang, Z.; Wolf, A.T.; Asokan, G.; Onnela, J.-P.; Baird, D.D.; Jukic, A.M.Z.; Wilcox, A.J.; Williams, M.A.; Hauser, R.; Coull, B. Prediction of polycystic ovary syndrome (pcos) using self-reported characteristics from a digital cohort in the united states. Fertil. Steril. 2024, 122, e358.
52. Kaur, R.; Kumar, R.; Gupta, M. Food Image-based diet recommendation framework to overcome PCOS problem in women using deep convolutional neural network. Comput. Electr. Eng. 2022, 103, 108298.
53. Khushal, R.; Fatima, U. Fuzzy machine learning logic utilization on hormonal imbalance dataset. Comput. Biol. Med. 2024, 174, 108429.
54. Rahman, M.M.; Islam, A.; Islam, F.; Zaman, M.; Islam, M.R.; Sakib, M.S.A.; Babu, H.M.H. Empowering early detection: A web-based machine learning approach for PCOS prediction. Inform. Med. Unlocked 2024, 47, 101500.
55. Zigarelli, A.; Jia, Z.; Lee, H. Machine-aided self-diagnostic prediction models for polycystic ovary syndrome: Observational study. JMIR Form. Res. 2022, 6, e29967.
56. Aggarwal, S.; Pandey, K. Early identification of PCOS with commonly known diseases: Obesity, diabetes, high blood pressure and heart disease using machine learning techniques. Expert Syst. Appl. 2023, 217, 119532.
57. Kakoly, N.; Khomami, M.; Joham, A.; Cooray, S.; Misso, M.; Norman, R.; Harrison, C.; Ranasinha, S.; Teede, H.; Moran, L. Ethnicity, obesity and the prevalence of impaired glucose tolerance and type 2 diabetes in PCOS: A systematic review and meta-regression. Hum. Reprod. Update 2018, 24, 455–467.
58. Pachauri, N.; Ahn, C.W. Regression tree ensemble learning-based prediction of the heating and cooling loads of residential buildings. Build. Simul. 2022, 15, 2003–2017.
59. Ergen, F.; Katlav, M. Investigation of optimized machine learning models with PSO for forecasting the shear capacity of steel fiber-reinforced SCC beams with/out stirrups. J. Build. Eng. 2024, 83, 108455.
60. Zhang, X.; Lu, B.; Zhang, L.; Pan, Z.; Liao, M.; Shen, H.; Zhang, L.; Liu, L.; Li, Z.; Hu, Y. An enhanced grey wolf optimizer boosted machine learning prediction model for patient-flow prediction. Comput. Biol. Med. 2023, 163, 107166.
61. Rojas, M.G.; Olivera, A.C.; Vidal, P.J. A genetic operators-based Ant Lion Optimiser for training a medical multi-layer perceptron. Appl. Soft Comput. 2024, 151, 111192.
62. Ghasemi, M.; Zare, M.; Trojovský, P.; Rao, R.V.; Trojovská, E.; Kandasamy, V. Optimization based on the smart behavior of plants with its engineering applications: Ivy algorithm. Knowl.-Based Syst. 2024, 295, 111850.
63. Saman, S.; Narayanan, S.J. Optimal feature subset selection for MRI brain tumor classification using improved ant-lion optimization. In Evolutionary Intelligence; Springer: Berlin/Heidelberg, Germany, 2024; pp. 1–23.
64. Aziz, R.M.; Desai, N.P.; Baluch, M.F. Computer vision model with novel cuckoo search based deep learning approach for classification of fish image. Multimed. Tools Appl. 2023, 82, 3677–3696.
65. Lundberg, S. A unified approach to interpreting model predictions. arXiv 2017, arXiv:1705.07874.
66. Tatarchuk, T.; Pedachenko, N.; Kosei, N.; Malysheva, I.; Snizhko, T.; Kozub, T.; Zolotarevska, O.; Kosianenko, S.; Tutchenko, T. Distribution and anthropometric characteristics of Rotterdam criteria-based phenotypic forms of Polycystic ovaries syndrome in Ukraine. Eur. J. Obstet. Gynecol. Reprod. Biol. 2024, 295, 104–110.
67. Heart Disease Dataset. Kaggle 2019. Available online: https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset (accessed on 10 June 2024).
68. Polycystic Ovary Syndrome (PCOS). Kaggle 2020. Available online: https://www.kaggle.com/datasets/prasoonkottarathil/polycystic-ovary-syndrome-pcos (accessed on 10 June 2024).
69. Uddin, S.; Haque, I.; Lu, H.; Moni, M.A.; Gide, E. Comparative performance analysis of K-nearest neighbour (KNN) algorithm and its different variants for disease prediction. Sci. Rep. 2022, 12, 6256.
70. Palanisamy, S.; Rajaguru, H. Leveraging Classifier Performance Using Heuristic Optimization for Detecting Cardiovascular Disease from PPG Signals. Diagnostics 2024, 14, 2287.
71. Thomas, N.M.; Jerome, S.A. Diabetic retinopathy detection using ensembled transfer learning based thrice CNN with SVM classifier. In Multimedia Tools and Applications; Springer: Berlin/Heidelberg, Germany, 2024; pp. 1–27.
72. Khashei, M.; Etemadi, S.; Bakhtiarvand, N. A New Discrete Learning-Based Logistic Regression Classifier for Bankruptcy Prediction. Wirel. Pers. Commun. 2024, 134, 1075–1092.
73. Ray, A.; Chaudhuri, A.K. A Novel Diagnosis System for Parkinson’s Disease Based on Ensemble Random Forest. In Data Driven Science for Clinically Actionable Knowledge in Diseases; Chapman and Hall/CRC: Boca Raton, FL, USA, 2024; pp. 92–107.
74. Alickovic, E.; Subasi, A. Medical Decision Support System for Diagnosis of Heart Arrhythmia using DWT and Random Forests Classifier. J. Med. Syst. 2016, 40, 108.
75. Alghazzawi, D.M.; Alquraishee, A.G.A.; Badri, S.K.; Hasan, S.H. ERF-XGB: Ensemble Random Forest-Based XG Boost for Accurate Prediction and Classification of E-Commerce Product Review. Sustainability 2023, 15, 7076.
76. Asif, S.; Wenhui, Y.; Tao, Y.; Jinhai, S.; Jin, H. An Ensemble Machine Learning Method for the Prediction of Heart Disease. In Proceedings of the 2021 4th International Conference on Artificial Intelligence and Big Data (ICAIBD), Chengdu, China, 28–31 May 2021; pp. 98–103.
77. Isabona, J.; Imoize, A.L.; Kim, Y. Machine Learning-Based Boosted Regression Ensemble Combined with Hyperparameter Tuning for Optimal Adaptive Learning. Sensors 2022, 22, 3776.
78. Liu, H.; Chen, S.; Huang, F.; Li, Q. Study on characteristics and parameter optimization of medical waste crushing process. Powder Technol. 2024, 431, 119085.
79. Trojovsky, P.; Dehghani, M. A new bio-inspired metaheuristic algorithm for solving optimization problems based on walruses behavior. Sci. Rep. 2023, 13, 8775.
80. Han, M.; Du, Z.; Yuen, K.F.; Zhu, H.; Li, Y.; Yuan, Q. Walrus optimizer: A novel nature-inspired metaheuristic algorithm. Expert Syst. Appl. 2024, 239, 122413.
81. Velasco, L.; Guerrero, H.; Hospitaler, A. A literature review and critical analysis of metaheuristics recently developed. Arch. Comput. Methods Eng. 2024, 31, 125–146.
82. Gambineri, A.; Pelusi, C.; Vicennati, V.; Pagotto, U.; Pasquali, R. Obesity and the polycystic ovary syndrome. Int. J. Obes. 2002, 26, 883–896.
83. Daescu, A.-M.C.; Dehelean, L.; Navolan, D.-B.; Gaitoane, A.-I.; Daescu, A.; Stoian, D. Effects of hormonal profile, weight, and body image on sexual function in women with polycystic ovary syndrome. Healthcare 2023, 11, 1488.
Reference | Applied Approach | Inference Made |
---|---|---|
Tiwari et al. [42] | PCOS classification was analyzed by using classifiers such as SVM, DT, RF, LR, QDA, LDA, KNN, Gradient Boost, AdaBoost, extreme gradient boosting (XGB), and CatBoost. | By assessing accuracy using out-of-bag error analysis, RF showed better performance in PCOS diagnosis. |
Elmannai et al. [43] | Employed basic ML models and AdaBoost. Hyperparameter optimization was used to determine the ideal hyperparameters. | Stacking ML with RFE feature selection recorded the highest performance. |
Prajna et al. [44] | Various models were employed, such as Chi-Square, DT, RF, and LR. | Results were generated by a chatbot based on the parameters provided by users. The chatbot can indicate the possibility of PCOS but cannot provide a definite diagnosis. |
Khanna et al. [45] | Feature selection was performed by applying the Harris Hawks Optimization algorithm. To improve performance, DL and XAI techniques were implemented. | An interface was created for real-time PCOS screening. |
Nasim et al. [46] | Ten hyper-parametrized ML models were applied with the Gaussian Naive Bayes classifier. | Overfitting was seen in the models. |
Thakre [47] | The chi-square method was employed for feature selection, employing models such as RF, SVM, LR, GNB, and KNN. | The RF classifier was the most accurate and dependable. |
Ahmed et al. [48] | PCOS was detected using various ML techniques, including convolutional neural networks and the naive Bayes technique. | Identified limitations include imbalanced datasets, decreased detection rates, noise in ultrasound images, and limited use of clustering approaches. |
Prabha et al. [49] | Blood glucose levels were computed using support vector regression and extreme gradient boost regression algorithms. | The research led to the creation of a blood glucose surveillance device that uses a wristband and a mix of physiological and MFCC features to accurately predict blood glucose levels. |
cp | trestbps | chol | fbs | restecg | thalach | Glucose | BMI | Cycle Length | Waist: Hip Ratio | Weight Gain (Y/N) | Hair Growth (Y/N) | PCOS (Y/N) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 125 | 212 | 0 | 1 | 168 | 89 | 23.3 | 5 | 0.83 | 0 | 0 | 0 |
0 | 140 | 203 | 1 | 0 | 155 | 95 | 24.9 | 5 | 0.84 | 0 | 0 | 0 |
0 | 145 | 174 | 1 | 1 | 125 | 136 | 28.3 | 5 | 0.9 | 0 | 0 | 1 |
2 | 140 | 185 | 0 | 0 | 155 | 75 | 32 | 2 | 0.89 | 1 | 1 | 1 |
0 | 104 | 208 | 0 | 0 | 148 | 109 | 21.6 | 5 | 0.85 | 0 | 0 | 0 |
2 | 129 | 196 | 0 | 1 | 163 | 144 | 31.2 | 7 | 0.95 | 0 | 1 | 1 |
2 | 128 | 229 | 0 | 1 | 150 | 151 | 29.2 | 5 | 0.84 | 0 | 1 | 1 |
2 | 120 | 258 | 0 | 0 | 147 | 93 | 18.2 | 5 | 0.82 | 0 | 0 | 0 |
0 | 140 | 226 | 0 | 1 | 178 | 89 | 29.7 | 3 | 0.84 | 1 | 1 | 1 |
1 | 135 | 203 | 0 | 1 | 132 | 142 | 29.4 | 3 | 0.83 | 0 | 0 | 0 |
Parameters of WaO | Utilized Value | Parameters of CSO | Utilized Value |
---|---|---|---|
Maximum number of iterations, T | 50 | Maximum number of iterations | 50 |
Number of walrus/population, N | 25 | Number of nests/population | 25 |
Number of decision variables, m | 25 | Number of optimizable variables | 25 |
- | - | The discovery rate of alien eggs | 0.25 |
Learning Algorithm | Mean Validation Accuracy (%) |
---|---|
KNN | 79.3 |
SVM | 80.5 |
LR | 80.8 |
DT | 83.2 |
RF | 84.1 |
XGB | 83.7 |
DL | 81.0 |
Model/ Metric | KNN | SVM | LR | DT | RF | XGB | DL |
---|---|---|---|---|---|---|---|
Accuracy | 84.4 | 82.2 | 84.4 | 84.7 | 86.8 | 86.2 | 85.0 |
Precision | 84.1 | 76.0 | 85.7 | 85.4 | 84.6 | 84.3 | 80.4 |
Sensitivity | 63.8 | 65.5 | 62.1 | 70.7 | 75.9 | 74.1 | 70.7 |
Specificity | 94.3 | 90.2 | 95.1 | 94.3 | 93.4 | 93.4 | 91.8 |
NPV | 84.6 | 84.6 | 84.1 | 87.1 | 89.1 | 88.4 | 86.8 |
F1 score | 72.6 | 70.4 | 72.0 | 77.4 | 80.0 | 78.9 | 75.2 |
FPR | 5.7 | 9.8 | 4.9 | 5.7 | 6.6 | 6.6 | 8.2 |
FNR | 36.2 | 34.5 | 37.9 | 29.3 | 24.1 | 25.9 | 29.3 |
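Every metric in the table above follows from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the counts are illustrative values for a 180-sample test split (20% of 900) and are an assumption, since the source does not report raw confusion-matrix counts here. They were chosen so that most results land close to the KNN column; the F1 value differs in the last digit, which may be a rounding effect.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return the eight reported metrics as fractions in [0, 1]."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)            # positive predictive value
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "npv": tn / (tn + fn),            # negative predictive value
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "fpr": fp / (fp + tn),            # equals 1 - specificity
        "fnr": fn / (fn + tp),            # equals 1 - sensitivity
    }

# Illustrative counts (assumed, not reported in the source).
metrics = classification_metrics(tp=37, fp=7, tn=115, fn=21)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}")
```

For screening, the FNR column deserves particular attention: a false negative is a missed PCOS case, so two models with similar accuracy can differ substantially in how many affected patients they fail to flag.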
Model/Metric | KNN | SVM | DT | LR | RF | XGB | DL |
---|---|---|---|---|---|---|---|
Accuracy | 88.3 | 86.7 | 87.2 | 86.1 | 87.2 | 88.3 | 89.9 |
Precision | 91.0 | 83.3 | 80.7 | 81.6 | 81.8 | 91.1 | 89.6 |
Sensitivity | 69.0 | 71.4 | 79.3 | 71.4 | 77.6 | 70.7 | 74.1 |
Specificity | 97.5 | 93.6 | 91.0 | 92.7 | 91.8 | 96.7 | 95.9 |
NPV | 86.9 | 87.9 | 90.2 | 87.8 | 89.6 | 87.4 | 88.6 |
F1 score | 79.2 | 76.9 | 80.0 | 76.2 | 79.7 | 79.6 | 81.1 |
FPR | 2.5 | 6.5 | 9.0 | 7.3 | 8.2 | 3.3 | 4.1 |
FNR | 31.0 | 28.6 | 20.7 | 28.6 | 22.4 | 29.3 | 25.9 |
Algorithm | Hyperparameter Description | Search Space | Optimum Value (RSOEL) | Optimum Value (CSOEL) | Optimum Value (WaOEL) |
---|---|---|---|---|---|
LR | The inverse of regularization strength | [0.001, 100] | 0.7 | 0.3 | 0.17 |
KNN | Number of neighbors | [8, 20] | 13 | 8 | 10 |
SVM | The inverse of regularization strength | [0.001, 100] | 0.04 | 0.21 | 0.063 |
DT | Min_samples_split | [2, 30] | 5 | 2 | 29 |
DT | Min_samples_leaf | [2, 8] | 7 | 2 | 13 |
DT | Max_leaf_nodes | [2, 20] | 8 | 8 | 10 |
DT | Max_depth | [2, 20] | 6 | 2 | 12 |
RF | n_estimators | [20, 150] | 46 | 20 | 31 |
RF | max_depth | [2, 20] | 9 | 16 | 6 |
RF | min_sample_split | [2, 8] | 4 | 8 | 6 |
RF | min_sample_leaf | [2, 8] | 2 | 2 | 2 |
RF | random_state | [0, 100] | 61 | 61 | 94 |
XGB | learning_rate | [0.001, 0.5] | 0.012 | 0.107 | 0.186 |
XGB | max_depth | [2, 20] | 11 | 19 | 15 |
XGB | n_estimators | [10, 160] | 25 | 54 | 72 |
DL | Hidden_layer_sizes (layer 1) | [4, 24] | 6 | 22 | 23 |
DL | Hidden_layer_sizes (layer 2) | [10, 32] | 15 | 10 | 14 |
DL | Hidden_layer_sizes (layer 3) | [20, 80] | 48 | 24 | 70 |
DL | Hidden_layer_sizes (layer 4) | [4, 32] | 30 | 10 | 18 |
DL | learning_rate | [0.001, 0.5] | 0.02 | 0.018 | 0.022 |
Meta-Classifier DL | Hidden_layer_sizes (layer 1) | [4, 32] | 10 | 8 | 8 |
Meta-Classifier DL | Hidden_layer_sizes (layer 2) | [8, 40] | 11 | 18 | 18 |
Meta-Classifier DL | Hidden_layer_sizes (layer 3) | [20, 80] | 56 | 46 | 42 |
Meta-Classifier DL | Hidden_layer_sizes (layer 4) | [4, 32] | 30 | 16 | 18 |
Meta-Classifier DL | learning_rate | [0.001, 0.5] | 0.007 | 0.011 | 0.14 |
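The simplest way to explore a search space like the one tabulated above is random sampling. The sketch below assumes that RSO refers to a plain random search and that integer hyperparameters are drawn uniformly from their ranges; the ranges are copied from the RF rows, written with scikit-learn style parameter names, and `toy_objective` is a hypothetical stand-in for a cross-validated accuracy score.

```python
import random

# Ranges copied from the RF rows of the hyperparameter table.
RF_SPACE = {
    "n_estimators": (20, 150),
    "max_depth": (2, 20),
    "min_samples_split": (2, 8),
    "min_samples_leaf": (2, 8),
}

def sample_config(space, rng):
    """Draw one integer value per hyperparameter, uniformly within its range."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in space.items()}

def random_search(objective, space, n_trials=50, seed=0):
    """Evaluate n_trials random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config(space, rng)
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

def toy_objective(cfg):
    """Hypothetical score with an arbitrarily placed peak (not a real model)."""
    return -abs(cfg["n_estimators"] - 60) - abs(cfg["max_depth"] - 10)

best_cfg, best_score = random_search(toy_objective, RF_SPACE)
print(best_cfg, best_score)
```

Metaheuristics such as CSO and WaO search the same space but use the scores of earlier candidates to decide where to sample next, which is one plausible reason for the differences among the three optimum-value columns.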
Learning Model | Average Validation Accuracy (%) |
---|---|
RSOEL | 86.3 |
CSOEL | 88.6 |
WaOEL | 90.1 |
Model | RSOEL | CSOEL | WaOEL |
---|---|---|---|
Accuracy | 89.9 | 91.7 | 92.8 |
Precision | 89.6 | 91.6 | 92.7 |
Sensitivity | 74.1 | 91.7 | 92.8 |
Specificity | 95.9 | 91.7 | 92.8 |
NPV | 88.6 | 91.7 | 92.8 |
F1 score | 81.1 | 91.6 | 92.7 |
FPR | 4.1 | 8.3 | 7.2 |
FNR | 25.9 | 8.3 | 7.2 |
Literature | Methods | Accuracy (%) | AUC |
---|---|---|---|
Tiwari et al. [42] | RF classifier with out-of-bag error tuning | 92.4 | 0.91 |
Thakre [47] | RF, SVM, LR, Gaussian Naïve Bayes, KNN | 86.3 | 0.89 |
Zigarelli et al. [55] | RF classifier | 82.5 (invasive tests), 90.0 (non-invasive parameters) | - |
Proposed Technique | WaO tuned EL model | 92.7 | 0.93 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Panjwani, B.; Yadav, J.; Mohan, V.; Agarwal, N.; Agarwal, S. Optimized Machine Learning for the Early Detection of Polycystic Ovary Syndrome in Women. Sensors 2025, 25, 1166. https://doi.org/10.3390/s25041166