DOI: 10.1145/3607947.3608092
research-article

SVM Kernel and It’s Aggregation Using Stacking on Imbalanced Dataset

Published: 28 September 2023

Abstract

Existing classification methods for imbalanced datasets have low prediction accuracy for the minority class because little information about that class is present. Over- and under-sampling techniques can improve prediction for the minority class. However, both methods can also reduce minority-class accuracy, either by discarding vital information or by adding details that are irrelevant to classification. SVM kernels handle asymmetric data well, but whether an SVM kernel is used alone or as part of an ensemble on an imbalanced dataset, there is no strong basis for choosing which kernel to use, and how a particular kernel behaves depends heavily on the dataset. In this paper, we present a framework in which several kernel SVM classifiers (Linear, Polynomial, Sigmoid, RBF) are used as base learners and one of the kernels (say, the RBF kernel) is used as the meta-learner in a stacking ensemble. The results show that stacked generalization of SVM kernels performs similarly to the best-performing kernel on an imbalanced software change-proneness dataset, using AUC, ROC, MCC, and BAS as evaluation metrics.
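
The framework described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, a synthetic 90/10 imbalanced dataset from `make_classification` in place of the software change-proneness data, default kernel hyperparameters, feature standardization, and 5-fold stacking. It also reads BAS as the balanced accuracy score. The function names `build_stacked_svm` and `evaluate_stacked_svm` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.metrics import (
    balanced_accuracy_score,
    matthews_corrcoef,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stacked_svm():
    """Stack four kernel SVMs under an RBF-kernel SVM meta-learner."""
    kernels = ["linear", "poly", "sigmoid", "rbf"]
    # One base learner per kernel; scaling is an assumption of this sketch.
    base_learners = [
        (kernel, make_pipeline(StandardScaler(), SVC(kernel=kernel)))
        for kernel in kernels
    ]
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=SVC(kernel="rbf"),  # meta-learner
        stack_method="decision_function",   # feed SVM margins to the meta-learner
        cv=5,                               # out-of-fold predictions for stacking
    )


def evaluate_stacked_svm(random_state=0):
    """Fit on synthetic imbalanced data and return AUC, MCC and BAS."""
    # Synthetic stand-in dataset: roughly 90% majority, 10% minority.
    X, y = make_classification(
        n_samples=600,
        n_features=10,
        weights=[0.9, 0.1],
        random_state=random_state,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state
    )
    model = build_stacked_svm().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores = model.decision_function(X_test)  # continuous scores for ROC/AUC
    return {
        "auc": roc_auc_score(y_test, scores),
        "mcc": matthews_corrcoef(y_test, y_pred),
        "bas": balanced_accuracy_score(y_test, y_pred),
    }


if __name__ == "__main__":
    for name, value in evaluate_stacked_svm().items():
        print(f"{name}: {value:.3f}")
```

Fitting each single-kernel pipeline on the same split and comparing its scores with the stacked model's would be one way to check the abstract's claim on a given dataset.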

Information & Contributors

Information

Published In

IC3-2023: Proceedings of the 2023 Fifteenth International Conference on Contemporary Computing
August 2023
783 pages
ISBN:9798400700224
DOI:10.1145/3607947
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Ensemble Technique
  2. Imbalanced Dataset
  3. SVM Kernels
  4. Stacking

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

IC3 2023

Article Metrics

  • Total Citations: 0
  • Total Downloads: 25
  • Downloads (Last 12 months): 15
  • Downloads (Last 6 weeks): 3

Reflects downloads up to 24 Dec 2024
