research-article

A practical study of methods for deriving insightful attribute importance rankings using decision bireducts

Published: 01 October 2023

Abstract

Subject matter experts (SMEs) often rely on attribute importance rankings to verify machine learning models, acquire insights into their outcomes, and gain a deeper understanding of the investigated phenomena. To further increase their usefulness, we introduce a new approach to the evaluation of attribute rankings produced by any machine learning method. As a real-world case study, we investigate the attribute importance scores produced using XGBoost and decision bireducts on the data gathered by an HR company, where the goal is to predict the willingness of candidates to change their job. For this task, XGBoost delivers accurate models but fails to identify many attributes that are important to SMEs. In comparison, decision bireducts lead to models that are easier to interpret and explore the data with a higher focus on the diversity of attributes. The ensembles of decision bireducts deliver comparable accuracy and their associated attribute rankings are more insightful than those of XGBoost.
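The abstract does not say how an attribute ranking is read off an ensemble of decision bireducts. The sketch below is one possible illustration, not the authors' procedure. It assumes a decision bireduct is a pair (attribute subset, object subset) in which the attributes determine the decision on the chosen objects, and it assumes, purely for illustration, that an attribute's importance is the fraction of random bireducts containing it. The function names, the toy data, and the permutation-style heuristic are all hypothetical choices.

```python
import random
from collections import Counter


def is_consistent(rows, labels, attrs, objs):
    """True if, among objs, equal values on attrs always imply equal labels."""
    seen = {}
    for i in objs:
        key = tuple(rows[i][a] for a in attrs)
        if seen.setdefault(key, labels[i]) != labels[i]:
            return False
    return True


def random_bireduct(rows, labels, rng):
    """Build one (attrs, objs) pair by scanning a random mix of steps.

    Start with all attributes and no objects. An attribute step drops the
    attribute if consistency on the current objects survives; an object
    step adds the object if consistency under the current attributes survives.
    """
    n_attrs = len(rows[0])
    attrs = list(range(n_attrs))
    objs = []
    steps = [("attr", a) for a in range(n_attrs)]
    steps += [("obj", i) for i in range(len(rows))]
    rng.shuffle(steps)
    for kind, idx in steps:
        if kind == "attr":
            reduced = [a for a in attrs if a != idx]
            if is_consistent(rows, labels, reduced, objs):
                attrs = reduced
        elif is_consistent(rows, labels, attrs, objs + [idx]):
            objs.append(idx)
    return attrs, objs


def rank_attributes(rows, labels, n_bireducts=200, seed=0):
    """Rank attributes by how often they occur in random bireducts (assumed score)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_bireducts):
        attrs, _ = random_bireduct(rows, labels, rng)
        counts.update(attrs)
    order = sorted(range(len(rows[0])), key=lambda a: -counts[a])
    return [(a, counts[a] / n_bireducts) for a in order]


# Hypothetical toy table: attribute 0 determines the label,
# attribute 1 is noise, attribute 2 is constant.
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)] * 2
labels = [r[0] for r in rows]
ranking = rank_attributes(rows, labels)
```

In this sketch a constant attribute can never stay in a bireduct, because dropping it never merges objects that were previously told apart. Scores from a frequency count like this are only comparable within one run configuration (same number of bireducts, same data).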

Highlights

  • We propose a procedure to evaluate and compare attribute importance rankings.
  • We compare the attribute rankings produced by XGBoost and by an ensemble of decision bireducts.
  • We propose improvements to state-of-the-art algorithms for computing decision bireducts.
  • We describe and thoroughly analyze a real-world application from the HR industry.
  • In our experiments, bireducts provided accuracy comparable to XGBoost and more insightful rankings.

Cited By

  • (2024) T-shaped expert mining: a novel approach based on skill translation and focal loss. Journal of Intelligent Information Systems 62:2 (535-554). DOI: 10.1007/s10844-023-00831-y. Online publication date: 1-Apr-2024

Published In

Information Sciences: an International Journal, Volume 645, Issue C
Oct 2023
904 pages

Publisher

Elsevier Science Inc.

United States

Author Tags

  1. Attribute importance rankings
  2. XGBoost and ensembles of decision trees
  3. Ensembles of decision bireducts
  4. Recruitment support systems

Qualifiers

  • Research-article
