research-article

A practical study of methods for deriving insightful attribute importance rankings using decision bireducts

Authors: Andrzej Janusz, Dominik Ślęzak, Sebastian Stawicki, Krzysztof Stencel

Information Sciences: an International Journal, Volume 645, Issue C

https://doi.org/10.1016/j.ins.2023.119354

Published: 01 October 2023

Abstract

Subject matter experts (SMEs) often rely on attribute importance rankings to verify machine learning models, gain insight into their outcomes, and better understand the phenomena under investigation. To make such rankings more useful, we introduce a new approach to evaluating attribute rankings produced by any machine learning method. As a real-world case study, we investigate the attribute importance scores produced by XGBoost and by decision bireducts on data gathered by an HR company, where the goal is to predict candidates' willingness to change jobs. For this task, XGBoost delivers accurate models but fails to identify many attributes that SMEs consider important. In comparison, decision bireducts lead to models that are easier to interpret and that explore the data with greater emphasis on attribute diversity. Ensembles of decision bireducts achieve comparable accuracy, and their associated attribute rankings are more insightful than those of XGBoost.
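The abstract contrasts XGBoost importance scores with rankings derived from ensembles of decision bireducts. The following Python sketch illustrates one simple way such a ranking could be obtained. It assumes the standard definition of a decision bireduct: a pair (B, X), with B a subset of attributes and X a subset of objects, such that B discerns every pair of objects in X that have different decisions, no proper subset of B does so, and no proper superset of X is discerned by B. Bireducts are generated with the randomized permutation-based heuristic from the earlier bireduct literature, not with the improved algorithms proposed in this paper, and each attribute is scored by the fraction of bireducts in which it occurs. That scoring rule and the helper names `discerns`, `random_bireduct`, and `bireduct_attribute_ranking` are illustrative assumptions, not the authors' exact method.

```python
import random
from collections import Counter


def discerns(rows, decisions, attrs, objs):
    """Return True if the attributes `attrs` discern every pair of objects in `objs`
    with different decisions, i.e. each group of objects sharing the same values
    on `attrs` has a single decision value."""
    seen = {}
    for i in objs:
        key = tuple(rows[i][a] for a in attrs)
        if seen.setdefault(key, decisions[i]) != decisions[i]:
            return False
    return True


def random_bireduct(rows, decisions, rng):
    """Compute one decision bireduct (attrs, objs) by scanning a random permutation
    of attributes and objects together (randomized heuristic, not this paper's method)."""
    n_attrs = len(rows[0])
    order = [("a", a) for a in range(n_attrs)] + [("u", u) for u in range(len(rows))]
    rng.shuffle(order)
    attrs, objs = list(range(n_attrs)), []
    for kind, idx in order:
        if kind == "a":
            reduced = [a for a in attrs if a != idx]
            if discerns(rows, decisions, reduced, objs):
                attrs = reduced  # attribute is redundant for the objects collected so far
        elif discerns(rows, decisions, attrs, objs + [idx]):
            objs.append(idx)  # object can be added without creating a conflict
    return attrs, objs


def bireduct_attribute_ranking(rows, decisions, n_bireducts=100, seed=0):
    """Rank attributes by how often they occur in an ensemble of random bireducts.
    Assumption: occurrence frequency is used as the importance score."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_bireducts):
        attrs, _ = random_bireduct(rows, decisions, rng)
        counts.update(attrs)
    scores = {a: counts[a] / n_bireducts for a in range(len(rows[0]))}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Toy decision table: attribute 0 determines the decision, attribute 1 is noise.
    rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
    decisions = [0, 0, 1, 1]
    for attr, score in bireduct_attribute_ranking(rows, decisions, n_bireducts=200, seed=42):
        print(f"attribute {attr}: {score:.2f}")
```

Because the attribute subset only shrinks and the object subset only grows while the permutation is processed, every result is a bireduct: an object rejected earlier cannot be added later, and an attribute retained earlier cannot be removed later. Drawing many permutations yields a diverse ensemble, which is why occurrence frequency can serve as a rough importance signal.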

Highlights

• We propose a procedure to evaluate and compare attribute importance rankings.

• We compare attribute rankings produced by XGBoost and by an ensemble of decision bireducts.

• We propose improvements to state-of-the-art algorithms for computing decision bireducts.

• We describe and thoroughly analyze a real-world application in the HR industry.

• In experiments, bireducts provided comparable accuracy and more insightful rankings than XGBoost.
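The first highlight refers to a procedure for evaluating and comparing attribute importance rankings, whose details are not reproduced here. As a purely illustrative stand-in (not the authors' procedure), the Python sketch below scores a ranking by the fraction of a hypothetical SME-designated attribute set that appears in its top k, and compares two rankings by the Jaccard overlap of their top-k sets. The attribute names `a1` to `a6`, the SME set, and the helper names are invented placeholders.

```python
def top_k(ranking, k):
    """Return the first k attribute names of a ranking ordered from most to least important."""
    return set(ranking[:k])


def sme_recall_at_k(ranking, sme_attributes, k):
    """Fraction of SME-designated attributes found among the top k of `ranking`."""
    sme = set(sme_attributes)
    if not sme:
        raise ValueError("sme_attributes must not be empty")
    return len(top_k(ranking, k) & sme) / len(sme)


def jaccard_at_k(ranking_a, ranking_b, k):
    """Jaccard similarity of the top-k sets of two rankings (1.0 if both are empty)."""
    a, b = top_k(ranking_a, k), top_k(ranking_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    # Invented placeholder rankings and SME set, for illustration only.
    ranking_x = ["a1", "a2", "a3", "a4", "a5", "a6"]
    ranking_y = ["a5", "a2", "a6", "a1", "a4", "a3"]
    sme = {"a2", "a5", "a6"}
    print(sme_recall_at_k(ranking_x, sme, k=3))  # 1 of 3 SME attributes in top 3
    print(sme_recall_at_k(ranking_y, sme, k=3))  # all 3 SME attributes in top 3
    print(jaccard_at_k(ranking_x, ranking_y, k=3))  # top-3 sets share only "a2"
```

Recall against an SME set rewards rankings that surface attributes experts care about, while top-k overlap shows how much two methods agree; neither says anything about predictive accuracy, which has to be measured separately.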

Cited By

F. Jiang, Q. Hu, Z. Yang, J. Liu, J. Du, A neighborhood rough sets-based ensemble method, with application to software fault prediction, Expert Systems with Applications: An International Journal 264(C) (2025). DOI: 10.1016/j.eswa.2024.125919. Online publication date: 10 March 2025. https://dl.acm.org/doi/10.1016/j.eswa.2024.125919

Z. Fallahnejad, M. Karimian, F. Lashkari, H. Beigy, T-shaped expert mining: a novel approach based on skill translation and focal loss, Journal of Intelligent Information Systems 62(2) (2023) 535–554. DOI: 10.1007/s10844-023-00831-y. Online publication date: 28 November 2023. https://dl.acm.org/doi/10.1007/s10844-023-00831-y

Information

Published in: Information Sciences: an International Journal, Volume 645, Issue C, October 2023, 904 pages

ISSN: 0020-0255

Elsevier Inc.

Publisher: Elsevier Science Inc., United States
