
Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking

Authors: Gabriele Tolomei, Fabrizio Silvestri, Andrew Haines, Mounia Lalmas

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Pages 465 - 474

https://doi.org/10.1145/3097983.3098039

Published: 04 August 2017

Abstract

Machine-learned models are often described as "black boxes". In many real-world applications, however, models may have to sacrifice predictive power in favour of human interpretability. When this is the case, feature engineering becomes a crucial task that requires significant and time-consuming human effort. Whilst some features are inherently static, representing properties that cannot be influenced (e.g., the age of an individual), others capture characteristics that could be adjusted (e.g., the daily amount of carbohydrates consumed). Nonetheless, once a model is learned from the data, each prediction it makes on new instances is irreversible, because every instance is assumed to be a static point in the chosen feature space. There are many circumstances, however, where it is important to understand (i) why a model outputs a certain prediction on a given instance, (ii) which adjustable features of that instance should be modified, and finally (iii) how to alter such a prediction when the mutated instance is input back to the model.

In this paper, we present a technique that exploits the internals of a tree-based ensemble classifier to offer recommendations for transforming true negative instances into positively predicted ones. We demonstrate the validity of our approach using an online advertising application. First, we design a Random Forest classifier that effectively separates two types of ads (instances): low quality (negative) and high quality (positive). Then, we introduce an algorithm whose recommendations aim to transform a low quality ad (negative instance) into a high quality one (positive instance). Finally, we evaluate our approach on a subset of the active inventory of a large ad network, Yahoo Gemini.
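The idea of tweaking adjustable features until a tree ensemble changes its prediction can be illustrated with a small sketch. This is not the authors' implementation. The dictionary-based tree encoding, the helper names (`leaf`, `node`, `positive_paths`, `tweak_along`, `tweak`), the step size `eps`, the Euclidean cost, and the three-tree toy forest are all assumptions made for illustration. For each tree that votes negative, the sketch walks every path to a positive leaf, moves the violated actionable features just across their thresholds, and keeps the closest candidate that the whole forest labels positive.

```python
from math import dist

# Hypothetical tree encoding: internal nodes test x[feature] <= threshold,
# leaves carry a label (+1 = positive, -1 = negative).
def leaf(label):
    return {"label": label}

def node(feature, threshold, left, right):
    return {"feature": feature, "threshold": threshold, "left": left, "right": right}

def predict_tree(tree, x):
    while "label" not in tree:
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["label"]

def predict_forest(forest, x):
    # Majority vote over the trees.
    return 1 if sum(predict_tree(t, x) for t in forest) > 0 else -1

def positive_paths(tree, path=()):
    """Yield every root-to-leaf path ending in a positive leaf,
    as a tuple of (feature, threshold, go_left) conditions."""
    if "label" in tree:
        if tree["label"] == 1:
            yield path
        return
    f, t = tree["feature"], tree["threshold"]
    yield from positive_paths(tree["left"], path + ((f, t, True),))
    yield from positive_paths(tree["right"], path + ((f, t, False),))

def tweak_along(path, x, actionable, eps):
    """Copy x and push each violated feature just past its threshold.
    Return None if a violated condition involves a static feature."""
    tweaked = list(x)
    for f, t, go_left in path:
        if (tweaked[f] <= t) == go_left:
            continue  # condition already satisfied
        if f not in actionable:
            return None
        tweaked[f] = t - eps if go_left else t + eps
    return tweaked

def tweak(forest, x, actionable, eps=0.05):
    """Return the closest tweaked copy of x that the forest labels
    positive, or None if no candidate flips the prediction."""
    best = None
    for tree in forest:
        if predict_tree(tree, x) == 1:
            continue  # this tree already votes positive
        for path in positive_paths(tree):
            cand = tweak_along(path, x, actionable, eps)
            if cand is None or predict_forest(forest, cand) != 1:
                continue
            if best is None or dist(x, cand) < dist(x, best):
                best = cand
    return best

# Toy forest over two features: feature 0 is static, feature 1 is adjustable.
forest = [
    node(1, 0.5, leaf(-1), leaf(1)),
    node(1, 0.6, leaf(-1), leaf(1)),
    node(0, 0.5, leaf(1), leaf(-1)),
]
x = [0.8, 0.2]                          # predicted negative by all three trees
result = tweak(forest, x, actionable={1})
print(result, predict_forest(forest, result))
```

In this toy setting only feature 1 may change, so the path through the third tree is discarded, and the candidate derived from the first tree is rejected because it flips a single vote only. Checking each candidate against the full forest, rather than against the tree that produced it, is what makes the recommendation valid for the ensemble as a whole.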





Published In

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 2017

2240 pages

ISBN: 9781450348874

DOI: 10.1145/3097983

General Chairs: Stan Matwin (Dalhousie University), Shipeng Yu (LinkedIn), Faisal Farooq (IBM)


Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article

Conference

KDD '17: The 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 13 - 17, 2017

Halifax, NS, Canada

Acceptance Rates

KDD '17 Paper Acceptance Rate: 64 of 748 submissions, 9%

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
