1 Introduction
With the recent development and success of artificial intelligence (AI), AI-enabled systems have become prevalent in areas such as healthcare, financial services, transportation, and retail. AI has become an increasingly important part of our lives, from image and facial recognition systems [Qiu et al. 2021], medical diagnosis [Kleppe et al. 2021], and financial risk control [Zhang 2020] to autonomous vehicles [Vishnukumar et al. 2017], and hence has an ever greater impact on human lives. As decisions from AI systems play important roles in human lives, researchers are increasingly focusing on explainability, transparency, and many other ethical issues related to AI [Huang et al. 2022]. There is a crucial need to understand how AI systems make decisions so that people can trust these AI-based systems. However, many machine learning (ML) models, especially the most popular deep neural networks, are quite opaque “black boxes.” Consequently, the decisions made by AI systems powered by black-box ML models are difficult for humans to comprehend, which prevents people from fully trusting these systems, especially in high-risk areas such as healthcare, finance, and law. The opacity of AI/ML has stimulated the rise of eXplainable AI (XAI), which refers to methods and techniques that provide insights into the decision-making of AI systems and hence improve the interpretability of AI [Arrieta et al. 2020].
Additionally, explainability or interpretability has become one of the requirements in AI regulations. For example, the General Data Protection Regulation (GDPR), which came into force in the European Union in 2018, stipulates that AI algorithms should be able to explain their decision logic [Union 2018]. Furthermore, the European Commission High-Level Expert Group on Artificial Intelligence, the Chinese New Generation AI Governance Expert Committee, the US Food and Drug Administration, and others have also issued documents requiring the development of transparent and explainable AI systems [Committee 2019; Intelligence 2019; Food et al. 2021]. These regulations make the need for explaining algorithmic decisions explicit and thereby promote the development of XAI.
As XAI has received widespread attention and interest from the community, much effort has been devoted to improving the explainability of AI, and numerous XAI methods have been proposed during the past decade [Adadi and Berrada 2018; Das and Rad 2020; Zhang et al. 2021b; Guidotti et al. 2018; Thampi 2022]. The existing XAI methods can be classified from the following three perspectives [Kamath and Liu 2021]:
– Based on the application stage, XAI methods can be categorized as pre-model, intrinsic, and post hoc (post-model) explanation methods.
– Based on the dependence between XAI methods and the model, XAI methods can be divided into model-specific and model-agnostic explanation methods.
– Based on the scope of the explanation, XAI methods can be classified into global and local explanation methods.
Among the existing XAI methods, feature attribution-based explanation (FAE) methods are a popular family of interpretable ML techniques that explain the predictions of a black-box ML model by computing the attribution of each input feature (i.e., the importance of each input feature) to the model’s prediction [Wang et al. 2023]. According to the above taxonomy, the FAE methods discussed in this article are both post hoc and local explanation methods. FAE methods can be further divided into gradient-based and perturbation-based methods [Molnar 2020]. Gradient-based FAE methods are usually model-specific, as they compute the attribution or importance of input features from gradient signals obtained via the back-propagation of neural networks; examples include Integrated Gradients (IG) [Sundararajan et al. 2017], SmoothGrad (SG) [Smilkov et al. 2017], and GradCAM [Selvaraju et al. 2017]. Perturbation-based FAE methods, in contrast, are model-agnostic: they estimate the attribution of input features by evaluating the impact of perturbing features on the model’s output. Such methods include SHAP [Lundberg and Lee 2017], LIME [Ribeiro et al. 2016], and LIME’s variants (e.g., K-LIME [Hall et al. 2017], DLIME [Zafar and Khan 2019], and BayLIME [Zhao et al. 2021]). Perturbation-based FAE methods are therefore more general and flexible than gradient-based ones, because they require no information internal to the model and can be applied to any model.
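To make the perturbation-based idea concrete, the following minimal Python sketch (an illustrative occlusion-style scheme, not an implementation of SHAP or LIME; the function name `occlusion_attribution` and the mean-value baseline are our own illustrative choices) attributes each feature by the change in the model’s predicted score when that feature is replaced with a baseline value:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

def occlusion_attribution(predict_fn, x, baseline):
    """Attribute each feature by the drop in the predicted score
    when that feature is replaced with a baseline value."""
    base_score = predict_fn(x.reshape(1, -1))[0]
    attributions = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        x_perturbed = x.copy()
        x_perturbed[i] = baseline[i]              # perturb a single feature
        perturbed_score = predict_fn(x_perturbed.reshape(1, -1))[0]
        attributions[i] = base_score - perturbed_score
    return attributions

# Example usage: the model is treated purely as a prediction function.
X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)
predict_fn = lambda a: model.predict_proba(a)[:, 1]  # P(class 1)

x = X[0]
baseline = X.mean(axis=0)                             # mean-value baseline
attr = occlusion_attribution(predict_fn, x, baseline)
print(np.argsort(-np.abs(attr))[:5])                  # five most important features
```

A gradient-based method such as IG would instead integrate the model’s gradients along a path from the baseline to the input, which requires access to the model’s internals.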
Recently, FAE methods have received increasing attention because they provide easy-to-comprehend explanations through feature importance scores. In addition, much of the literature states that FAE is by far the most vibrant, widely used, and well-studied explainability technique [Bhatt et al. 2020; Arrieta et al. 2020]. Moreover, FAE methods have been widely used in various fields such as healthcare [Dave et al. 2020; Shaikh et al. 2022], finance [Ohana et al. 2021; Jaeger et al. 2021], and law [Górski and Ramakrishna 2021]. Therefore, this work focuses on FAE methods.
With the widespread use of FAE methods, various metrics or criteria have been proposed to evaluate the quality of the explanations they generate, including faithfulness [Samek et al. 2016], stability [Fel et al. 2022], complexity [Bhatt et al. 2021], monotonicity [Luss et al. 2021], and so on. However, there is still no common agreement on which metrics are suitable for assessing explanations. Many papers point out that different stakeholders require different explanations [Rosenfeld and Richardson 2019; Mohseni et al. 2021; Gerlings et al. 2022]. For example, a study by Gerlings et al. [2022] found that different stakeholders require different tradeoffs between the faithfulness and the complexity of an explanation: the more closely stakeholders work with the AI model, the more faithful the explanation needs to be. Specifically, AI experts might prefer a more faithful, stable, and detailed (i.e., relatively complex) explanation that helps them optimize and diagnose their models [Rosenfeld and Richardson 2019; Mohseni et al. 2021]. For non-AI experts, however, explanations that AI experts can understand may be too complex to comprehend because they lack the relevant background knowledge [Jiang and Senge 2021]. Similarly, Poursabzi-Sangdeh et al. [2021] found that providing overly detailed explanations to non-AI experts can instead cause information overload and hinder understanding. Therefore, regular users would prefer a shorter, clearer, and less complex explanation that intuitively helps them understand how the AI system makes decisions [Mohseni et al. 2021; Rosenfeld and Richardson 2019]. In addition, Fel et al. [2022] pointed out that faithfulness is only the first step toward a good explanation and that, once good faithfulness is achieved, the sensitivity of the explanation also needs to be considered.
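As a rough illustration of how such metrics can be quantified (the exact definitions adopted in this article appear in Section 2; the formulations below are simplified assumptions in the spirit of the cited works), complexity can be measured as the entropy of the normalized absolute attributions, and faithfulness as the correlation between the attribution scores and the output drops observed when the corresponding features are perturbed:

```python
import numpy as np
from scipy.stats import entropy, pearsonr

def complexity(attributions):
    """Entropy of normalized |attributions|: lower values indicate an
    explanation concentrated on a few features (i.e., less complex)."""
    p = np.abs(attributions)
    p = p / p.sum()
    return entropy(p)

def faithfulness(predict_fn, x, attributions, baseline):
    """Pearson correlation between each feature's attribution and the
    output drop when that feature is replaced by its baseline value."""
    base_score = predict_fn(x.reshape(1, -1))[0]
    drops = []
    for i in range(len(x)):
        x_perturbed = x.copy()
        x_perturbed[i] = baseline[i]
        drops.append(base_score - predict_fn(x_perturbed.reshape(1, -1))[0])
    corr, _ = pearsonr(attributions, drops)
    return corr
```

Sensitivity can similarly be estimated by measuring how much the attribution vector changes when the input is slightly perturbed.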
The above examples indicate that although all stakeholders need explanations, the aspects they focus on and the explanations they wish to obtain differ. Therefore, an FAE method needs to consider multiple aspects when generating explanations and should ideally be able to trade them off against one another, so that different stakeholders can be given explanations that meet their needs. However, to the best of our knowledge, no existing work considers multiple aspects simultaneously in the process of generating explanations. Specifically, gradient-based FAE methods attribute each feature using gradient information, without explicitly considering how the resulting explanation performs in terms of faithfulness, sensitivity, or complexity. Perturbation-based FAE methods locally minimize the distance between the explainable model g and the explained model f when generating explanations, which ensures the faithfulness of the provided explanation; however, they consider only faithfulness and ignore other aspects of the explanation, such as sensitivity and complexity.
In summary, none of the existing methods considers multiple aspects of the explanation when generating it, and in particular the sensitivity and complexity of the explanation are ignored. This motivates us to take multiple aspects of explanation quality, i.e., multiple metrics for evaluating explanations, into account during the generation process. To achieve this, we formulate the problem as a multi-objective learning problem [Chandra and Yao 2006; Chen and Yao 2010; Minku and Yao 2013] and propose a framework to address it. The framework treats different explanation quality metrics of FAE methods as objectives to be optimized, yielding a set of explanations with different tradeoffs across the metrics and thus making it possible to provide different explanations to different stakeholders.
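Schematically, and only as an illustration (the precise problem formulation and the definitions of the objectives are given in Section 3; the symbols below are placeholders rather than the article's exact notation), generating an explanation g for a black-box model f at an instance x can be viewed as a multi-objective problem of the form

\[
\min_{g \in \mathcal{G}} \; \bigl( L_{\mathrm{faith}}(g; f, x),\; L_{\mathrm{sens}}(g; f, x),\; L_{\mathrm{comp}}(g) \bigr),
\]

whose solution is a Pareto set of explanations with different tradeoffs among the objectives rather than a single best explanation.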
Our research can be broken down into three more specific questions:
Q1. Do the faithfulness, sensitivity, and complexity of FAE explainable models conflict with each other?
Q2. Can our method simultaneously optimize these conflicting metrics and be competitive with other state-of-the-art FAE methods?
Q3. Can our method find a set of explainable models (i.e., explanations) with different tradeoffs among the objectives (i.e., metrics), potentially providing different explanations for different people?
To answer the above three questions, this article proposes a
multi-objective feature attribution explanation (MOFAE) framework that simultaneously considers multiple metrics used to evaluate explanation quality. The main contributions of this article are as follows:
(1) We experimentally analyze the relationships among several metrics used to evaluate explanation quality and show that they conflict with each other.
(2) We define the problem of generating FAE explainable models as a multi-objective learning problem and propose the MOFAE framework, built on existing optimization algorithms, to solve this new problem. The framework simultaneously considers several explanation quality metrics during the process of generating explanations. We also implement a specific instantiation of the framework that uses three metrics, evaluating the faithfulness, sensitivity, and complexity of the explanation results, as optimization objectives. This is the first time that sensitivity and complexity have been considered in an experimental study.
(3) Comparisons with six state-of-the-art FAE methods on eight well-known benchmark datasets show that MOFAE is highly competitive: it obtains explanations with higher faithfulness and lower sensitivity and complexity, and it can complement other FAE methods.
(4) MOFAE achieves better diversity than other FAE methods, i.e., it can reach different tradeoffs among the explanation quality metrics, and the presented explanation results illustrate its potential to provide tailored explanations to different stakeholders.
The rest of this article is organized as follows: Section 2 describes the relevant FAE methods and their evaluation criteria. Section 3 defines the multi-objective explanation problem under consideration and presents our MOFAE framework and a practical instantiation of the framework. Section 4 answers the three questions presented above through various experimental studies, evaluates the computational efficiency of MOFAE, and performs robustness tests. Section 5 concludes this article and indicates some future research directions.