1 Introduction
With the recent development and success of artificial intelligence (AI), AI-enabled systems have become prevalent in areas such as healthcare, financial services, transportation, and retail. AI has become an increasingly important part of our lives, from image and facial recognition systems [Qiu et al. 2021], medical diagnosis [Kleppe et al. 2021], and financial risk control [Zhang 2020] to autonomous vehicles [Vishnukumar et al. 2017], and hence has an ever greater impact on human lives. As decisions from AI systems play important roles in human lives, researchers are increasingly focusing on explainability, transparency, and many other ethical issues related to AI [Huang et al. 2022]. There is a crucial need to understand how AI systems make decisions so that people can trust these AI-based systems. However, many machine learning (ML) models, especially the most popular deep neural networks, are quite opaque “black boxes.” Consequently, the decisions made by AI systems powered by black-box ML models are difficult for humans to comprehend, which prevents people from fully trusting these systems, especially in high-risk areas such as healthcare, finance, and law. The opacity of AI/ML has stimulated the rise of eXplainable AI (XAI), which refers to methods and techniques that provide insights into the decision-making of AI systems and hence improve the interpretability of AI [Arrieta et al. 2020].
Additionally, explainability or interpretability has become one of the requirements in AI regulations. For example, the General Data Protection Regulation (GDPR), which came into force in the European Union in 2018, stipulates that AI algorithms should be able to explain their decision logic [Union 2018]. Furthermore, the European Commission High-Level Expert Group on Artificial Intelligence, the Chinese New Generation AI Governance Expert Committee, the US Food and Drug Administration, and others have also issued documents requiring the development of transparent and explainable AI systems [Committee 2019; Intelligence 2019; Food et al. 2021]. These regulations make the need for explaining algorithmic decisions explicit and thereby promote the development of XAI.
As XAI has received widespread attention and interest from the community, much effort has been devoted to improving the explainability of AI, and numerous XAI methods have been proposed during the past decade [Adadi and Berrada 2018; Das and Rad 2020; Zhang et al. 2021b; Guidotti et al. 2018; Thampi 2022]. The existing XAI methods can be classified from the following three perspectives [Kamath and Liu 2021]:
– Based on the application stage, XAI methods can be categorized as pre-model, intrinsic, and post hoc (post-model) explanation methods.
– Based on the dependence between XAI methods and the model, XAI methods can be divided into model-specific and model-agnostic explanation methods.
– Based on the scope of the explanation, XAI methods can be classified into global and local explanation methods.
Among the existing XAI methods, feature attribution-based explanation (FAE) methods are a popular family of interpretable ML techniques that explain the predictions of a black-box ML model by computing the attribution of each input feature (i.e., the importance of each input feature) to the model’s prediction [Wang et al. 2023]. According to the above taxonomy, the FAE methods discussed in this article are both post hoc and local explanation methods. FAE methods can be further divided into gradient-based and perturbation-based methods [Molnar 2020]. Gradient-based FAE methods are usually model-specific, as they compute the attribution or importance of input features from gradient signals obtained via the back-propagation of neural networks; examples include Integrated Gradients (IG) [Sundararajan et al. 2017], SmoothGrad (SG) [Smilkov et al. 2017], and GradCAM [Selvaraju et al. 2017]. Perturbation-based FAE methods, in contrast, are model-agnostic: they estimate the attribution of input features by evaluating the impact of perturbing features on the model’s output. Such methods include SHAP [Lundberg and Lee 2017], LIME [Ribeiro et al. 2016], and LIME’s variants (e.g., K-LIME [Hall et al. 2017], DLIME [Zafar and Khan 2019], and BayLIME [Zhao et al. 2021]). Perturbation-based FAE methods are therefore more general and flexible than gradient-based ones, because they require no information internal to the model and can be applied to any model.
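To make the perturbation-based idea concrete, the following minimal Python sketch (an illustrative occlusion-style scheme, not an implementation of SHAP or LIME; the function name `occlusion_attribution` and the mean-value baseline are our own illustrative choices) attributes each feature by the change in the model’s predicted score when that feature is replaced with a baseline value:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

def occlusion_attribution(predict_fn, x, baseline):
    """Attribute each feature by the drop in the predicted score
    when that feature is replaced with a baseline value."""
    base_score = predict_fn(x.reshape(1, -1))[0]
    attributions = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        x_perturbed = x.copy()
        x_perturbed[i] = baseline[i]              # perturb a single feature
        perturbed_score = predict_fn(x_perturbed.reshape(1, -1))[0]
        attributions[i] = base_score - perturbed_score
    return attributions

# Example usage: the model is treated purely as a prediction function.
X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)
predict_fn = lambda a: model.predict_proba(a)[:, 1]  # P(class 1)

x = X[0]
baseline = X.mean(axis=0)                             # mean-value baseline
attr = occlusion_attribution(predict_fn, x, baseline)
print(np.argsort(-np.abs(attr))[:5])                  # five most important features
```

A gradient-based method such as IG would instead integrate the model’s gradients along a path from the baseline to the input, which requires access to the model’s internals.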
Recently, FAE methods have received increasing attention because they provide easy-to-comprehend explanations through feature importance scores. In addition, much of the literature states that FAE is by far the most vibrant, widely used, and well-studied explainability technique [Bhatt et al. 2020; Arrieta et al. 2020]. Moreover, FAE methods have been widely used in various fields such as healthcare [Dave et al. 2020; Shaikh et al. 2022], finance [Ohana et al. 2021; Jaeger et al. 2021], and law [Górski and Ramakrishna 2021]. Therefore, this work focuses on FAE methods.
With the widespread use of FAE methods, various metrics or criteria have been proposed to evaluate the quality of the explanations they generate, including faithfulness [Samek et al. 2016], stability [Fel et al. 2022], complexity [Bhatt et al. 2021], monotonicity [Luss et al. 2021], and so on. However, there is still no common agreement on which metrics are suitable for assessing explanations. Many papers point out that different stakeholders require different explanations [Rosenfeld and Richardson 2019; Mohseni et al. 2021; Gerlings et al. 2022]. For example, a study by Gerlings et al. [2022] found that different stakeholders require different tradeoffs between the faithfulness and the complexity of an explanation: the more closely stakeholders work with the AI model, the more faithful the explanation needs to be. Specifically, AI experts might prefer a more faithful, stable, and detailed (i.e., relatively complex) explanation that helps them optimize and diagnose their models [Rosenfeld and Richardson 2019; Mohseni et al. 2021]. For non-AI experts, however, explanations that AI experts can understand may be too complex to comprehend because they lack the relevant background knowledge [Jiang and Senge 2021]. Similarly, Poursabzi-Sangdeh et al. [2021] found that providing overly detailed explanations to non-AI experts can instead cause information overload and hinder understanding. Therefore, regular users would prefer a shorter, clearer, and less complex explanation that intuitively helps them understand how the AI system makes decisions [Mohseni et al. 2021; Rosenfeld and Richardson 2019]. In addition, Fel et al. [2022] pointed out that faithfulness is only the first step toward a good explanation and that, once good faithfulness is achieved, the sensitivity of the explanation also needs to be considered.
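As a rough illustration of how such metrics can be quantified (the exact definitions adopted in this article appear in Section 2; the formulations below are simplified assumptions in the spirit of the cited works), complexity can be measured as the entropy of the normalized absolute attributions, and faithfulness as the correlation between the attribution scores and the output drops observed when the corresponding features are perturbed:

```python
import numpy as np
from scipy.stats import entropy, pearsonr

def complexity(attributions):
    """Entropy of normalized |attributions|: lower values indicate an
    explanation concentrated on a few features (i.e., less complex)."""
    p = np.abs(attributions)
    p = p / p.sum()
    return entropy(p)

def faithfulness(predict_fn, x, attributions, baseline):
    """Pearson correlation between each feature's attribution and the
    output drop when that feature is replaced by its baseline value."""
    base_score = predict_fn(x.reshape(1, -1))[0]
    drops = []
    for i in range(len(x)):
        x_perturbed = x.copy()
        x_perturbed[i] = baseline[i]
        drops.append(base_score - predict_fn(x_perturbed.reshape(1, -1))[0])
    corr, _ = pearsonr(attributions, drops)
    return corr
```

Sensitivity can similarly be estimated by measuring how much the attribution vector changes when the input is slightly perturbed.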
The above examples indicate that although all stakeholders need explanations, the aspects they focus on and the explanations they wish to obtain differ. Therefore, an FAE method needs to consider multiple aspects when generating explanations and should ideally be able to trade them off against one another, so that different stakeholders can be given explanations that meet their needs. However, to the best of our knowledge, no existing work considers multiple aspects simultaneously in the process of generating explanations. Specifically, gradient-based FAE methods attribute each feature using gradient information, without explicitly considering how the resulting explanation performs in terms of faithfulness, sensitivity, or complexity. Perturbation-based FAE methods locally minimize the distance between the explainable model g and the explained model f when generating explanations, which ensures the faithfulness of the provided explanation; however, they consider only faithfulness and ignore other aspects of the explanation, such as sensitivity and complexity.
In summary, none of the existing methods considers multiple aspects of the explanation when generating it, and in particular the sensitivity and complexity of the explanation are ignored. This motivates us to take multiple aspects of explanation quality, i.e., multiple metrics for evaluating explanations, into account during the generation process. To achieve this, we formulate the problem as a multi-objective learning problem [Chandra and Yao 2006; Chen and Yao 2010; Minku and Yao 2013] and propose a framework to address it. The framework treats different explanation quality metrics of FAE methods as objectives to be optimized, yielding a set of explanations with different tradeoffs across the metrics and thus making it possible to provide different explanations to different stakeholders.
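Schematically, and only as an illustration (the precise problem formulation and the definitions of the objectives are given in Section 3; the symbols below are placeholders rather than the article's exact notation), generating an explanation g for a black-box model f at an instance x can be viewed as a multi-objective problem of the form

\[
\min_{g \in \mathcal{G}} \; \bigl( L_{\mathrm{faith}}(g; f, x),\; L_{\mathrm{sens}}(g; f, x),\; L_{\mathrm{comp}}(g) \bigr),
\]

whose solution is a Pareto set of explanations with different tradeoffs among the objectives rather than a single best explanation.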
Our research can be broken down into three more specific questions:
Q1. Do the faithfulness, sensitivity, and complexity of FAE explainable models conflict with each other?
Q2. Can our method simultaneously optimize these conflicting metrics and be competitive with other state-of-the-art FAE methods?
Q3. Can our method find a set of explainable models (i.e., explanations) with different tradeoffs among the objectives (i.e., metrics), potentially providing different explanations for different people?
To answer the above three questions, this article proposes a
multi-objective feature attribution explanation (MOFAE) framework that simultaneously considers multiple metrics used to evaluate explanation quality. The main contributions of this article are as follows:
(1) We experimentally analyze the relationships among several metrics used to evaluate explanation quality and show that they conflict with each other.
(2) We define the problem of generating FAE explainable models as a multi-objective learning problem and propose the MOFAE framework, built on existing optimization algorithms, to solve this new problem. The framework simultaneously considers several explanation quality metrics during the process of generating explanations. We also implement a specific instantiation of the framework that uses three metrics, evaluating the faithfulness, sensitivity, and complexity of the explanation results, as optimization objectives. This is the first time that sensitivity and complexity have been considered in an experimental study.
(3) Comparisons with six state-of-the-art FAE methods on eight well-known benchmark datasets show that MOFAE is highly competitive: it obtains explanations with higher faithfulness and lower sensitivity and complexity, and it can complement other FAE methods.
(4) MOFAE achieves better diversity than other FAE methods, i.e., it can reach different tradeoffs among the explanation quality metrics, and the presented explanation results illustrate its potential to provide tailored explanations to different stakeholders.
The rest of this article is organized as follows: Section 2 describes the relevant FAE methods and their evaluation criteria. Section 3 defines the multi-objective explanation problem under consideration and presents our MOFAE framework and a practical instantiation of the framework. Section 4 answers the three questions presented above through various experimental studies, evaluates the computational efficiency of MOFAE, and performs robustness tests. Section 5 concludes this article and indicates some future research directions.