1 Introduction

Artificial intelligence (AI) is increasingly applied in various domains, including medicine (Caruana et al. 2015; Jin et al. 2020), law (Jin et al. 2023), autonomous driving (Atakishiyev et al. 2021), and loan application approval (Sachan et al. 2020). AI-based systems often play a key role in the associated critical decision-making processes. Responsible individuals, such as physicians, judges, drivers, and bankers, require explanations to different extents for the output generated by such AI-based systems. With AI becoming more prevalent as a decision helper in high-stakes sectors, it is crucial to ensure that AI is understandable to its users (Longo et al. 2020), guaranteeing safe, responsible, and legally compliant usage (Lagioia et al. 2020).

Machine learning (ML) pipelines are capable of producing precise predictions; however, they frequently fail to incorporate two crucial phases, namely understanding and explaining. Understanding involves analyzing the problem domain, data, and model behavior, including training and quality assurance. Explaining is crucial throughout the ML lifecycle, especially when models are deployed in real-world applications. Both phases contribute to model interpretability, trust, and effective implementation (Dwivedi et al. 2023). The increasing autonomy and complexity of AI-based systems pose challenges for software engineers and domain experts to comprehend them fully (Lipton 2018). This necessity has led to the development of eXplainable AI (XAI) systems, where explanations play a pivotal role. Such systems help users understand decisions made by the AI, thereby increasing confidence and trustworthiness (Markus et al. 2021). Moreover, explanations serve to verify and validate actions taken, identify the causes of errors, and reduce the likelihood of human mistakes.

Explainability is an emerging non-functional requirement that has garnered attention as a critical quality aspect for AI-based systems (Chazette and Schneider 2020; Köhl et al. 2019). An existing study highlights that explainability significantly impacts the overall quality of software systems (Chazette et al. 2021), contributing to other essential quality features such as transparency and understandability. Notably, the Ethics Guidelines for Trustworthy AI (Lipton 2018), recommended by the High-Level Expert Group on Artificial Intelligence (AI HLEG), prioritize transparency as an essential requirement. These guidelines underscore the significance of traceability, explainability, and communication in AI-based systems and stress the importance of providing a clear account of the decision-making process from the perspectives of relevant stakeholders.

To facilitate the development of explainable AI-based systems, methodologies are essential for requirements engineers to analyze, delineate, and assess the requirements related to explainability. However, given that the notion of explainability has only recently emerged as a critical quality factor among non-functional requirements (Köhl et al. 2019), there is currently no adequate guidance available to assist practitioners in this regard. To understand how practitioners deal with this lack of guidance, we conducted an interview study to gather their interpretations of explainability, the practices they actively employ, and the difficulties they encounter. The interviews were conducted within the context of various stakeholders and industry requirements engineering (RE) practices.

2 Background and Related Work

Several researchers have explored the impact of explainability on AI-based systems. According to Brunotte et al. (2021), explanations can enhance user privacy awareness, while Kästner et al. (2021) argue that explainable systems increase trustworthiness. However, requests for explainability and the means to achieve it often lack clarity (Köhl et al. 2019). Sadeghi et al. (2021) proposed a taxonomy outlining different reasons for explanations, such as improving user interaction. Sheh (2021) categorized explainability requirements based on the explanation source, depth, and scope in the context of requirements engineering. Ethical guidelines for trustworthy AI, covering aspects such as fairness and explainability, have been emphasized by Ishikawa and Matsuno (2020) and Kuwajima and Ishikawa (2019), while Vogelsang and Borg (2019) stressed the importance of eliciting explainability requirements from the users’ perspectives.

Despite this progress, challenges persist in establishing a comprehensive notion of explainability. The lack of a consistent definition and varying stakeholder understandings represent major obstacles (Köhl et al. 2019; Suresh et al. 2021). Stakeholder-centric approaches are essential due to the disparity between AI-generated explanations and the comprehension of stakeholders (Jansen Ferreira and Monteiro 2021). Additionally, existing model interpretability methods (Carvalho et al. 2019) may not cater explicitly to end-users (Suresh et al. 2021), requiring more user-friendly explanations. Further challenges include going beyond existing AI explainability techniques (Dhanorkar et al. 2021), accommodating different levels of explanations arising from dynamic interactions between stakeholders, and tailoring methods to different explainees with unique interests (Henin and Le Métayer 2021).

Many researchers have explored different perspectives on explainability. In a qualitative study, Brennen (2020) conducted 40 interviews and 2 focus groups over nine months. Their goal was to gain a clearer understanding of how various stakeholders in both government and industry describe the issue of Explainable AI. The paper highlights two significant findings: (1) current discourse on Explainable AI is hindered by a lack of consistent terminology, and (2) there are multiple distinct use cases for Explainable AI. They placed their findings in the context of needs that existing tools cannot meet. Furthermore, Hoffman et al. (2023) focus on the explanation needs of stakeholders in AI/XAI systems. The authors introduced the Stakeholder Playbook, designed to help system developers consider the various ways stakeholders need to “look inside” these systems, such as understanding their strengths and limitations. The authors conducted an empirical investigation involving cognitive interviews with senior and mid-career professionals experienced in developing or using AI systems. Krishna et al. (2022) analyze the disagreement problem in explainable ML, focusing on conflicts among post-hoc explanation methods. The authors conducted semi-structured interviews with 25 data scientists to define practical explanation disagreements, formalized the concept, and created a framework to quantitatively assess these disagreements. Further, the authors applied the framework to measure disagreement levels across various datasets, data modalities, explanation methods, and predictive models. Lakkaraju et al. (2022) conducted a user evaluation study investigating the needs and desires for explainable AI among doctors, healthcare professionals, and policymakers, revealing a strong preference for interactive explanations in the form of natural language dialogues. The study highlighted that domain experts want to treat ML models as accountable colleagues, capable of explaining their decisions in an expressive and accessible manner. Based on these findings, the authors propose five guiding principles for researchers to follow when designing interactive explanations, providing a foundation for future work in this area.

While the presented studies provide valuable insights into the challenges of explainability, a comprehensive method for meeting end-users’ demands in terms of explainability and ensuring that the specified requirements align with those demands has not yet been fully established. Against this backdrop, our goal is to gather insights from ML practitioners regarding their daily practices, challenges, and trade-offs. Our study makes the following contributions:

  • We explored how practitioners define explainability, categorizing these definitions based on their intended use. We also synthesized how their roles influence their definitions and compared these practical definitions with those found in the literature.

  • We highlighted the main reasons practitioners need explainability and how the intended audience influences their choice of practices to address explainability.

  • We identified challenges faced by practitioners and, based on these challenges, suggested future directions for research and practice.

  • We examined the trade-offs practitioners must consider when implementing explainability.

Our findings suggest that there is a need for a more robust approach to ensure that end users’ needs for explainability are met and that the requirements can be validated accordingly. Addressing diverse stakeholder-centric perspectives, particularly non-technical ones, and providing understandable explanations are vital goals. As the field of AI continues to evolve, embracing an RE approach becomes paramount in addressing these challenges, facilitating effective communication, and ultimately ensuring the transparency and trustworthiness of AI-based systems.

3 Study Design

To structure our research efforts, we followed the five-step process suggested by Runeson and Höst (2009). Initially, we formulated our research objective and the associated research questions.

Our objective is to identify the practices, challenges, and trade-offs professionals face in the XAI field. Furthermore, we want to chart potential research avenues for the academic community to develop solutions addressing XAI challenges and providing effective support for practitioners. To establish a clear direction and scope for our study, we subsequently formulate four key research questions:

RQ1: How do ML practitioners describe explainability from their perspective?

RQ2: What practices do ML practitioners employ to evaluate the necessity of explainability, and which practices do they apply to address it?

RQ3: Which explainability challenges do ML practitioners experience?

RQ4: Which trade-offs between explainability and other quality attributes do ML practitioners consider?

These research questions aim to capture the diverse aspects of explainability in AI-based systems. RQ1 seeks to capture how ML practitioners conceptualize explainability, providing insight into the diverse perspectives of primary stakeholders and establishing a foundational understanding of the concept of explainability. RQ2 aims to uncover the current practices used to evaluate the necessity of explainability and the strategies employed to achieve it, thereby identifying practical approaches and gaps in the implementation of these practices. RQ3 focuses on identifying the specific challenges faced by practitioners, which is crucial for addressing explainability effectively. Finally, RQ4 seeks to identify the trade-offs between explainability and other quality attributes, recognizing that balancing these aspects is essential for the practical deployment of AI-based systems. Taken together, these questions provide a comprehensive exploration of explainability in the context of AI-based systems, guiding future research and development to enhance their transparency, trustworthiness, and usability.

To answer these research questions, we followed an exploratory and qualitative research approach by conducting semi-structured interviews (Runeson and Höst 2009). Without a preconceived hypothesis, we aimed to investigate the topic in a preliminary and open-ended manner. This approach gave us a foundational structure while at the same time granting us the flexibility to dynamically adjust our inquiries in response to participants’ feedback. To guide our selection of interviewees, we established the following set of criteria:

  • The person has at least two years of experience working on AI/ML projects.

  • The person is currently working on an AI/ML project or has worked on one in the past two years.

We enlisted participants through personal industry connections within our research team and by contacting individuals in our LinkedIn network. We followed referral chain sampling (Baltes and Ralph 2022), in which participants are initially selected through convenience sampling and then asked to refer or recommend other potential participants. Eight participants were recruited through convenience sampling and six via referral chain sampling.

3.1 Interview Process

An interview preamble (Runeson and Höst 2009) was designed to explain the interview process and theme to the participants before conducting the interviews. This document was distributed to the participants in advance to acquaint them with the study. The preamble also delineated ethical considerations, such as confidentiality assurances, requests for consent for audio recordings, and a guarantee that recordings and transcripts would remain confidential and unpublished. Further, we created an interview guide (Seaman 2008) containing the questions organized into thematic categories. This guide helped to structure and organize the semi-structured interviews but was not provided to interviewees before the sessions. Additionally, we prepared a slide presentation with supplementary materials to provide contextual information related to our research. These slides contained information about the RE process and explored the intersection between XAI and RE. They were presented to the participants immediately before the interviews.

In total, we conducted a series of 14 individual interviews. All of them were conducted remotely via Webex in English and lasted 35 to 55 minutes. We loosely adhered to the structure provided in the interview guide but adapted our approach based on the participant’s responses. To establish rapport and initiate the discussions, we began by asking participants to introduce themselves, outlining their roles and the specific systems they were involved with. Subsequently, we transitioned into the topic of explainability and explored the practices followed by industry practitioners in this domain. The next segment focused on the challenges encountered by participants when endeavoring to make their systems more explainable. Lastly, we delved into inquiries regarding the trade-offs between explainability and other quality attributes. Following the interviews, we transcribed each audio recording to create a textual document for further analysis.

3.2 Data Analysis

Our data analysis started with coding each transcript, adhering to the constant comparison method rooted in grounded theory principles (Seaman 2008). Starting from a preliminarily established set of codes, we assigned labels to relevant paragraphs within the transcripts. Throughout this procedure, we generated new labels, revisited previously completed transcripts, and occasionally renamed, split, or merged labels as our understanding of the cases grew more comprehensive. After that, we conducted a detailed analysis of the intricacies and relationships among the codes within each transcript. This analysis resulted in the creation of a textual narrative for each case.

The supplementary materials created in this process, including the study design and the interview preamble, can be found online.

4 Results

In this section, we present the interview results, grouped by our four initially stated research questions. The data resulted from interviewing a cohort of 16 participants. We decided to exclude the data obtained from two participants because these individuals were Ph.D. students and lacked sufficient industry experience according to our criteria. This left us with 14 participants, representing nine German companies, one Nigerian company, and one Swiss company. Of the 14 participants, nine were from companies that provide AI solutions, four were from the automotive domain, and one was from a policymaking agency, as shown in Table 1. All participants were experienced with AI-based systems, and some also had prior experience in different software engineering roles.

4.1 How Do ML Practitioners Describe ‘Explainability’ from their Perspective? (RQ1)

We first asked practitioners to define explainability, thereby gathering diverse perspectives on this concept, as shown in Table 2. They also mentioned the target audience for whom they currently provide explainability. Based on their responses, we identified the audience, i.e., the target users for whom an explanation is warranted, and grouped the definitions into four categories. In the following, we summarize the interviewees’ definitions based on these four categories.

Explainability for Transparency and Trust

P1, P5, P8, P10, and P14 described explainability in terms of transparency and trust. During the interview, P1 defined explainability by stating that “explainability can be defined as a requirement for general acceptability of a system where lack of transparency makes it challenging.” Similarly, P14 stated that “explainability contributes to establishing trust in new applications.” In an ML context, P5 described that “explainability is making practitioners understand a pre-trained model to understand the constraints, limitations, and the opportunities for improvement and maybe even potential risks in the business context”. Further, from a feature-centric perspective, P8 mentioned that “trusting and being able to explain the features is crucial because if they cannot be trusted or explained, it may lead to issues with the ML model”. P10 situated explainability close to the core of ML development by stating that “explainability should be considered from various perspectives throughout the development process. It impacts system properties such as requirements, design, testing, implementation, and safety. It is crucial to address explainability in each stage to ensure accountability and responsibility.”

Explainability for Understanding Decision-making and Model Improvement

Another view, highlighted by six ML practitioners, was using explainability to understand the decision-making process of an AI-based system. P2, P4, P6, and P9 emphasized that explainability is essential for the end-user to understand how the system has arrived at a particular result. In this context, P6 stated that “we are still far away from removing the human-in-the-loop. So, at the end of the day, the business leader has to approve the decision.” P2, P12, and P13 emphasized that explaining and understanding the decision-making process of an AI-based system can serve as a debugging tool for developers. Lastly, P13 said that “explanations aid in informed decision-making and taking appropriate action.”

Table 1 Participant demographics
Table 2 Explainability insights by participants

Explainability for Model Insights

Participant P7 defined explainability as making complex concepts understandable to non-technical individuals by elucidating the types of data and attributes utilized in the process. Similarly, P11 defined explainability as “why the model is giving certain results or predictions by explaining how the model works and the impact of features or data.” P3 stated that he “would define [it] in a way that somebody could describe what happens inside the black box”.

Explainability for Safety and Bias Mitigation

Only one interviewee (P3) defined explainability in terms of safety and bias mitigation: “it is about safety requirements not to [be] biased against humans.”

These diverse perspectives on explainability highlight the need for a comprehensive approach that caters to both technical and non-technical stakeholders. We furthermore analyzed how the role of participants influences their perspective on explainability, as can be seen in Fig. 1. The biggest group, data scientists, generally prioritizes transparency and acceptability, with P1 adding a focus on clear communication of model operations and ensuring stakeholder understanding. Similarly, P2 emphasized the need for justifying decisions and using explainability for debugging. Participants P4 and P12 emphasized the importance of understanding system filtering and results, while P6 and P7 highlighted concerns about bridging the gap between technical and non-technical stakeholders. Building trust through feature explanation was considered crucial by P8 and P11.

Solution Architect P3 in the automotive sector focuses on avoiding bias and understanding black-box models. AI Solution Architect P5 aims to make complex systems more interpretable, while AI Ethics Responsible P9 aims at balancing ethical considerations with usability. AI Safety Expert P10 integrates trust, safety, and design into development. Data Engineer P13 emphasizes user-friendly decoding of ML predictions, while Data Manager P14 stresses the importance of trust and transparency in data explanation.

We compared the definitions provided by our practitioners with the ones we found in the literature. For that, we consulted five sources providing such a definition, summarized in Table 3. These definitions are generally rather generic and do not address a specific domain, stakeholder group, or specific requirements. Our practitioners, in contrast, define explainability more specifically based on their job roles, the systems they work on, or the targeted stakeholders. Based on our analysis, we identified two critical factors that should be considered when defining the term “explainability”: a) stakeholders and b) system domains.

4.2 What Practices Do ML Practitioners Employ to Evaluate the Necessity of Explainability, and Which Practices Do They Apply to Address It? (RQ2)

Practitioners face diverse requests for explainability across various channels and employ a range of practices to meet these demands. In Table 4, we categorized these requests and the corresponding practices into distinct factors. In the following, we summarize the interviewees’ statements for each request factor.

Fig. 1 Influence of participants’ roles on aspects of explainability definition

Table 3 Explainability in AI literature (Habiba et al. 2022)
Table 4 Requirements and practices for explainability mentioned by participants

User-Centric Factors

When providing or implementing explainability, P1, P2, and P4 primarily consider the perspective of the client or end-user. If the client expresses a desire for explainability, it is essential to ensure that their specific requirements for explainability are met. This involves clarifying their expectations and understanding what they consider to be explainable. Additionally, feedback or requests for clarification may originate from upper-level management or stakeholders. These inputs should be considered when determining the origin of the requirement. P1 describes several practices to enhance explainability: “Firstly, I ensure that each stage of my work is well-justified, clearly linking features in model development. I use feature engineering, like Principal Component Analysis (PCA), to evaluate feature contributions, identifying and clarifying significant ones in relation to the model’s processes.” P2, in turn, stated that “the approach to explainability practices can vary depending on the project and the type of data involved.” Further, he added that there are no universally defined best practices, as it often requires a case-by-case decision based on feasibility and usability. Overall, we have seen that user-centric practices for explainability depend on the specific project requirements and can involve combining unexplainable AI-based systems with supplementary statistical explanations after the decision has been made.
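To illustrate the kind of feature-contribution analysis P1 alludes to, the following is a minimal sketch of using PCA with scikit-learn to inspect which features drive the principal components of a dataset. The feature names and data are hypothetical illustrations, not the participant’s actual pipeline.

```python
# Minimal sketch: inspecting feature contributions with PCA
# (hypothetical data and column names; illustrative only).
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical tabular feature matrix
X = pd.DataFrame(np.random.rand(200, 4),
                 columns=["age", "income", "tenure", "claims"])

# PCA is scale-sensitive, so standardize the features first
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
pca.fit(X_scaled)

# How much variance each principal component captures
print("explained variance ratio:", pca.explained_variance_ratio_)

# Loadings: which original features contribute most to each component
loadings = pd.DataFrame(pca.components_.T,
                        index=X.columns,
                        columns=["PC1", "PC2"])
print(loadings)
```

Inspecting the loadings in this way is one possible means of identifying and justifying the significant features P1 refers to before they enter model development.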

Ethical and Safety Considerations

P3, P5, and P6 emphasized that ethics and safety drive the need for explainability requirements. P5 added that a “need for explainability is assessed based on factors such as high-risk systems impacting human beings and ethical considerations.” Further, P6 stated that “It is ethical, to make sure that your system is unbiased, especially when it comes to end users.” This involves ensuring unbiased systems, particularly in user interactions, where organizations building or providing AI services bear the responsibility for model explainability, eliminating biases, and addressing privacy concerns. P3 did not mention a specific practice for addressing explainability requirements in the context of ethical and safety considerations, while P5 and P6 described it as more of a translation process.

Legal and Regulatory Requirements

P8, P10, P11, and P13 stated that, in their current scenario, explainability is a legal and regulatory requirement. To address such explainability requirements, P8 and P11 mentioned the use of various tools, including figures, mathematics, and statistics, to provide understandable explanations, e.g., SHAP values (Lundberg and Lee 2017). In addition to these tools, P13 mentioned using LIME (Ribeiro et al. 2016), plots, and rule-based explanations to address the need for explainability. However, all of them stated that stakeholders who are unfamiliar with these tools may still struggle to grasp the explanations fully. P10 stated: “It depends on the scope of your explainability of this product. Do you want to share everything with the end-user or there are knowledge gaps?”
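As an illustration of the local, per-prediction explanations P13 refers to, the following is a minimal sketch using the lime package with a hypothetical scikit-learn classifier on synthetic data. The feature and class names are assumptions made for the example, not taken from the participants’ systems.

```python
# Minimal sketch: a local LIME explanation for a single prediction
# of a hypothetical classifier (synthetic data; illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(0)
X_train = rng.random((500, 4))
y_train = (X_train[:, 0] + X_train[:, 1] > 1).astype(int)

model = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)

explainer = LimeTabularExplainer(
    X_train,
    feature_names=["income", "debt", "tenure", "age"],   # hypothetical names
    class_names=["rejected", "approved"],
    mode="classification",
)

# Which features pushed this one prediction up or down?
explanation = explainer.explain_instance(
    X_train[0], model.predict_proba, num_features=4
)
print(explanation.as_list())
```

The output is a ranked list of feature contributions for one decision, which is the granularity regulators or auditors typically ask about, although, as the interviewees note, such output still requires translation for non-technical stakeholders.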

Client and Business Perspective

P4, P12, and P14 explained that they often receive requests for explainability from their customers, primarily to assist in making informed business decisions. To address such needs, P12 mentioned plots and rule-based explanations, including what-if explanations, whereas P4 and P14 did not mention any specific practices.

Risk Management

P5 and P10 underscored that the risk involved in a system leads to the need for explainability. P10 emphasized that providing visualizations and using pilot tests with explainability can help users to comprehend the system.

Data Scientist and Technical Considerations

P6 and P13 listed internal standards and data scientists’ needs as a requirement for explainability. To fulfill such needs, they mostly rely on SHAP values (Lundberg and Lee 2017). Further, they emphasized incorporating business-oriented explanations for non-technical stakeholders, making it easier to convey model predictions. For instance, P6 stated that custom dashboards are created to display model insights in a more understandable manner for different audiences.
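For context, the following is a minimal sketch of how SHAP values are commonly computed for a tree-based model and aggregated into a feature-importance view that could feed such a dashboard. It assumes the shap package and synthetic data; it is not the interviewees’ actual setup.

```python
# Minimal sketch: SHAP values for a tree model, aggregated into a
# global feature-importance view (synthetic data; illustrative only).
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
X = pd.DataFrame(rng.random((300, 3)),
                 columns=["amount", "duration", "history"])  # hypothetical
y = (2 * X["amount"] + X["history"] > 1.5).astype(int)

model = GradientBoostingClassifier().fit(X, y)

# Per-prediction, per-feature contributions to the model output
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global view: mean absolute contribution of each feature
importance = pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns)
print(importance.sort_values(ascending=False))

# shap.summary_plot(shap_values, X)  # the kind of plot shown on dashboards
```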

Furthermore, we noticed a link between roles (Table 1) and the practices that are used for explainability (Table 4). A recurring theme is that AI solution architects and safety engineers are predominantly associated with factors such as risk management and ethical and safety considerations. Such practitioners use visualization and natural language in their explanations. Data scientists, in contrast, tailor their explanations to the audience: for legal and technical stakeholders, they may rely on techniques like LIME and SHAP values, whereas for end users there are no such standard methods for explanations. This leaves a gap in addressing explainability needs at the end-user level. We also analyzed the practices explicitly mentioned by practitioners to address explainability. Our findings indicate that practitioners P8 and P11 referred to the use of SHAP values and feature importance. Practitioners P12 and P13 highlighted rule-based explanations, plots, and decision trees, while P5 mentioned visualization techniques as their approach to explainability. Additionally, our analysis of existing literature on XAI practices (Dwivedi et al. 2023) reveals that only a limited number of techniques are being applied in practice. This highlights a significant gap between the techniques discussed in the literature and their actual adoption by practitioners.

In summary, our practitioners employ a variety of practices to address explainability, tailoring their approaches to demands stemming from user-centric factors, ethical and safety considerations, legal and regulatory mandates, client and business perspectives, risk management, and data scientist and technical requirements. The adoption of these practices is influenced by the intended audience, i.e., “to whom it should be explainable”. These practices were regarded as essential for ensuring transparency and facilitating informed decision-making in AI-based systems across various domains.

4.3 Which Explainability Challenges Do ML Practitioners Experience? (RQ3)

In our exploration of the third research question, we sought to identify the challenges encountered by ML practitioners when explaining their systems. The insights shared by participants highlight a spectrum of challenges, each shedding light on the intricacies of achieving explainability in AI-based systems, as illustrated in Fig. 2. In the following, we summarize the interviewees’ statements for each challenge category.

Communication with Non-Technical Stakeholders

A recurring challenge faced by ten participants centers on effective communication with non-technical stakeholders. P1, P4, and P6 emphasized that effectively conveying complex technical concepts and model explanations to non-technical stakeholders, such as business teams, regulators, or customers, is challenging. P7 stated: “you need to understand users, their domain knowledge, their background, and how much they are willing to take from you as an explanation. The knowledge gap is the biggest gripe in the industry.” P12 confirmed that “there is obviously a knowledge gap that is hard to overcome.” P10 explained that the challenges faced in explaining AI components involve the difficulty of conveying the intricacies of the algorithms to end-users. Further, he stated that while it would be possible to define performance metrics such as false positives and negatives, explaining the inner workings of the AI-based system would be complex and could be overwhelming for users. Providing graphs and visualizations could offer only some level of explanation. Instead, he suggested defining robustness and performance metrics that users can grasp. Specific matrices, such as the AI model weight matrix, may be utilized to provide additional insights, and following research in this area can assist in addressing these challenges during the requirements engineering process. Furthermore, P11, P13, and P14 highlighted the challenge of translating technical concepts related to model operations, algorithms, and data into understandable explanations for non-technical users.

Fig. 2 Explainability challenges as mentioned by interview participants

Lack of Standardized Approaches

A prevalent challenge, articulated by seven participants, pertains to the absence of standardized approaches for explainability. They stated that the absence of universally defined best practices for explainability makes it necessary to decide on a case-by-case basis, leading to varying approaches across projects. For example, P6 stated that “the most significant challenge [...] is the absence of readily available tools or solutions. It often requires custom implementations, as there isn’t a pre-existing solution.” Similarly, P11 stated that “different approaches, like using SHAP values, have limitations and may not provide full or comprehensive explanations. Some approaches may not cover all corner cases, depending on the specific model being used.” Furthermore, P13 mentioned that “clear requirements for extending the explanation features and determining the target audience for explanations is challenging.” Overall, the seven participants highlighted that due to a lack of standardized approaches, determining and fulfilling the explainability need for end-users is challenging, often leading to customized practices.

Understanding Black-Box Models

The third challenge highlighted by six participants centers on the difficulty of comprehending and explaining the internal workings of complex AI models. These participants described it as a challenge to answer questions about model predictions beyond training data. Furthermore, attempting to comprehend the inner workings of the trained model to enhance explainability poses a significant obstacle, given the inherent opaqueness of such models. P14 emphasized that the black box nature of the model hinders meaningful explanation, stating “in cases where the model is considered a ’black box’, meaning the internal workings are not easily interpretable, generating meaningful explanations becomes more challenging.”

Balancing Explainability & Other Quality Attributes

Striking the right balance between model accuracy and providing simple, understandable explanations was also described as challenging. Five participants mentioned that highly accurate models may be less interpretable, while simpler models may sacrifice accuracy. P12 and P14 highlighted that improving algorithm accuracy while maintaining explainability is challenging. P7 described it as follows: “You have to find a good compromise on how much information you are going to provide to the stakeholders and how much not.” P8 and P10 highlighted the trade-off between model performance and explainability: as models become more complex, they tend to be less explainable, and this lack of explainability can lead to scaling models back and to reduced usage or adoption.

Bridging the Trust Gap

Building trust with stakeholders and end-users was highlighted by five participants as a pivotal challenge. Explaining how decisions are made is crucial for building trust. P1 added that “the challenge lies in effectively communicating the explainability of the model to different audiences, addressing their specific concerns, and building trust in the reliability and validity of the AI-based system.” Further, P4 and P5 mentioned that convincing end-users of how results are generated is a challenging part.

Resource Constraints

P8, P12, and P14 highlighted that incorporating explainability within resource constraints can be a challenge. For example, providing explainability in large-scale systems with high data volumes and real-time processing requirements can pose challenges in terms of computational resources and performance, especially when there are extensive requests for explanations or limitations in the implementation effort.

Data Quality & Adaptation for Sustainable Explainability

The importance of data quality in achieving explainability was emphasized by P13 and P14. Further, these participants stressed that adapting explanations to cope with changing data distributions and ensuring relevance over time is challenging. P5 also mentioned that it is hard to answer any explanation request beyond the data on which the model has been trained.

Safety and Compliance

Ensuring that explainability meets safety and compliance requirements can be challenging, especially in safety-critical applications. P10 and P12 emphasized that ensuring compliance with regulations while providing explainability, addressing concerns about the sensitivity and confidentiality of the algorithm, and dealing with the potential risks and liabilities associated with providing explanations are all challenging.

In conclusion, we observed that developing standardized approaches for explainability and finding effective ways to communicate technical concepts to non-technical users are crucial to overcoming these challenges. Additionally, transparency and reliability concerns can be addressed through an improved understanding of the internal workings of complex models. Building trust with stakeholders and end-users by effectively communicating the explainability of the model is essential. Furthermore, the challenges encountered by ML practitioners in achieving explainability underscore the importance of adopting an RE perspective. Clear and comprehensive communication with non-technical stakeholders is essential, necessitating the identification of user-friendly ways to convey technical concepts. The absence of standardized approaches for explainability calls for the development of clear requirements that guide custom implementations based on specific project needs.

4.4 Which Trade-Offs Between Explainability and Other Quality Attributes Do Practitioners Consider? (RQ4)

When investigating the trade-offs that ML practitioners consider between explainability and other quality attributes in AI-based systems, it becomes evident that these trade-offs are multifaceted and depend on various factors. The fourth part of our interviews sheds light on the non-functional requirements and constraints associated with achieving explainability in AI-based systems, as shown in Table 5.

Table 5 Quality trade-offs regarding explainability as mentioned by interview participants

Security and privacy considerations emerged as a critical trade-off in the context of explainability. Nine participants highlighted the delicate balance between providing detailed explanations and safeguarding system security and user privacy. For instance, P2 elaborated on a car insurance fraud detection case. Providing extensive explanations for why a case is deemed fraudulent could potentially expose the system’s inner workings, enabling fraudsters to exploit it. This trade-off highlights the difficulty of balancing the need to provide explanations with the danger of giving more power to harmful individuals. However, he acknowledged that some level of explanation might still be necessary to address user concerns and potential legal implications. P2 further added that “one potential threat may be users exploiting the provided explanations to manipulate the system’s decision-making process.” Furthermore, P5 mentioned that “the security of the overall system is a concern when too much transparency is provided. In AI-based systems, transparency could mean access to training code, algorithms, and data” which would facilitate adversarial attacks on the trained model.

Legal requirements, privacy concerns, and the need for transparency between companies and end-users were also mentioned as factors influencing trade-offs related to explainability. Nine participants emphasized that it is crucial to communicate product limitations and worst-case scenarios to users to establish transparency and mitigate potential legal liabilities.

Seven participants emphasized that the pursuit of enhanced explainability often necessitates a compromise in accuracy. P7 said in this regard that “we have to scale down to make it explainable.” Similarly, P8 stated that “better explainability leads to poor results.” Moreover, he added that complex AI models tend to offer higher accuracy but may be less interpretable. Furthermore, improving model explainability can sometimes compromise performance, as simplification may reduce the complexity needed for optimal results. This trade-off implies that in their efforts to make AI-based systems more interpretable, practitioners may have to employ simpler models or interpretable algorithms, which may result in a reduction in overall model accuracy. Both P7 and P8 emphasized that complex AI models, while delivering high accuracy and state-of-the-art performance in diverse tasks, tend to obscure the decision-making process, leading to a lack of transparency.
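To make this tension concrete, the following is a minimal sketch comparing a shallow decision tree, whose full decision logic can be printed and read, against a larger ensemble on the same synthetic data. The dataset and resulting numbers are illustrative assumptions, not drawn from the participants’ systems.

```python
# Minimal sketch: a readable shallow tree vs. a more complex ensemble,
# illustrating the accuracy/interpretability trade-off (synthetic data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

simple = DecisionTreeClassifier(max_depth=3).fit(X_tr, y_tr)            # interpretable
complex_model = RandomForestClassifier(n_estimators=300).fit(X_tr, y_tr)  # opaque

print("shallow tree accuracy :", simple.score(X_te, y_te))
print("random forest accuracy:", complex_model.score(X_te, y_te))

# The shallow tree's complete decision logic fits on one screen:
print(export_text(simple))
```

Typically, the ensemble scores somewhat higher on held-out data, while only the shallow tree can be handed to a stakeholder as a complete, readable explanation, which is the scale-back decision the participants describe.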

Moreover, another six participants pointed out that incorporating explainability into AI-based systems often requires sacrificing performance. This trade-off manifests in the need to limit or simplify models, potentially impacting prediction accuracy. Additionally, increased computational time and resource usage can lead to slower system performance and higher execution costs. The development process may also demand more effort and resources, further contributing to increased costs.

Lastly, two participants highlighted that the trade-offs between explainability and other quality attributes are context-dependent, varying based on the specific application area and its unique requirements. For sensitive applications involving access to private data, prioritizing explainability, even at the expense of reduced accuracy, may be essential to maintaining transparency and user trust. Conversely, for tasks such as internet search and retrieval, accuracy takes precedence, and the explainability aspect is less important because the focus is on achieving accurate and efficient results.

In conclusion, ML practitioners must navigate a complex landscape of trade-offs when considering explainability in AI-based systems. These trade-offs encompass accuracy, security, privacy, transparency, performance, and context-specific considerations. Balancing these factors requires a nuanced approach that aligns with the goals and requirements of the particular application.

5 Discussion

Our interview results revealed critical insights into the multifaceted realm of XAI practice, prompting us to contemplate the relevance of the RE perspective in tackling these challenges.

In our exploration of RQ1, we uncovered diverse perspectives among practitioners regarding the concept of explainability. This diversity highlights the absence of a unified definition, as already identified in our prior work (Habiba et al. 2022). The distinct categories we identified for explainability underscore the complexity of the term. Moreover, achieving a common understanding of explainability while recognizing and accommodating the diversity of requirements and contexts is considered essential for effectively addressing the challenges in AI explainability.

Furthermore, our investigation delved deeper into the practices practitioners employ to capture the requirements for explainability and how they put them into practice, as addressed in RQ2. Our findings reveal that explainability often arises from legal requirements or system performance failures. This demand for explainability emanates from various stakeholders in diverse contexts, suggesting that requirements engineering practices could adapt to accommodate these varied needs. A comprehensive process for capturing explainability requirements, however, is currently lacking. Practitioners typically rely on existing tools to clarify system behavior to fellow technical personnel, but bridging the knowledge gap to convey results to end-users poses challenges. Additionally, addressing emerging explainability requirements post-deployment presents difficulties, indicating the need for investigation to establish methods for specifying these requirements before system deployment.

Subsequently, in RQ3, we identified several challenges faced by our participants when implementing explainability in AI-based systems. To address these challenges from an RE perspective, researchers can explore strategies and tools for improving the communication of complex technical concepts and model explanations to non-technical stakeholders. This may involve the development of user-friendly visualization techniques and interfaces to enhance understanding for business teams, regulators, or customers. Trust and reliability issues can be mitigated by establishing requirements and standards for incorporating trust-building mechanisms into system designs, including transparency, accountability, and trustworthiness indicators. Additionally, the development of reference models and frameworks can ensure consistency across projects. Further research should aim at improving the transparency of black-box models, developing hybrid models that balance accuracy and interpretability, and implementing scalable explainability solutions. Adaptive systems and context-aware explanations can enhance data quality and sustainability, while compliance frameworks and risk management protocols are essential for ensuring safety and regulatory adherence.

Finally, in RQ4, we investigated the interaction between explainability and other quality attributes. When adding explainability, ML practitioners are required to consider trade-offs with other quality attributes. This indicates several important considerations for the RE process.

Requirements engineers should be aware that enhancing explainability might come at the cost of accuracy, performance, and potential security risks. Further, it is important to understand the specific application context and user needs. Different applications may prioritize either explainability or accuracy based on the sensitivity of the tasks and the users’ requirements. Moreover, requirements engineers need to engage with various stakeholders, including ML practitioners, domain experts, end-users, and legal or compliance teams to identify the optimal level of explainability while considering the trade-offs with other quality attributes.

6 Threats to Validity

Throughout our study, we employed a systematic approach to strengthen the credibility and integrity of our research. In the following, we point out the main threats to validity and our corresponding mitigation strategies.

Internal and Construct Validity

The phrasing chosen for our explanations and questions may introduce bias and misinterpretations, especially if participants understand concepts differently. To mitigate this, we initiated our research by conducting a series of pilot interviews within the academic community. They helped in refining our interview questions to accurately study the fundamental concepts under investigation.

Furthermore, participants may not have consistently revealed their true opinions. We consider this a low risk for our study, as the concepts were neither very sensitive nor required the disclosure of business-critical information. This risk was furthermore reduced by guaranteeing confidentiality and anonymity. Additionally, it is important to note that a subset of our participants lacked a software engineering background, and although we provided them with a presentation, there is still a potential limitation in their ability to grasp the complete picture of the study. Specifically, we aimed to establish a shared understanding of the concept of explainability among practitioners without influencing their responses. To achieve this, we carefully formulated our questions to encourage participants to define explainability from their own perspective, in the context of the specific systems they are working on, and within their unique working environments. This approach was intended to ensure that their answers were based on their personal experiences and interpretations, rather than being influenced by our presentation.

Conclusion Validity

To limit observer bias and interpretation bias, we implemented a meticulous coding process for the analysis of interview transcripts. This process was overseen by the first author, chosen for her specialized technical expertise and profound understanding of the research subject. Furthermore, all authors participated in rigorous reviews and validation of the coding outcomes. We are confident that we have identified the essential underlying causal relationships and derived meaningful conclusions.

External Validity

As we interviewed 14 professionals in total, the representativeness of the collected data may be a potential issue. To mitigate this issue, we conducted a rigorous screening of our participant pool before and after the interviews. Despite efforts to recruit a diverse international participant pool, we primarily attracted respondents from German companies. To maintain sample diversity, we ensured representation across various companies, projects, and domains within Germany, while including a few participants located outside Germany. We identified individuals with a minimum of two years of experience in AI/ML projects who were currently engaged in or had recent involvement in such projects. Moreover, we decided to exclude two participants after the interviews were done. These deliberate steps were taken to cultivate a participant sample that is more representative and relevant. Furthermore, it is crucial to acknowledge that participants bring their domain knowledge and experiences into the study, which could potentially impact their responses to the questions. Given that AI, particularly XAI, is governed by regulatory frameworks, some of our findings are specific to Germany’s cultural and legal context. The EU AI Act standardizes AI regulations across all European states, influencing how AI practices are implemented and monitored. Future research should include guidelines on legal and cultural aspects to define the study’s scope and enhance the generalizability of findings across different regulatory environments, aligning with the evolving AI governance landscape in the EU and beyond.

7 Conclusion

Our study highlighted practitioner perspectives on explainability in AI-based systems and the challenges they face in implementing it effectively. We identified four categories of explainability that were seen as necessary for making AI-based systems interpretable and transparent. Furthermore, our findings revealed that there are no standard practices to address end-user needs for explainability, which poses a significant concern. The reasons for pursuing explainability vary among practitioners, with legal requirements being a prominent driver. Emerging regulatory and legal developments emphasize that AI-based systems must now incorporate certain core functionalities. For instance, the General Data Protection Regulation (GDPR) (European Parliament 2016) mandates transparency, accountability, and the “right to explanation” for decisions supported by AI. Additionally, the need for explainability arises when AI-based systems fail to provide desired results to stakeholders.

The challenges ML practitioners face in implementing explainability are many and varied. While participants often rely on tools such as SHAP values (Lundberg and Lee 2017) and LIME (Ribeiro et al. 2016) to explain their systems to technical personnel, communicating complex technical concepts and model explanations to non-technical stakeholders is a primary hurdle. Building trust and convincing end-users of the reliability of AI-supported decisions is another critical challenge, while demands for regulatory compliance in safety-critical applications further complicate the matter. A lack of standardized approaches for explainability and finding the right balance between other non-functional requirements and explainability were found to be prevalent issues as well.

Our study highlights the need for further research and development to effectively address these challenges and enhance the explainability of AI-based systems across various domains. Doing so will ensure that AI-based systems meet the explainability requirements of users. Moreover, requirements engineers must be well-informed about the trade-offs between explainability and other quality attributes. They must consider the application context and domain, as well as the specific user needs to balance explainability and accuracy. Engaging with various stakeholders is crucial to identifying the optimal level of explainability while considering trade-offs with other quality attributes. In conclusion, improving the explainability of AI-based systems requires collaboration, further research, and careful consideration of trade-offs to ensure that these systems are transparent, trustworthy, and effectively meet the needs of all stakeholders involved.