WO2024098682A1

WO2024098682A1 - XAI model evaluation method and apparatus, device, and medium

Info

Publication number: WO2024098682A1
Application number: PCT/CN2023/091751
Authority: WO
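Inventors: Yang Yifan (杨一帆); Wang Ke (汪科); Tan Shunyu (谭顺予); Fan Haojun (范豪钧); Xia Zhengxun (夏正勋)
Original assignee: Nanjing Transwarp Intelligent Technology Co., Ltd. (南京星环智能科技有限公司); Transwarp Information Technology (Shanghai) Co., Ltd. (星环信息科技(上海)股份有限公司)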
Priority date: 2022-11-10
Filing date: 2023-04-28
Publication date: 2024-05-16
Also published as: CN115905558A

Abstract

The present application discloses an XAI model evaluation method and apparatus, a device and a storage medium. The method comprises: obtaining an explanation result pair corresponding to an XAI model to be evaluated; obtaining a node sequence pair matching the explanation result pair in a knowledge graph; according to the node sequence pair and the knowledge graph, determining a subgraph set corresponding to each node sequence in the node sequence pair; and according to the subgraph set corresponding to each node sequence, determining a score pair corresponding to the explanation result pair.

Description

XAI model evaluation method, device, equipment and medium

This application claims priority to the Chinese patent application filed with the China Patent Office on November 10, 2022, with application number 202211404037.1, the entire contents of which are incorporated by reference into this application.

Technical Field

Embodiments of the present application relate to the field of artificial intelligence technology, for example, to an Explainable Artificial Intelligence (XAI) model evaluation method, device, equipment and medium.

Background Art

With the large-scale application of artificial intelligence (AI), trusted AI technology has received increasing attention, and XAI is an important branch of it. As with traditional machine learning models, XAI algorithms and models also need to be optimized and selected from a group of candidate models. However, traditional machine learning model selection methods do not apply to XAI models, because the evaluation indicators of an XAI model cover not only model accuracy but also the form of expression and the comprehensibility of its explanation results. In addition, the same result can be explained by multiple explanation methods, whose explanatory ability must also be evaluated. Because such evaluation often depends on the cognitive ability of the evaluator, it is difficult to form a unified standard or a universal method. Among studies that evaluate the effect of explanations on users, most rely on subjective measurement, that is, people are asked to manually score the explanations of an XAI model against given indicators. Some studies have measured both the subjective ease of use of an explanation and the ability of participants to make correct inferences from it, which allows the behavioral effect of an explanation to be distinguished from its self-perceived effect and, to some extent, highlights the value of objective measurement.

Therefore, XAI evaluation methods in the related art have major defects: on the one hand, there is no standard quantitative measurement method, and manual evaluation consumes many resources with low accuracy and efficiency; on the other hand, users' evaluation of a model is easily affected by many factors (such as subjective factors), so the validity of the evaluation data cannot be guaranteed.

Summary of the invention

The embodiments of the present application provide an XAI model evaluation method, apparatus, device and storage medium, which address the problems that traditional XAI evaluation and selection methods are time-consuming and labor-intensive, involve lengthy processes, and have low evaluation effectiveness.

The present application provides an XAI model evaluation method, including: obtaining an explanation result pair corresponding to the XAI model to be evaluated; obtaining a node sequence pair matching the explanation result pair in a knowledge graph; determining a subgraph set corresponding to each node sequence in the node sequence pair based on the node sequence pair and the knowledge graph; and determining a scoring pair corresponding to the explanation result pair based on the subgraph set corresponding to each node sequence.

The present application provides an XAI model evaluation device, which includes: an explanation result pair acquisition module, configured to obtain an explanation result pair corresponding to the XAI model to be evaluated; a node sequence pair acquisition module, configured to obtain a node sequence pair matching the explanation result pair in a knowledge graph; a subgraph set determination module, configured to determine a subgraph set corresponding to each node sequence in the node sequence pair based on the node sequence pair and the knowledge graph; and a scoring pair determination module, configured to determine a scoring pair corresponding to the explanation result pair based on the subgraph set corresponding to each node sequence.

The present application provides an electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor so that the at least one processor can execute the XAI model evaluation method described in any embodiment of the present application.

The present application provides a computer-readable storage medium, which stores computer instructions, and the computer instructions are used to enable a processor to implement the XAI model evaluation method described in any embodiment of the present application when executed.

BRIEF DESCRIPTION OF THE DRAWINGS

The following is a brief introduction to the drawings required for use in the embodiments.

FIG. 1 is a flow chart of an XAI model evaluation method in Embodiment 1 of the present application;

FIG. 2 is a schematic diagram of a subgraph set G_a[] obtained from a matching result in Embodiment 1 of the present application;

FIG. 3 is a schematic diagram of a subgraph set G_b[] obtained from a matching result in Embodiment 1 of the present application;

FIG. 4 is a schematic diagram of the structure of an XAI model evaluation device in Embodiment 2 of the present application;

FIG. 5 is a schematic diagram of the structure of an electronic device in Embodiment 3 of the present application.

Detailed Description

To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings. The described embodiments are only some, rather than all, of the embodiments of the present application.

It should be noted that the terms "first", "second", etc. in the specification and claims of the present application and the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that the data used in this way can be interchanged where appropriate, so that the embodiments of the present application described herein can be implemented in an order other than those illustrated or described herein. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusions. For example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to those steps or units that are clearly listed. Rather, the processes, methods, products, or apparatus may include other steps or elements not expressly listed or inherent to such processes, methods, products, or apparatus.

Embodiment 1

FIG. 1 is a flow chart of an XAI model evaluation method in Embodiment 1 of the present application. This embodiment is applicable to comprehensively ranking the explanatory performance of XAI models over multiple random comparison tests. The method can be performed by the XAI model evaluation device in the embodiments of the present application, which can be implemented in software and/or hardware. As shown in FIG. 1, the method includes the following steps.

S110, obtaining an explanation result pair corresponding to the XAI model to be evaluated.

The explanation result pairs corresponding to the XAI model to be evaluated may be explanation result pairs of the same XAI model under different training batches, or may be explanation result pairs of different XAI models.

The method for obtaining the explanation result pair corresponding to the XAI model to be evaluated may be: randomly selecting two explanation results to form an explanation result pair according to the explanation results of the same XAI model in different training batches. The method for obtaining the explanation result pair corresponding to the XAI model to be evaluated may also be: randomly selecting two explanation results to form an explanation result pair according to the explanation results of different XAI models.

The atomic representation of an explanation result can be written as $E = [X_1 \dots X_n \rightarrow Y]$ and is referred to as an atomic explanation result, where X_1, ..., X_n are the cause factors, Y is the result factor, and n is the number of cause factors. An explanation result pair can therefore be expressed as $E_{pair} = \{E_a[], E_b[]\}$, where E_a[] and E_b[] are the atomic explanation result sequences with subscripts a and b, respectively. It should be noted that, by defining and combining atomic explanation results and atomic explanation result sequences, a variety of complex explanations can be expressed flexibly, such as one effect with one cause, one effect with multiple causes, multiple effects with one cause, and multiple effects with multiple causes.
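As an illustration only, the atomic explanation result and the explanation result pair can be modeled as small Python data classes. The class names AtomicExplanation and ExplanationPair are hypothetical, and the factor strings are borrowed from the recommendation example later in this embodiment. The sketch shows how a one-effect/multi-cause explanation fits in a single atom, while a multi-effect explanation becomes several atoms in one sequence.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class AtomicExplanation:
    """One atomic explanation result E = [X_1 ... X_n -> Y] (hypothetical class)."""
    causes: Tuple[str, ...]  # cause factors X_1 .. X_n
    effect: str              # result factor Y

    @property
    def n(self) -> int:
        """Number of cause factors."""
        return len(self.causes)


@dataclass
class ExplanationPair:
    """E_pair = {E_a[], E_b[]}: two atomic explanation result sequences."""
    a: List[AtomicExplanation] = field(default_factory=list)
    b: List[AtomicExplanation] = field(default_factory=list)


# One effect with several causes is a single atom ...
one_effect_many_causes = AtomicExplanation(("diapers", "baby bed"), "milk powder")
# ... while several effects sharing one cause are several atoms in one sequence.
many_effects_one_cause = [
    AtomicExplanation(("maternity clothes",), "diapers"),
    AtomicExplanation(("maternity clothes",), "baby bed"),
]
pair = ExplanationPair(a=[one_effect_many_causes], b=many_effects_one_cause)
```

Freezing AtomicExplanation makes atoms hashable, so they can be deduplicated or used as dictionary keys when results from many rounds are collected.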

S120, obtaining a node sequence pair in the knowledge graph that matches the explanation result pair.

The knowledge graph is abbreviated as KG. Matching can be based on the nodes of the KG or on its attributes, which is not limited here. The node sequence pair is the result obtained by matching the explanation result pair against the KG.

The method of obtaining the node sequence pair matching the explanation result pair in the knowledge graph can be: obtaining the explanation result pair, matching the explanation result pair with the node sequence and attribute sequence of the KG, and obtaining the node sequence pair matching the explanation result pair through KG semantic retrieval matching processing.

The nodes of the KG are denoted V, the edges E, and the node attributes A; the node sequence of the KG is V[] and the attribute sequence is A[]. First, if the factors in E_a[] and E_b[] are represented without semantic information, the factors are replaced with field names according to the metadata of the data (the basic information held in the data processing system and the data set management system), so that E_a[] and E_b[] become sequences with semantic information, E′_a[] and E′_b[]. Then the semantic retrieval interface of the KG is called to query and match the semantic factors of E′_a[] and E′_b[], which yields the node sequence pair corresponding to E_pair in the KG, V_pair = {V_a[], V_b[]}.

To improve matching efficiency, a reverse traversal search algorithm is adopted that proceeds from the final result back to the original cause. For a result sequence I[] whose last factor is i_n, the matching algorithm is as follows. First, call the KG semantic retrieval interface to match factor i_n in the KG; the matched node in the KG is V_n. Second, starting from V_n, match factor i_(n-1) within path depth D. If the match succeeds, repeat this substep starting from i_(n-1). If the match fails, match factor i_(n-2) within a depth of (number of consecutive unsuccessful matches + 1) × D, that is, within 2 × D. This rule is applied until all factors of the result sequence I[] have been matched. Third, output the matching result V[] = {..., V_i, ...} of the result sequence I[].

It should be noted that in the process of matching with the interpretation results, considering that the structure and volume of KG may vary in actual applications, the search depth can be adjusted according to actual needs when executing the reverse traversal search algorithm for node matching.
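A minimal sketch of the reverse traversal search, assuming the KG is an undirected adjacency dictionary and replacing the KG semantic retrieval interface with a hypothetical exact label match (semantic_match). The search radius starts at D and grows to (misses + 1) × D after consecutive failures, as described above. Recording unmatched factors as None is an assumption, since the text does not say how they are represented.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set

Graph = Dict[str, Set[str]]  # undirected adjacency: node label -> neighbour labels


def nodes_within(graph: Graph, start: str, depth: int) -> Set[str]:
    """All nodes reachable from `start` in at most `depth` hops (BFS)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == depth:
            continue
        for nb in graph.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, dist + 1))
    return seen


def semantic_match(factor: str, candidates: Iterable[str]) -> Optional[str]:
    """Hypothetical stand-in for the KG semantic retrieval interface:
    a case-insensitive exact label match."""
    key = factor.lower()
    for node in sorted(candidates):
        if node.lower() == key:
            return node
    return None


def reverse_traversal_match(graph: Graph, factors: List[str], depth: int) -> List[Optional[str]]:
    """Match I[] = [i_1, ..., i_n] to KG nodes, from i_n back to i_1.
    Returns V[]; None marks a factor that could not be matched."""
    matched: List[Optional[str]] = [None] * len(factors)
    if not factors:
        return matched
    # Step 1: match the final result i_n anywhere in the graph.
    anchor = semantic_match(factors[-1], graph.keys())
    if anchor is None:
        return matched
    matched[-1] = anchor
    misses = 0  # consecutive unsuccessful matches
    # Step 2: walk back toward the original cause.
    for k in range(len(factors) - 2, -1, -1):
        radius = (misses + 1) * depth  # D, then 2D, 3D, ... after failures
        hit = semantic_match(factors[k], nodes_within(graph, anchor, radius))
        if hit is None:
            misses += 1
        else:
            matched[k] = hit
            anchor = hit  # continue the search from the newly matched node
            misses = 0
    # Step 3: output the matching result V[].
    return matched


kg = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
result = reverse_traversal_match(kg, ["b", "zzz", "d"], depth=1)  # -> ['b', None, 'd']
```

In the usage line, "zzz" is not found within depth 1 of "d", so the radius widens to 2 for the next factor, which is how "b" (two hops away) is still matched. The depth parameter is the knob the text says can be tuned to the structure and volume of the KG.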

S130, determining a subgraph set corresponding to each node sequence in the node sequence pair according to the node sequence pair and the knowledge graph.

A subgraph set consists of one or more subgraphs and represents the path corresponding to a node sequence in the KG. A subgraph set with multiple subgraphs indicates that the path is broken.

The method of determining the subgraph set corresponding to each node sequence according to the node sequence pair and the knowledge graph is as follows: obtaining the node sequence pair, and obtaining the subgraph set of each node sequence through the path search function of the knowledge graph.

The node sequence pair is V_pair = {V_a[], V_b[]}. The path search function of the KG yields the paths corresponding to V_pair, recorded as G_pair = {G_a[], G_b[]}. G_a[] is the path corresponding to V_a[] and consists of one or more subgraphs; that is, the subgraph set corresponding to V_a[] is G_a[]. Similarly, G_b[] is the path corresponding to V_b[], that is, the subgraph set corresponding to V_b[] is G_b[].
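The KG's path search function is not described further, so the sketch below assumes one plausible behaviour: a breadth-first search links each pair of consecutive matched nodes in V[], and a new subgraph is started whenever no path within a hypothetical max_depth exists, which corresponds to a broken path. Edges are kept as a list so that an edge shared by several paths is counted once per path, in line with the counting rule used for the complexity score.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence, Set, Tuple

Graph = Dict[str, Set[str]]
Edge = Tuple[str, str]
Subgraph = Tuple[Set[str], List[Edge]]  # (node set, edge list)


def shortest_path(graph: Graph, src: str, dst: str, max_depth: int) -> Optional[List[str]]:
    """BFS path from src to dst using at most max_depth edges, or None."""
    parent: Dict[str, Optional[str]] = {src: None}
    frontier = deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == dst:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:  # walk parents back to src
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        if dist == max_depth:
            continue
        for nb in sorted(graph.get(node, ())):
            if nb not in parent:
                parent[nb] = node
                frontier.append((nb, dist + 1))
    return None


def subgraph_set(graph: Graph, node_seq: Sequence[Optional[str]], max_depth: int) -> List[Subgraph]:
    """Group V[] into subgraphs; a new subgraph starts wherever no path
    links two consecutive matched nodes (the path is broken)."""
    nodes = [v for v in node_seq if v is not None]  # skip unmatched factors
    if not nodes:
        return []
    result: List[Subgraph] = []
    cur_nodes: Set[str] = {nodes[0]}
    cur_edges: List[Edge] = []
    for prev, nxt in zip(nodes, nodes[1:]):
        path = shortest_path(graph, prev, nxt, max_depth)
        if path is None:  # broken path: close the current subgraph
            result.append((cur_nodes, cur_edges))
            cur_nodes, cur_edges = {nxt}, []
        else:
            cur_nodes.update(path)
            cur_edges.extend(zip(path, path[1:]))  # repeated if shared by paths
    result.append((cur_nodes, cur_edges))
    return result


kg = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": {"e"}, "e": {"d"}}
g = subgraph_set(kg, ["a", "c", "d", "e"], max_depth=3)
```

Here "c" and "d" lie in disconnected parts of the toy graph, so the result holds two subgraphs, {a, b, c} and {d, e}, mirroring the "broken path" case.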

S140, determining a score pair corresponding to the explanation result pair according to the subgraph set corresponding to each node sequence.

The score pair corresponding to the explanation result pair is a total score pair calculated based on the explanation coherence score, explanation complexity score and explanation credibility score of the subgraph set corresponding to each node sequence.

The method of determining the score pair corresponding to the explanation result pair according to the subgraph set corresponding to each node sequence can be: obtaining the subgraph set corresponding to each node sequence, determining the score of the coherence of the explanation, the complexity of the explanation, and the credibility of the explanation of each node sequence according to the subgraph set corresponding to each node sequence, and based on the score of the coherence of the explanation, the complexity of the explanation, and the credibility of the explanation of each node sequence, calculating the total score of the subgraph set corresponding to each node sequence, and then determining the score pair corresponding to the explanation result pair.

Optionally, a score pair corresponding to an explanation result pair is determined according to a subgraph set corresponding to each node sequence, including: determining an explanation coherence score, an explanation complexity score and an explanation credibility score corresponding to each node sequence according to the subgraph set corresponding to each node sequence; determining a target score corresponding to each node sequence according to the explanation coherence score, the explanation complexity score and the explanation credibility score corresponding to each node sequence; and determining a score pair corresponding to an explanation result pair according to the target score corresponding to each node sequence.

The target score is the total score calculated based on the explanation coherence score, explanation complexity score, and explanation credibility score corresponding to each node sequence.

The method of determining the explanation coherence score, explanation complexity score and explanation credibility score corresponding to each node sequence according to the subgraph set corresponding to each node sequence can be: obtaining the subgraph set corresponding to each node sequence, obtaining the explanation coherence score by measuring the number of subgraphs in the subgraph set corresponding to each node sequence, obtaining the explanation complexity score by measuring the number of nodes and edges of each subgraph in the subgraph set corresponding to each node sequence, and obtaining the explanation credibility score by calculating the sum of the edge weights of all subgraphs in the subgraph set corresponding to each node sequence.

The method of determining the target score corresponding to each node sequence according to the explanation coherence score, explanation complexity score and explanation credibility score corresponding to each node sequence can be: based on the explanation coherence score, explanation complexity score and explanation credibility score corresponding to each node sequence, the total score of each node sequence is calculated.

Taking G_a[] as an example, the total score of a node sequence can be calculated as follows: the explanation coherence score S_split is determined from the number of subgraphs in G_a[]; the explanation complexity score S_{complexity_a_total} of G_a[] is determined from the explanation complexity of G_a[]; the explanation credibility score S_credit is determined from the edge weights of the subgraph G_{a_target} that contains the result factor; and the total score is calculated as follows:

standard() is a data standardization function and will not be described here.

The method of determining the score pair corresponding to the interpretation result pair according to the target score corresponding to each node sequence can be: determining the total score corresponding to each node sequence according to the interpretation coherence score, the interpretation complexity score and the interpretation credibility score corresponding to each node sequence, and obtaining the total score pair according to the total score corresponding to each node sequence, that is, determining the score pair corresponding to the interpretation result pair.

It should be noted that after a score pair is obtained, the result set entry Rsc[i] is used to store the comparison result. One comparison result can be represented as an ordered pair of positive integers (a, b), where a and b are the labels of the selected models to be evaluated and i is the index of the current comparison round. All comparison results are stored in the result set Rsc[], whose size is the hyperparameter N, that is, the total number of comparison rounds. Steps S110 to S140 are repeated until the number of comparison rounds reaches N.
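The round-based loop can be sketched as follows. Here score_pair is a hypothetical callback standing in for steps S110 to S140, and storing each result as (winner, loser) is an assumption, since the text only says that Rsc[i] holds an ordered pair of model labels.

```python
import random
from typing import Callable, List, Sequence, Tuple

ScorePair = Tuple[float, float]


def run_comparisons(
    models: Sequence[int],
    score_pair: Callable[[int, int], ScorePair],
    n_rounds: int,
    seed: int = 0,
) -> List[Tuple[int, int]]:
    """Fill Rsc[0..N-1]; each entry is assumed to be (winner, loser)."""
    rng = random.Random(seed)
    rsc: List[Tuple[int, int]] = []
    for _ in range(n_rounds):               # repeat S110-S140 for N rounds
        a, b = rng.sample(list(models), 2)  # pick two distinct model labels
        s_a, s_b = score_pair(a, b)         # hypothetical: runs S110-S140
        rsc.append((a, b) if s_a >= s_b else (b, a))
    return rsc


# Toy scorer (hypothetical): a model's total score is simply its label.
rsc = run_comparisons([1, 2, 3], lambda a, b: (float(a), float(b)), n_rounds=20)
```

Because the two models are drawn at random in every round, different models generally end up with different numbers of comparisons; this is the unbalanced data the target model has to rank.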

Optionally, determining the explanation coherence score corresponding to each node sequence according to the subgraph set corresponding to each node sequence includes: obtaining the number of subgraphs in the subgraph set corresponding to each node sequence; and determining the explanation coherence score corresponding to each node sequence according to the number of subgraphs in the subgraph set.

The scoring strategy for explanation coherence is: for a logically coherent explanation result, its reasoning process should be coherent, and the corresponding nodes of its factors in the KG form a connected subgraph. The more disconnected subgraphs there are, the less coherent the explanation result is.

The method for obtaining the number of subgraphs in the subgraph set corresponding to each node sequence may be: obtaining the subgraph set corresponding to each node sequence, and then obtaining the number of subgraphs in the subgraph set.

The explanation coherence score of each node sequence may be determined from the number of subgraphs in its subgraph set. Taking G_a[] as an example, the number of subgraphs in G_a[] is |G_a[]|, and the explanation coherence score is:

$$S_{split} = \frac{1}{|G_a[]|}$$

That is, the explanation coherence score is the reciprocal of the number of subgraphs.
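A short sketch of the coherence score S_split = 1/|G[]|, applied to the two subgraph sets G_a[] and G_b[] from the worked example in this embodiment, with each subgraph written only as its node set. The function name coherence_score is hypothetical.

```python
from typing import Sequence


def coherence_score(subgraph_set: Sequence[object]) -> float:
    """S_split = 1 / |G[]|: fewer disconnected subgraphs -> more coherent."""
    if len(subgraph_set) == 0:
        raise ValueError("a subgraph set must contain at least one subgraph")
    return 1.0 / len(subgraph_set)


# Subgraph sets from the worked example, written as lists of node sets.
g_a = [{"X1", "X2"}, {"X3", "X4", "X5", "X6", "Y"}]
g_b = [{"X1", "X2"}, {"X3", "X4"}, {"X5", "Y"}]
```

With two subgraphs G_a[] scores 0.5, and with three G_b[] scores about 0.333, so G_a[] is judged the more coherent explanation.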

Optionally, the explanation complexity score corresponding to each node sequence is determined based on the subgraph set corresponding to each node sequence, including: determining the complexity of each subgraph based on the number of nodes and the number of edges in each subgraph in the subgraph set corresponding to each node sequence; and determining the sum of the complexities of all subgraphs in the subgraph set corresponding to each node sequence as the explanation complexity score corresponding to each node sequence.

The scoring strategy for explanation complexity is: the smaller the ratio of nodes to edges in a subgraph, the more effective, concise, and convincing the explanation is. An edge can be understood as the connection between two nodes in each subgraph.

The complexity of each subgraph may be determined from its numbers of nodes and edges as follows: obtain each subgraph in the subgraph set of each node sequence, count its nodes and edges, and take the complexity of the subgraph as its number of nodes divided by its number of edges.

The sum of the complexities of all subgraphs in the subgraph set of each node sequence may be determined as the explanation complexity score of that node sequence by obtaining the complexity of each subgraph in the set and adding them up. Taking G_a[] as an example, each subgraph in G_a[] is represented by a triple consisting of its node set V, its edge set E, and its edge weight set; for the i-th subgraph G_{a_i} of G_a[], the node set is V_{a_i} and the edge set is E_{a_i}. The complexity of this subgraph is measured as:

$$S_{complexity\_a\_i} = \frac{|V_{a_i}|}{|E_{a_i}|}$$

|V_{a_i}| is the number of nodes in subgraph G_{a_i}, |E_{a_i}| is the number of edges in G_{a_i} (edges shared by multiple paths are counted multiple times), and S_{complexity_a_i} is the complexity score of G_{a_i}. The total score of the subgraph set G_a[] is obtained by adding the scores of its subgraphs:

$$S_{complexity\_a\_total} = \sum_{i=1}^{|G_a[]|} S_{complexity\_a\_i}$$

S_{complexity_a_total} is the total explanation complexity score of G_a[].
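A sketch of the complexity score, assuming each subgraph is given as a (node set, edge list) pair in which an edge shared by several paths is repeated. Per-subgraph complexity is |V| / |E| and the set score is the sum over subgraphs. The text does not say how an edgeless (single-node) subgraph should be scored; the sketch raises an error in that case as an explicit assumption.

```python
from typing import Hashable, List, Sequence, Set, Tuple

Edge = Tuple[Hashable, Hashable]
Subgraph = Tuple[Set[Hashable], List[Edge]]  # (V, E); E may repeat shared edges


def subgraph_complexity(nodes: Set[Hashable], edges: List[Edge]) -> float:
    """S_complexity_a_i = |V_a_i| / |E_a_i|, counting repeated edges each time."""
    if not edges:
        # Assumption: the score for an edgeless subgraph is not defined in the text.
        raise ValueError("complexity is undefined for a subgraph without edges")
    return len(nodes) / len(edges)


def complexity_total(subgraphs: Sequence[Subgraph]) -> float:
    """S_complexity_a_total: sum of the per-subgraph complexities."""
    return sum(subgraph_complexity(v, e) for v, e in subgraphs)


g = [
    ({"x1", "x2"}, [("x1", "x2")]),                    # 2 / 1 = 2.0
    ({"x3", "x4", "y"}, [("x3", "x4"), ("x4", "y")]),  # 3 / 2 = 1.5
]
total = complexity_total(g)  # -> 3.5
```

Under this scoring, a lower total means fewer nodes per edge, which the text treats as a more concise and convincing explanation.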

Optionally, the explanation credibility score corresponding to each node sequence is determined based on the subgraph set corresponding to each node sequence, including: obtaining a target subgraph in the subgraph set corresponding to each node sequence, wherein the target subgraph is a subgraph including a result factor; and determining the sum of the weights of all edges of the target subgraph as the explanation credibility score corresponding to each node sequence.

The scoring strategy for explanation credibility is as follows: if edge weights reflect the strength of the connection and logical association between node entities, the sum of all edge weights of a subgraph can simply be used to represent the credibility of that subgraph. The edge weights in the KG come from the KG construction process, namely the relationship weights produced by a Bayesian-type structure construction method.

The target subgraph, that is, the subgraph that includes the result factor, may be obtained by obtaining the subgraph set corresponding to each node sequence and selecting from it the subgraph that includes the result factor.

The sum of the weights of all edges of the target subgraph may be determined as the explanation credibility score of each node sequence by selecting the subgraph that includes the result factor from the subgraph set of the node sequence and calculating the sum of the weights of all edges in that subgraph. Taking G_a[] as an example, the subgraph of G_a[] that includes the result factor is selected and denoted G_{a_target}, and the sum of the weights of all of its edges is calculated:

$$S_{credit} = \sum_{i} w_{e_i}$$

w_{e_i} represents the weight of the i-th edge in subgraph G_{a_target}.

Optionally, the explanation credibility score corresponding to each node sequence is determined based on the subgraph set corresponding to each node sequence, including: obtaining a target subgraph in the subgraph set corresponding to each node sequence, wherein the target subgraph is a subgraph including a result factor; and determining the sum of the weights of the edges in the target subgraph connected to the result factor as the explanation credibility score corresponding to each node sequence.

The method of determining the sum of the weights of the edges connected to the result factor in the target subgraph as the explanation credibility score corresponding to each node sequence can be: selecting the edges connected to the result factor in the target subgraph, calculating the sum of the weights of all the edges connected to the result factor in the target subgraph, and determining the sum of the weights of all the edges connected to the result factor as the explanation credibility score corresponding to each node sequence.
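Both credibility variants can be sketched on the same data. The edge weights are assumed to be stored in a dictionary keyed by undirected edge (a frozenset of the two endpoints), and the weight values in the example are made up for illustration; in the described method they would come from the KG construction process.

```python
from typing import Dict, FrozenSet, Hashable, List, Sequence, Set, Tuple

Edge = Tuple[Hashable, Hashable]
Subgraph = Tuple[Set[Hashable], List[Edge]]
Weights = Dict[FrozenSet[Hashable], float]  # undirected edge -> KG edge weight


def target_subgraph(subgraphs: Sequence[Subgraph], result_factor: Hashable) -> Subgraph:
    """Return the subgraph that contains the result factor Y."""
    for nodes, edges in subgraphs:
        if result_factor in nodes:
            return nodes, edges
    raise ValueError("no subgraph contains the result factor")


def credit_all_edges(subgraphs: Sequence[Subgraph], result_factor: Hashable, weights: Weights) -> float:
    """Variant 1: S_credit = sum of the weights of all edges of the target subgraph."""
    _, edges = target_subgraph(subgraphs, result_factor)
    return sum(weights[frozenset(e)] for e in edges)


def credit_result_edges(subgraphs: Sequence[Subgraph], result_factor: Hashable, weights: Weights) -> float:
    """Variant 2: sum only over edges of the target subgraph incident to the result factor."""
    _, edges = target_subgraph(subgraphs, result_factor)
    return sum(weights[frozenset(e)] for e in edges if result_factor in e)


g = [
    ({"x1", "x2"}, [("x1", "x2")]),
    ({"x3", "x4", "y"}, [("x3", "x4"), ("x4", "y")]),
]
w = {frozenset(("x1", "x2")): 0.9, frozenset(("x3", "x4")): 0.5, frozenset(("x4", "y")): 0.25}
all_edges = credit_all_edges(g, "y", w)        # -> 0.75
result_edges = credit_result_edges(g, "y", w)  # -> 0.25
```

The edge x1-x2 never contributes, because only the subgraph containing the result factor y is scored; the second variant narrows this further to the edges that touch y directly.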

It should be noted that when determining the score pairs corresponding to the explanation result pairs, those skilled in the art can adjust the calculation and combination methods in the scoring strategy according to the evaluation scenario and the practical utility of the scores.

Optionally, the explanation result pairs corresponding to the XAI model to be evaluated include explanation results of different XAI models; in that case, after the score pairs corresponding to the explanation result pairs are determined according to the subgraph set corresponding to each node sequence, the method further includes inputting the score pairs into a target model to obtain the ranking order of the explanatory ability of each of the different XAI models. Alternatively, the explanation result pairs include explanation results of the same XAI model under different training batches; in that case, after the score pairs are determined according to the subgraph set corresponding to each node sequence, the method further includes inputting the score pairs into the target model to obtain the ranking order of the explanatory ability of each explanation result in the explanation result pairs.

The target model can be a model that takes the score pairs as input and outputs a ranking of the explanatory ability of the XAI models to be evaluated. It should be noted that, to improve accuracy and reduce errors, explanation results are compared in a random manner, so different XAI models are not compared the same number of times. The target model is therefore built to derive a ranking from these unbalanced comparison results.

The score pairs may be input into the target model to obtain the ranking order of the explanatory power of each XAI model to be evaluated, or of each explanation result, as follows: if the XAI models initially selected for evaluation are two different XAI models, the score pairs of the explanation result pairs corresponding to these two models are input into the target model to obtain the ranking order of the explanatory power of each XAI model; if the initially selected models are the same XAI model under different training batches, the score pairs of the corresponding explanation result pairs are input into the target model to obtain the ranking order of the explanatory power of each explanation result.

For example, the input of the target model is the score pairs stored in the result set Rsc[], and the output is the ranking of the explanatory power of the different XAI models, recorded as {..., α_i, ..., α_j, ...}, where α_i is the explanatory-power value of the i-th XAI model and its position in the ranking is the rank of its explanatory power. The implementation is as follows: first, the comparison results stored in the result set array Rsc[] are converted into an n×n matrix A of comparison results, where the entry a_ij at position (i, j) is the number of times the i-th XAI model outperformed the j-th XAI model; second, the explanatory-power value α_i of each model is calculated by arbitrarily fixing one α_i as the benchmark value 1 and solving ArgMax(L), where L is calculated as follows:

α_j is the explanatory-power value of the j-th XAI model.

It should be noted that when the weights are equal, for any j, the sum of the terms of this loss function over i equals a constant minus the rank-sum statistic. The rank-sum test is a non-parametric test with high statistical power and is proportional to the commonly used area under the curve (AUC) metric, so optimizing this loss function can be understood as optimizing a weighted sum of AUC values, which gives the method strong scalability and practicality. After ArgMax(L) is solved, the explanatory-power value of each model is obtained and the models are sorted by it; the sorted result is stored as the explanatory-power sequence {α_i, ..., α_j, ...}, where a leading α_i means that the i-th explanation model MODEL_i ranks first in explanatory power. In this way, the ranking order of the explanatory power of each XAI model, or of each explanation result, can be obtained.
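The exact expression for L is not reproduced in this text. One standard choice that fits the description (a win-count matrix A, one strength fixed to the benchmark value 1, and a likelihood maximized over unbalanced pairwise comparisons) is the Bradley-Terry model. The sketch below fits it with the well-known minorization-maximization (MM) update and should be read as an assumed stand-in, not as the patent's own L. Rsc is assumed to hold (winner, loser) pairs with 1-based model labels; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def comparison_matrix(rsc: Sequence[Tuple[int, int]], n_models: int) -> List[List[int]]:
    """A[i][j] = number of rounds in which model i+1 beat model j+1."""
    a = [[0] * n_models for _ in range(n_models)]
    for winner, loser in rsc:
        a[winner - 1][loser - 1] += 1
    return a


def bradley_terry(a: List[List[int]], ref: int = 0, iters: int = 1000, tol: float = 1e-10) -> List[float]:
    """Bradley-Terry strengths via the MM update
    alpha_i <- W_i / sum_j (a_ij + a_ji) / (alpha_i + alpha_j), with alpha[ref] = 1.
    Assumes every model wins at least once and the comparisons are connected."""
    n = len(a)
    wins = [sum(row) for row in a]
    alpha = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            denom = sum((a[i][j] + a[j][i]) / (alpha[i] + alpha[j]) for j in range(n) if j != i)
            new.append(wins[i] / denom if denom > 0 else alpha[i])
        scale = new[ref]
        new = [x / scale for x in new]  # fix the benchmark strength to 1
        converged = max(abs(x - y) for x, y in zip(new, alpha)) < tol
        alpha = new
        if converged:
            break
    return alpha


def rank_models(alpha: Sequence[float]) -> List[int]:
    """Model labels (1-based) sorted by decreasing explanatory power."""
    return sorted(range(1, len(alpha) + 1), key=lambda k: alpha[k - 1], reverse=True)


rsc = [(1, 2)] * 3 + [(2, 1)] + [(2, 3)] * 3 + [(3, 2)] + [(1, 3)] * 3 + [(3, 1)]
alpha = bradley_terry(comparison_matrix(rsc, 3))
order = rank_models(alpha)  # -> [1, 2, 3]
```

In this toy Rsc every pair is compared four times and the lower label wins three of them, so model 1 ranks first. The MM update handles unequal comparison counts naturally, because each pair enters the denominator in proportion to how often it was compared.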

It should be noted that technicians can adjust the storage method of the model comparison results according to actual needs, and do not have to be limited to storage in matrix form.

In one example, the sample library of the recommendation business department of an Internet company H stores the data of 10,000 users as Data[]. The data of one user is represented as P = {Pur[], Foo[], rec}, where Pur[] is the set of past purchase records, represented as strings (such as "umbrella" or "banana"); Foo[] is the set of records that have been browsed but not purchased, also represented as strings; and rec is the product that the model recommends to the user based on these two arrays, represented as a string. The department wants to add explanations to its existing recommendation system to improve users' acceptance of the recommended products. It has selected several XAI explanation models to be evaluated, MODEL_1, MODEL_2, and MODEL_3, to provide reasonable explanations for the recommendation results: based on the products the user has purchased and browsed, the products the user is most likely to buy now are inferred, the path explanation of this reasoning process is evaluated, and the model with strong explanatory ability is selected for use.

The system administrator can set the comparison round parameter N = 20. First, a random algorithm selects two XAI models to be compared, MODEL_1 and MODEL_2 (hereinafter a and b, respectively). Then the data P of one user is randomly selected from the sample library Data[] and sent to a and b to generate explanation results. The atomic representation of an output explanation result is $E = [X_1 \dots X_n \rightarrow Y]$, where X_i is a factor selected by the explanation model to participate in explaining the result. The explanation result pair can therefore be expressed as E_pair = {E_a[], E_b[]}, where E_a[] and E_b[] are the atomic explanation result sequences with subscripts a and b, respectively.

If the data of user P is Pur[] = {"diapers", "Microeconomics", "baby bed"...}, Foo[] = {"SK-II", "kettle", "maternity clothes"...}, and rec = "milk powder", the data of E_a[] is as follows:

The data of E_b[] is as follows:

After the explanation result sequence pair is obtained, it is matched against the KG. To improve matching efficiency, the following procedure is used, taking the matching of E_a[] as an example:

1. Call the Sophon KG semantic retrieval interface to match the factor corresponding to Y in the KG; the matching node in the KG is V_y.

2. Starting from V_y, match the node V₆ corresponding to X₆ within path depth D. If the match succeeds, repeat this step starting from V₆. If the match fails, match the node corresponding to factor X₅ within a depth of (number of consecutive unsuccessful matches + 1) × D, and assume that the subgraph containing the previous node is not connected to the subgraph containing the next node. Apply this rule until all nodes have been matched.
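The two matching steps can be sketched as a depth-limited breadth-first search on a toy graph. The Sophon KG semantic retrieval interface is not modeled: the sketch assumes, hypothetically, that a factor's node id equals its name, and that after a miss the search continues from the missed factor's node, a detail the text leaves open. The names `build_graph`, `nodes_within_depth`, and `match_factors` are illustrative only.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency dict from (u, v) pairs; a toy stand-in for the KG."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def nodes_within_depth(graph, start, max_depth):
    """Return every node reachable from `start` in at most `max_depth` hops (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand past the depth limit
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


def match_factors(graph, factors, result, depth):
    """Match X_n ... X_1 backwards from the result node V_y.

    After k consecutive misses the search radius becomes (k + 1) * depth.
    Returns the node sequence in X_1 ... X_n, Y order and the factors at which
    the chain is assumed to break into separate subgraphs.
    """
    for name in [result, *factors]:
        if name not in graph:  # hypothetical lookup: node id == factor name
            raise KeyError(f"{name!r} has no node in the knowledge graph")
    anchor, misses = result, 0
    nodes, breaks = [result], []
    for factor in reversed(factors):
        reach = nodes_within_depth(graph, anchor, (misses + 1) * depth)
        if factor in reach:
            misses = 0
        else:
            misses += 1
            breaks.append(factor)  # previous and next subgraphs not connected
        nodes.append(factor)
        anchor = factor  # assumption: continue the search from this node
    return nodes[::-1], breaks
```

On a toy graph shaped like G_a[] with D = 1, the chain X₆ ... X₃ is matched directly from Y, X₂ is missed, and X₁ is then found with the widened depth 2, so a single break is recorded at X₂.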

The final matching result is output: the node sequence pair corresponding to E_pair in the KG is V_pair = {V_a[], V_b[]}.

Using the path search function of Sophon KG, the subgraph set pair G_pair = {G_a[], G_b[]} corresponding to the node sequence pair V_pair = {V_a[], V_b[]} is obtained. The specific matching result is:
G_a[] = {{X₁, X₂}, {X₃, X₄, X₅, X₆, Y}}
G_b[] = {{X₁, X₂}, {X₃, X₄}, {X₅, Y}}
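The path search itself is not specified further in this section. As an assumption, the sketch below approximates it by the connected components of the knowledge-graph subgraph induced on the matched nodes; on toy graphs shaped like the two matching results above, it reproduces the G_a[] and G_b[] groupings. The helper names are hypothetical.

```python
def build_graph(edges):
    """Undirected adjacency dict from (u, v) pairs; a toy stand-in for the KG."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def subgraph_set(graph, nodes):
    """Group matched nodes into connected subgraphs of the KG.

    Approximation (assumption): connected components of the subgraph induced on
    `nodes`. Each component is returned as (node set, edge set), with every edge
    stored as frozenset({u, v}).
    """
    keep = set(nodes)
    seen, components = set(), []
    for start in nodes:  # follow sequence order so the output order is stable
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # iterative depth-first traversal restricted to `keep`
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(n for n in graph.get(node, ()) if n in keep and n not in comp)
        seen |= comp
        edges = {frozenset((u, v)) for u in comp for v in graph.get(u, ()) if v in comp}
        components.append((comp, edges))
    return components
```

Returning edges alongside nodes lets the later scoring steps count edges and look up edge weights without querying the graph again.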

FIG. 2 is a schematic diagram of the subgraph set G_a[] obtained from the matching result in Embodiment 1 of the present application. As shown in FIG. 2, G_a[] contains 2 subgraphs.

FIG. 3 is a schematic diagram of the subgraph set G_b[] obtained from the matching result in Embodiment 1 of the present application. As shown in FIG. 3, G_b[] contains 3 subgraphs.

Each subgraph set in G_pair = {G_a[], G_b[]} is scored, taking G_a[] in FIG. 2 as an example:

The explanation coherence score of G_a[] is calculated from the number of subgraphs it contains (2).

The explanation complexity score of G_a[] is calculated from the number of nodes and the number of edges in each of its subgraphs, summed over all subgraphs, with the result accurate to five decimal places.
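The text states only what the coherence and complexity scores depend on: the number of subgraphs, and the node and edge counts of each subgraph summed over the set. The exact formulas are not reproduced in this section, so the sketch below takes the per-count mappings as caller-supplied functions; any concrete mapping (for example a hypothetical `lambda n, e: n + e`) is a placeholder, not the applicant's formula.

```python
from typing import Callable

# A subgraph is (node set, edge set); each edge is a frozenset({u, v}).
Subgraph = tuple[set, set]


def coherence_score(subgraphs: list[Subgraph], from_count: Callable[[int], float]) -> float:
    """Explanation coherence: a function of the number of subgraphs only."""
    return from_count(len(subgraphs))


def complexity_score(
    subgraphs: list[Subgraph], per_subgraph: Callable[[int, int], float]
) -> float:
    """Explanation complexity: sum over subgraphs of per_subgraph(n_nodes, n_edges)."""
    total = sum(per_subgraph(len(nodes), len(edges)) for nodes, edges in subgraphs)
    return round(total, 5)  # results are reported to five decimal places
```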

The subgraph G_a2[] in FIG. 2 is the subgraph containing the result factor Y. The weights of its edges are shown in the figure, and the explanation credibility score of G_a[] is calculated as S_credit = 7.623.
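Both credibility variants described for the score pair determination module, summing the weights of all edges in the target subgraph or only the weights of edges connected to the result factor, can be sketched as follows. Edge weights are assumed to be stored in a dict keyed by `frozenset({u, v})`; the function names are hypothetical, and any example weights are made up rather than the figure values behind S_credit = 7.623.

```python
def target_subgraph(subgraphs, result):
    """Return the (nodes, edges) subgraph that contains the result factor."""
    for nodes, edges in subgraphs:
        if result in nodes:
            return nodes, edges
    raise ValueError(f"no subgraph contains the result factor {result!r}")


def credibility_all_edges(subgraphs, weights, result):
    """Variant 1: sum of the weights of all edges in the target subgraph."""
    _, edges = target_subgraph(subgraphs, result)
    return sum(weights[edge] for edge in edges)


def credibility_result_edges(subgraphs, weights, result):
    """Variant 2: sum of the weights of the edges connected to the result factor."""
    _, edges = target_subgraph(subgraphs, result)
    return sum(weights[edge] for edge in edges if result in edge)
```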

The total score of G_a[] is calculated from its explanation coherence score, explanation complexity score, and explanation credibility score, with the standard() function taken to be the logistic function. The total score of G_b[] is calculated in the same way, accurate to five decimal places. Comparing the two total scores shows that MODEL₁ outperforms MODEL₂ in this round.

After N = 20 rounds, a result array Rsc[] containing 20 comparison results is obtained. The data is converted into a 3×3 matrix A, whose element a_ij is the number of times the i-th model outperformed the j-th model across all comparison rounds. The data of A is as follows:
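Converting the round results into the matrix A is a simple tally. The sketch assumes, hypothetically, that each entry of Rsc[] is a (winner, loser) pair of zero-based model indices; the function name `win_matrix` is illustrative.

```python
def win_matrix(rsc, n_models):
    """Tally A[i][j] = number of rounds in which model i outperformed model j.

    Each Rsc[] entry is assumed to be a (winner, loser) pair of 0-based indices.
    """
    a = [[0] * n_models for _ in range(n_models)]
    for winner, loser in rsc:
        if winner == loser:
            raise ValueError("a round must compare two different models")
        a[winner][loser] += 1
    return a
```

For example, a round in which MODEL₁ outperforms MODEL₂ would be recorded as `(0, 1)` and increments `A[0][1]`.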

Let α₁, α₂, and α₃ denote the explanatory power of MODEL₁, MODEL₂, and MODEL₃, respectively, and fix the benchmark value α₁ = 1.00. The values α_i are obtained by solving argmax L, where L is expressed as follows:

Using an optimization algorithm such as, but not limited to, the Broyden–Fletcher–Goldfarb–Shanno (BFGS) quasi-Newton algorithm, L reaches its maximum at α₁ = 1.00, α₂ = 0.89, and α₃ = 0.66. The explanatory power ranking of the three models is therefore {α₁ = 1.00, α₂ = 0.89, α₃ = 0.66}, so MODEL₁, the XAI model with the strongest explanatory power, is selected and put into use. Note that only three XAI models are compared in this example; in practice, the number of XAI models to be evaluated is not limited.
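The expression for L is not reproduced in this section. One common choice consistent with the description (pairwise win counts a_ij, positive strengths α_i, benchmark α₁ = 1) is the Bradley–Terry log-likelihood, L = Σ_{i≠j} a_ij ln(α_i / (α_i + α_j)), and the sketch below assumes that form. Instead of BFGS, it uses Hunter's minorization–maximization (MM) iteration, a swap-in that maximizes the same likelihood without external dependencies. The function names are hypothetical, and any example win matrix is made up, so the resulting values are not the 0.89 and 0.66 reported above.

```python
import math


def bt_log_likelihood(alpha, a):
    """Bradley-Terry log-likelihood L = sum_{i != j} a_ij * ln(alpha_i / (alpha_i + alpha_j))."""
    n = len(a)
    return sum(
        a[i][j] * math.log(alpha[i] / (alpha[i] + alpha[j]))
        for i in range(n)
        for j in range(n)
        if i != j and a[i][j]
    )


def fit_bradley_terry(a, iters=1000, tol=1e-12):
    """Maximize L with Hunter's MM updates and rescale so that alpha_1 = 1.00.

    Assumes every model wins at least one round and the comparisons connect all
    models; otherwise the maximum-likelihood estimate is degenerate.
    """
    n = len(a)
    wins = [sum(a[i][j] for j in range(n) if j != i) for i in range(n)]
    if min(wins) == 0:
        raise ValueError("every model needs at least one win for a finite estimate")
    alpha = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            # MM step: alpha_i <- W_i / sum_j n_ij / (alpha_i + alpha_j)
            denom = sum(
                (a[i][j] + a[j][i]) / (alpha[i] + alpha[j]) for j in range(n) if j != i
            )
            new.append(wins[i] / denom)
        new = [x / new[0] for x in new]  # fix the benchmark alpha_1 = 1
        done = max(abs(x - y) for x, y in zip(new, alpha)) < tol
        alpha = new
        if done:
            break
    return alpha
```

Because the Bradley–Terry likelihood is unchanged when all α_i are scaled by the same factor, rescaling after every step to pin α₁ = 1 does not affect the maximizer.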

The technical solution of this embodiment obtains the explanation result pair corresponding to the XAI model to be evaluated; obtains the node sequence pair in the knowledge graph that matches the explanation result pair; determines the subgraph set corresponding to each node sequence according to the node sequence pair and the knowledge graph; and determines the score pair corresponding to the explanation result pair according to the subgraph set corresponding to each node sequence. This addresses the problems of traditional XAI evaluation and selection methods, which are time-consuming, labor-intensive, and lengthy and have low evaluation effectiveness; it improves the accuracy and efficiency of evaluation and selection and ensures the effectiveness of the evaluation data.

Embodiment 2

FIG. 4 is a schematic structural diagram of an XAI model evaluation device according to Embodiment 2 of the present application. This embodiment is applicable to comprehensively ranking the interpretability performance of XAI models across multiple random comparison tests. The device can be implemented in software and/or hardware and can be integrated into any device that provides XAI model evaluation functions. As shown in FIG. 4, the XAI model evaluation device includes: an explanation result pair acquisition module 210, a node sequence pair acquisition module 220, a subgraph set determination module 230, and a score pair determination module 240.

The explanation result pair acquisition module 210 is configured to obtain the explanation result pair corresponding to the XAI model to be evaluated; the node sequence pair acquisition module 220 is configured to obtain the node sequence pair in the knowledge graph that matches the explanation result pair; the subgraph set determination module 230 is configured to determine the subgraph set corresponding to each node sequence based on the node sequence pair and the knowledge graph; and the score pair determination module 240 is configured to determine the score pair corresponding to the explanation result pair based on the subgraph set corresponding to each node sequence.
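The four modules form a straight pipeline, which can be sketched as a class with one pluggable callable per module. The class name, field names, and per-element stage signatures are hypothetical; in practice each stage would wrap logic such as the matching and scoring steps of Embodiment 1.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class XaiEvaluationDevice:
    """Hypothetical pipeline mirroring modules 210-240; each stage is pluggable."""

    acquire_pair: Callable[[], tuple[Any, Any]]  # module 210: explanation result pair
    match_nodes: Callable[[Any], Any]            # module 220: explanation -> node sequence
    subgraphs: Callable[[Any], Any]              # module 230: node sequence -> subgraph set
    score: Callable[[Any], float]                # module 240: subgraph set -> score

    def evaluate(self) -> tuple[float, float]:
        """Run the four stages on both sides of the pair and return the score pair."""
        e_a, e_b = self.acquire_pair()
        v_a, v_b = self.match_nodes(e_a), self.match_nodes(e_b)
        g_a, g_b = self.subgraphs(v_a), self.subgraphs(v_b)
        return self.score(g_a), self.score(g_b)
```

Keeping the stages as injected callables mirrors the software-and/or-hardware framing of the device: any module can be replaced without touching the others.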

Optionally, the score pair determination module is configured to: determine, according to the subgraph set corresponding to each node sequence, the explanation coherence score, explanation complexity score, and explanation credibility score corresponding to each node sequence; determine the target score corresponding to each node sequence according to its explanation coherence score, explanation complexity score, and explanation credibility score; and determine the score pair corresponding to the explanation result pair according to the target score corresponding to each node sequence.

Optionally, the score pair determination module is configured to: obtain the number of subgraphs in a subgraph set corresponding to each node sequence; and determine an explanation coherence score corresponding to each node sequence according to the number of subgraphs in the subgraph set.

Optionally, the score pair determination module is configured to: determine the complexity of each subgraph based on the number of nodes and the number of edges in each subgraph in the subgraph set corresponding to each node sequence; and determine the sum of the complexities of all subgraphs in that subgraph set as the explanation complexity score corresponding to each node sequence.

Optionally, the score pair determination module is configured to: obtain a target subgraph from a subgraph set corresponding to each node sequence, wherein the target subgraph is a subgraph including a result factor; and determine the sum of the weights of all edges of the target subgraph as the explanation credibility score corresponding to each node sequence.

Optionally, the score pair determination module is configured to: obtain a target subgraph from a subgraph set corresponding to each node sequence, wherein the target subgraph is a subgraph including a result factor; and determine the sum of the weights of the edges in the target subgraph connected to the result factor as the explanation credibility score corresponding to each node sequence.

Optionally, the device further includes a ranking acquisition module, configured to input the score pair into the target model to obtain the ranking of explanatory power among the different XAI models to be evaluated, or to input the score pair into the target model to obtain the ranking of explanatory power of the explanation results in the explanation result pair.

The above-mentioned product can execute the method provided by any embodiment of the present application and has the corresponding functional modules for executing the method.

The technical solution of this embodiment obtains the explanation result pair corresponding to the XAI model to be evaluated; obtains the node sequence pair in the knowledge graph that matches the explanation result pair; determines the subgraph set corresponding to each node sequence according to the node sequence pair and the knowledge graph; and determines the score pair corresponding to the explanation result pair according to the subgraph set corresponding to each node sequence. This addresses the problems of traditional XAI evaluation and selection methods, which are time-consuming, labor-intensive, and lengthy and have low evaluation effectiveness; it improves the accuracy and efficiency of evaluation and selection and ensures the effectiveness of the evaluation data.

Embodiment 3

FIG. 5 is a schematic structural diagram of an electronic device according to Embodiment 3 of the present application. The electronic device 10 is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices (such as helmets, glasses, and watches), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementation of the present application described and/or claimed herein.

As shown in FIG. 5, the electronic device 10 includes at least one processor 11 and a memory connected to the at least one processor 11, such as a read-only memory (ROM) 12 and a random access memory (RAM) 13. The memory stores a computer program executable by the at least one processor, and the processor 11 can perform various appropriate actions and processes according to the computer program stored in the ROM 12 or the computer program loaded from the storage unit 18 into the RAM 13. The RAM 13 can also store various programs and data required for the operation of the electronic device 10. The processor 11, the ROM 12, and the RAM 13 are connected to one another through a bus 14, to which an input/output (I/O) interface 15 is also connected.

A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16, such as a keyboard, a mouse, etc.; an output unit 17, such as various types of displays, speakers, etc.; a storage unit 18, such as a disk, an optical disk, etc.; and a communication unit 19, such as a network card, a modem, a wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.

The processor 11 may be a variety of general and/or special processing components with processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), a variety of dedicated artificial intelligence (AI) computing chips, a variety of processors running machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc. The processor 11 executes the multiple methods and processes described above, such as the XAI model evaluation method.

In some embodiments, the XAI model evaluation method may be implemented as a computer program, which is tangibly contained in a computer-readable storage medium, such as a storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed on the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the XAI model evaluation method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the XAI model evaluation method in any other appropriate manner (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

The computer programs for implementing the methods of the present application may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that when executed by the processor, they implement the functions/operations specified in the flowcharts and/or block diagrams. The computer programs may be executed entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on a remote machine or server.

In the context of the present application, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, RAM, ROM, an erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide interaction with a user, the systems and techniques described herein may be implemented on an electronic device having: a display device (e.g., a cathode ray tube (CRT) or a liquid crystal display (LCD) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or trackball) through which the user can provide input to the electronic device. Other types of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form (including acoustic input, voice input, or tactile input).

The systems and techniques described herein can be implemented in a computing system that includes a back-end component (e.g., a data server), a computing system that includes a middleware component (e.g., an application server), a computing system that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), a blockchain network, and the Internet.

A computing system may include a client and a server. The client and the server are generally remote from each other and usually interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship with each other. The server may be a cloud server, also known as a cloud computing server or cloud host, which is a host product in a cloud computing service system designed to overcome the difficult management and weak business scalability of traditional physical hosts and virtual private server (VPS) services.

It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this application may be executed in parallel, sequentially, or in a different order, as long as the expected results of the technical solution of this application can be achieved; no limitation is imposed herein.

Claims

1. An explainable artificial intelligence (XAI) model evaluation method, comprising:

obtaining an explanation result pair corresponding to an XAI model to be evaluated;

obtaining a node sequence pair in a knowledge graph that matches the explanation result pair;

determining a subgraph set corresponding to each node sequence in the node sequence pair according to the node sequence pair and the knowledge graph; and

determining a score pair corresponding to the explanation result pair according to the subgraph set corresponding to each node sequence.

2. The method according to claim 1, wherein determining the score pair corresponding to the explanation result pair according to the subgraph set corresponding to each node sequence comprises:

determining, according to the subgraph set corresponding to each node sequence, an explanation coherence score, an explanation complexity score, and an explanation credibility score corresponding to each node sequence;

determining a target score corresponding to each node sequence according to the explanation coherence score, the explanation complexity score, and the explanation credibility score corresponding to each node sequence; and

determining the score pair corresponding to the explanation result pair according to the target score corresponding to each node sequence.

3. The method according to claim 2, wherein determining the explanation coherence score corresponding to each node sequence according to the subgraph set corresponding to each node sequence comprises:

obtaining the number of subgraphs in the subgraph set corresponding to each node sequence; and

determining the explanation coherence score corresponding to each node sequence according to the number of subgraphs in the subgraph set.

4. The method according to claim 2, wherein determining the explanation complexity score corresponding to each node sequence according to the subgraph set corresponding to each node sequence comprises:

determining the complexity of each subgraph according to the number of nodes and the number of edges of each subgraph in the subgraph set corresponding to each node sequence; and

determining the sum of the complexities of all subgraphs in the subgraph set corresponding to each node sequence as the explanation complexity score corresponding to each node sequence.

5. The method according to claim 2, wherein determining the explanation credibility score corresponding to each node sequence according to the subgraph set corresponding to each node sequence comprises:

obtaining a target subgraph in the subgraph set corresponding to each node sequence, wherein the target subgraph is a subgraph including the result factor; and

determining the sum of the weights of all edges of the target subgraph as the explanation credibility score corresponding to each node sequence.

6. The method according to claim 2, wherein determining the explanation credibility score corresponding to each node sequence according to the subgraph set corresponding to each node sequence comprises:

obtaining a target subgraph in the subgraph set corresponding to each node sequence, wherein the target subgraph is a subgraph including the result factor; and

determining the sum of the weights of the edges in the target subgraph that are connected to the result factor as the explanation credibility score corresponding to each node sequence.

7. The method according to claim 1, wherein the explanation results corresponding to the XAI model to be evaluated comprise explanation results of different XAI models, and after determining the score pair corresponding to the explanation result pair according to the subgraph set corresponding to each node sequence, the method further comprises: inputting the score pair into a target model to obtain a ranking of the explanatory power of each of the different XAI models; or

the explanation results corresponding to the XAI model to be evaluated comprise explanation results of the same XAI model under different training batches, and after determining the score pair corresponding to the explanation result pair according to the subgraph set corresponding to each node sequence, the method further comprises: inputting the score pair into the target model to obtain a ranking of the explanatory power of each explanation result in the explanation result pair.

8. An explainable artificial intelligence (XAI) model evaluation device, comprising:

an explanation result pair acquisition module, configured to acquire an explanation result pair corresponding to an XAI model to be evaluated;

a node sequence pair acquisition module, configured to acquire a node sequence pair in a knowledge graph that matches the explanation result pair;

a subgraph set determination module, configured to determine a subgraph set corresponding to each node sequence in the node sequence pair according to the node sequence pair and the knowledge graph; and

a score pair determination module, configured to determine a score pair corresponding to the explanation result pair according to the subgraph set corresponding to each node sequence.

9. An electronic device, comprising:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor to enable the at least one processor to perform the explainable artificial intelligence (XAI) model evaluation method according to any one of claims 1 to 7.

10. A computer-readable storage medium storing computer instructions, wherein the computer instructions, when executed, cause a processor to implement the explainable artificial intelligence (XAI) model evaluation method according to any one of claims 1 to 7.