In this section, we discuss a feasible extension to generally repair non-determinism. Additionally, we discuss the limitations of our approach and compare this study with other existing studies.
6.1 Modifications on Non-determinism
As previously mentioned, the modification operator eliminates invariant violations triggered by deterministic state transitions. To repair non-determinism, we suggest a new modification operator. For an operation, given a pre-state
p and an erroneous post state
q, a modification operator modifies
q to a correct post state
r. The modification operator can be described using a triple
\([u,``modification,^{\prime \prime }r]\), where
u is the concatenation of
p and
q. The triple means that the faulty state pair
\((p,q)\) is rewritten as a correct state pair
\((p,r)\) using the modification operator. To control the application domain of modification,
u is converted to the following condition:
where
\(v^{pre}_i~(i=1,2,\ldots ,n)\) is a pre-variable identifier of
\(v_i\),
\(v^{post}_i~(i=1,2,\ldots ,n)\) is a post-variable identifier of
\(v_i\),
\(p[v_i]\) is the value of
\(v_i\) in
p, and
\(q[v_i]\) is the value of
\(v_i\) in
q. Moreover,
r can be converted to the following substitution:
where
\(v^{post}_i~(i=1,2,\ldots ,n)\) is a post-variable identifier of
\(v_i\), and
\(r[v_i]\) is the value of
\(v_i\) in
r. The
Non-Determinism Modification (NDM) operator to repair non-determinism is defined as follows: Given
M modification operators that are described using
M triples
\([u_j,``modification,^{\prime \prime }r_j]~(j=1,\ldots ,M)\), we use Equation (
5) to convert each
\(u_j\) to a condition
\(U_j\) and use Equation (
6) to convert each
\(r_j\) to a substitution
\(R_j\). Given an operation
\(\alpha = {\rm {\bf PRE}}~~P~~{\rm {\bf THEN}}~~S~~{\rm {\bf END}}\), where
P is a pre-condition, and
S is a substitution, an NDM can be applied to
\(\alpha\) using the following template:
In particular, if the original operation does not have a pre-condition, then the repaired operation does not have the outer “PRE-THEN-END” statement. The above template uses the “VAR-IN-END” construct with two blocks of temporary variables, i.e., “\(\lt\)pre-variables\(\gt\)” and “\(\lt\)post-variables\(\gt ,\)” to store pre-values and post-values. Before applying the substitution S, the values of the state variables are the pre-values, so “\(\lt\)pre-variables\(\gt\) := \(\lt\)state-variables\(\gt\)” can assign the pre-values to the pre-variables. After applying S, the values of the state variables become the post-values, so “\(\lt\)post-variables\(\gt\) := \(\lt\)state-variables\(\gt\)” can assign the post-values to the post-variables. As a result, the pre-values and the post-values are obtained. After using the conditions \(U_j~(j=1,\ldots ,M)\) and the substitutions \(R_j~(j=1,\ldots ,M)\) to modify the values of the post-variables, the modified values are assigned to state variables, leading to a new post-state.
The use of NDM requires a slight change on Algorithm
1, i.e., the
Update function is adapted to use the NDM template to repair faulty operations. The following example of autonomous vehicle control model will demonstrate the use of NDM. The model describes how to control a bus in a city. Figure
7 visualises a city map, which has a set
Location containing five bus stations, i.e., S0, S1, S2, S3, and S4, and five other locations, i.e., C0, C1, C2, C3, and C4. The locations are linked by a set
Street containing bidirectional edges. The model has a variable “loc” recording the location of the bus. The value of loc is changed using the following “
move” operation:
Suppose that the bus station S4 is temporarily unavailable. The following invariant is used to specify the constraint:
The invariant is violated, because the move operation can change the value of loc from S3, C3, and C4 to S4. Corresponding faulty state transitions and feasible modifications (indicated by “\(\hookrightarrow\)”) are listed below.
The meaning of the modifications is that if the bus is scheduled to move from S3, C3, or C4 to S4, the bus will be rescheduled to move from S3 to S2, from C3 to C4 or from C4 to C3, respectively. They are applied to the move operation using the NDM operator, leading to the following repaired operation:
The above example demonstrates that even though multiple state transitions can share a common post-state that triggers an invariant violation, the NDM can repair each state transition separately. This is because the conditions of NDM can ensure that each repair acts on one and only one state transition.
6.2 Limitations
As discussed in Section
2.2, the repair evaluators cannot deal with unseen elements in infinite sets. When training the repair evaluators, unseen elements are avoided by reducing infinite sets into finite sets that contain all elements occurred in a state space. For example, if a state space only includes integers 0, 1, 2, and 4, then the finite set {0, 1, 2, 4} is used to generate one-hot encodings, e.g., {1, 4} is encoded as [0, 1, 0, 1]. However, unseen elements such as 3 and 5 are not preserved in one-hot encodings, e.g., {1, 3, 4, 5} is treated as {1, 4} and encoded as [0, 1, 0, 1]. Consequently, the repair evaluators cannot distinguish between {1, 4} and {1, 3, 4, 5}. Besides, to avoid unseen elements, AMBM can control the constraint solver to find modifications that only contain elements occurring in a given state space, but such a restriction may disable a few required modifications. For example, suppose that a faulty operation Inc attempts to increase a set of integers by 1, the operation yields correct state transitions such as
\(\lbrace 0\rbrace \xrightarrow {Inc} \lbrace 1\rbrace\),
\(\lbrace 1\rbrace \xrightarrow {Inc} \lbrace 2\rbrace\) and
\(\lbrace 0, 1\rbrace \xrightarrow {Inc} \lbrace 1, 2\rbrace\) and faulty state transitions such as
\(\lbrace 2\rbrace \xrightarrow {Inc} \lbrace \rbrace\) and
\(\lbrace 1, 2\rbrace \xrightarrow {Inc} \lbrace 2\rbrace\). The faulty state transitions should be repaired as
\(\lbrace 2\rbrace \xrightarrow {Inc} \lbrace 3\rbrace\) and
\(\lbrace 1, 2\rbrace \xrightarrow {Inc} \lbrace 2, 3\rbrace\), respectively. Unfortunately, AMBM cannot suggest the required repairs, because 3 does not occur in the given state transitions. A possible approach to solve this problem is model repair based on inductive programming [
33]. Given a component library containing essential symbols such as 0, 1, +,
\(-\), * and =, inductive programming can synthesise the function
\(x^{\prime } = x + 1\), where
x and
\(x^{\prime }\) are integers, to generalise the correct state transitions of Inc. When
\(x = 2\), a new element
\(x^{\prime } = 3\) can be inferred by the function, so the required modifications can be constructed. The above discussion indicates that a merger of inductive programming and AMBM can be a way to solve the restrictions of infinite sets.
The repair of higher-order sets, e.g., sets of sets, is another limitation to the applicability of AMBM. The number of candidate modifications with higher-order sets can be huge due to combinatorial explosions. For example, if a set of sets is constructed by integers \(1, 2, \ldots , n\), then its value will be a member of \(\mathcal {P}^2(\lbrace 1, 2, \ldots , n\rbrace)\), where \(\mathcal {P}\) represents the power set. As the cardinality of \(\mathcal {P}^2(\lbrace 1, 2, \ldots , n\rbrace)\) is \(2^{2^n}\), it is unrealistic to enumerate the members when n is large. Instead of enumeration, AMBM only uses components that occurred in a given state space to generate modifications. As a result, a required modification will be missing if it includes a component not in the state space. For example, suppose that a faulty operation Inc_Send attempts to increase sets of integers by 1 and transfer the sets from a sender to a receiver, with two sets of sets \([Sender,~Receiver]\) to represent states, the operation yields correct state transitions such as:
•
\([\lbrace \lbrace 0\rbrace ,~\lbrace 1\rbrace ,~\lbrace 0,~1\rbrace \rbrace ,~\lbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \lbrace 1\rbrace ,~\lbrace 0,~1\rbrace \rbrace ,~\lbrace \lbrace 1\rbrace \rbrace ],\)
•
\([\lbrace \lbrace 1\rbrace ,~\lbrace 0,~1\rbrace \rbrace ,~\lbrace \lbrace 1\rbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \lbrace 0,~1\rbrace \rbrace ,~\lbrace \lbrace 1\rbrace ,~\lbrace 2\rbrace \rbrace ],\)
•
\([\lbrace \lbrace 0,~1\rbrace \rbrace ,~\lbrace \lbrace 1\rbrace ,~\lbrace 2\rbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \rbrace ,~\lbrace \lbrace 1\rbrace ,~\lbrace 2\rbrace ,~\lbrace 1,~2\rbrace \rbrace ],\)
•
\([\lbrace \lbrace 1\rbrace ,~\lbrace 2\rbrace ,~\lbrace 1,~2\rbrace \rbrace ,~\lbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \lbrace 2\rbrace ,~\lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace \rbrace ],\)
and faulty state transitions such as:
•
\([\lbrace \lbrace 2\rbrace ,~\lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace ,~\lbrace \rbrace \rbrace ],\)
•
\([\lbrace \lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace ,~\lbrace \rbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \rbrace ,~\lbrace \lbrace 2\rbrace ,~\lbrace \rbrace \rbrace ].\)
The faulty state transitions should be repaired as follows:
•
\([\lbrace \lbrace 2\rbrace ,~\lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace ,~\lbrace 3\rbrace \rbrace ],\)
•
\([\lbrace \lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace ,~\lbrace 3\rbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \rbrace ,~\lbrace \lbrace 2\rbrace ,~\lbrace 3\rbrace ,~\lbrace 2,~3\rbrace \rbrace ].\)
Unfortunately, AMBM cannot suggest the required repairs, because {3} and {2, 3} are unseen in the given state space. To suggest repairs with such unseen components, inductive programming is probably feasible, because it can construct functions to infer unseen components, as discussed above.
Besides, as the repair evaluators treat sets in sets as string elements, any unseen sets in sets cannot be encoded appropriately. To encode such unseen components, a feasible method is to introduce repair evaluators based on graphs. Figure
8 shows an example of the conversion from sets of sets (or higher-order sets) to a graph. Compared to the string representation, the advantage of graph representation is that graphs can link two components at multiple levels so unseen components can be represented by linking with their sub-components. For example, both {{1}, {2}, {1, 2}} and {{2}, {3}, {2, 3}} have a direct link from the set {2} and two indirect links from the element 2. Even though {{2}, {3}, {2, 3}} is unseen in the state space, its encoding can still be partially inferred from the encoding of {{1}, {2}, {1, 2}} using graph learning models [
36]. As there are a large number of graph learning models available, their application to B model repair can be considered future work.
6.3 Comparisons to Related Work
Automated B model repair originates from two studies [
32,
33]. To eliminate faults in abstract machines, they have proposed four methods, including the strengthening of pre-conditions, the relaxation of pre-conditions, the relaxation of invariants, and the synthesis of new operations. When relaxing pre-conditions and invariants and synthesising new operations, users are required to manually produce positive and negative I/O examples for program synthesis. The difference between their work and our work is that we focus on repairing substitutions, while they focus on repairing pre-conditions. Moreover, our AMBM algorithm uses repair evaluators and constraint solving to automatically synthesise repairs, while their methods require users to manually produce I/O examples for repair synthesis.
AMBM is an improved implementation of our previous work, B-repair [
10]. When revising faulty abstract machines, B-repair uses a constraint solver to generate candidate repairs and uses learned quality estimation functions to rank repairs. As B-repair eliminates one and only one fault during each loop of repair, it takes considerably longer to repair models with a large number of faults. In this work, as multiple faults are eliminated using fewer compound repairs during each loop of repair, AMBM is significantly more efficient than the previous B-repair, and the resulting corrections are simpler in terms of the predicate structure.
With regard to the previous work on automatic imperative program repair such as RSRepair [
28], GenProg [
21], CASC [
37], and SearchRepair [
17], automated B model repair is conceptually different, because B is a design modelling language for constructing formal specifications at the abstract design level, while imperative programs are at the concrete implementation level [
2]. Functions in B design models are usually represented as operations with pre-conditions and substitutions (which play the role of post-conditions) that declare the facts between the before and after values (states) of the variables used in the system. Consequently, operation executions are decided by the current states of variables but not decided by the control flows of the program. Thus, repair evaluators can separately analyse each operation, and repairs can be applied to faulty operations asynchronously without considering the execution orders of operations. In automatic imperative program repair, however, execution orders that are decided by control flows are important factors to be considered.
There are a number of similarities between the repair of B models and imperative program repair. First, both of them have fault localisation functions to reduce the search space of the repair. B model repair can rely on a model checker to detect faulty operations, while imperative program repair can rely on
Spectrum-based Fault Localisation (SFL) [
1] functions that find suspicious code blocks by counting successful and failed paths of program executions. Second, the concept of inductive programming can be used to synthesise patches of imperative programs [
18] and refactor atomic modifications to compound modifications (this work). Third, conditional statements are often used to avoid the side effects of repair. For example,
Staged Program Repair (SPR) [
24] can use a number of instances to generate conditions that distinguish between correct and failed executions, so side effects to the correct executions can be minimised. Similarly, our AMBM uses conditional substitutions to avoid the side effects of modifications. Finally, as repairs are not definitive, evaluation functions seem to be inevitable to estimate the appropriateness of candidate repairs and select the best repairs. For example, GenProg [
21] evaluates candidate repairs by observing successful and failed executions related to the repairs, while AMBM uses classification models and repair scores to select repairs. The above similarities between imperative program repair and B model repair indicate that the technologies in the two fields may be used by each other in the future.