
Fast Automated Abstract Machine Repair Using Simultaneous Modifications and Refactoring

Published: 20 September 2022

Abstract

Automated model repair techniques enable machines to synthesise patches that ensure models meet given requirements. B-repair, an existing model repair approach, assists users in repairing erroneous models in the B formal method, but repairing large models is inefficient due to successive applications of repair. In this work, we improve the performance of B-repair using simultaneous modifications, repair refactoring, and better classifiers. The simultaneous modifications can eliminate multiple invariant violations at a time, so the average time to repair each fault is reduced. Further, the modifications can be refactored to shorten the resulting repairs. The purpose of using better classifiers is to perform more accurate and general repairs and avoid inefficient brute-force searches. We conducted an empirical study to demonstrate that the improved implementation leads to the entire model repair process achieving higher accuracy, generality, and efficiency.

1 Introduction

Automatic Software Repair (ASR) [4, 13] aims to use verification, testing, and program synthesis techniques to assist humans in repairing erroneous programs. In general, repairing software requires machines to locate faults before generating repairs. Spectrum-based fault localisation has been heavily used to locate faults in imperative programs [1, 20]. It uses a set of I/O pairs to test a program and records traces of successful and failed cases. According to the occurrences of operations in the traces, suspicious code that causes the failed cases can be found, and candidate repairs can be generated using various techniques. For example, GenProg [21] and SCRepair [16] can generate mutation repairs; PASAN [34], AutoFix-E [26], and SPR [24] can generate repairs from pre-defined templates; CASC [37], pyEDB [3], and GenProg [21] can use genetic programming to generate repairs. However, these ASR tools only focus on traditional test-based software development at the concrete code level. Little work has been done on ASR for correct-by-construction software development at the abstract design level, as much work in this field has focused on the computer-assisted diagnosis of faulty models. For example, a theory for identifying consistent behavioural modes in abstract models has been proposed by Reference [11]. Moreover, Linear Temporal Logic (LTL) specifications, which are used to specify sequential properties of programs at the abstract design level, can be diagnosed using SAT encodings and reasoning [27]. However, model repair after diagnosis requires more investigation. In this work, we study automatic model repair techniques based on the B method.
The B method [2] is a formal software development method at the abstract design level, where design specifications are represented as abstract machines (called “models”). B has been used to develop a number of automatic railway control systems in France, Sweden, and the USA [6], formalise the security properties of the L4 microkernel [15], and verify industrial PLC controllers [5]. The idea of B model repair was proposed by Schmidt et al. [32], and the goal of B model repair is to automatically (or semi-automatically) eliminate invariant violations and deadlocks in abstract machines. Moreover, Schmidt et al. [33] have developed a model repair approach that eliminates invariant violations by strengthening pre-conditions and relaxing invariants and eliminates deadlocks by weakening pre-conditions and generating new operations. This approach is semi-automatic because users are required to manually give I/O examples to synthesise new operations, and the code of new operations is constructed using a pre-defined program component library. Another automated B model repair approach is B-repair [10], which uses machine learning techniques to learn the state spaces of abstract machines and select well-behaved repairs that preserve the original state spaces as much as possible. However, B-repair eliminates only one fault during each loop of repair, which means that repairing a large number of faults is time-consuming.
In this article, we improve B-repair by implementing Abstract Machine Batch Modification (AMBM), which is more automatic and efficient than the previous B-repair. AMBM can repair multiple invariant violations at a time using simultaneous modifications, repair refactoring, and better classifiers. AMBM inherits the concept of B-repair, which aims to repair design models at a high level of abstraction. The design models consist of operations describing changes in model states and invariants describing model properties, and the operations are expected to satisfy the properties with respect to the system requirements [2]. If the design models are logically verified to be correct, then they can be further developed into executable software via different techniques, e.g., they can be rewritten as concrete models by refinement and finally converted to concrete programs. If faults exist in a design model, then succeeding concrete models and final concrete programs can be faulty. Thus, repairing faulty design models is of great importance to the secure software development process.
The workflow of AMBM is shown in Figure 1. First, an initial abstract machine that specifies the design model, which possibly has invariant violations, is constructed by the user. The machine consists of an initialisation and a number of operations. In the learning module, the state space of the machine is analysed using a model checker and learned using a classifier, leading to a repair evaluator. We make the following two assumptions: First, high-quality repairs are expected to retain the state space of the original model as much as possible. Second, instead of manual estimation, the quality of a repair can be automatically estimated using the repair evaluator. Based on the above assumptions, the repair evaluator is used to maximise the similarity of the state spaces before and after repair. Next, in the modification module, the model checker is used to detect invariant violations in the machine. If invariant violations exist, then a set of atomic modifications that can eliminate the invariant violations will be synthesised. These modifications are found via a constraint solver and selected using the repair evaluator. A modification will be selected if the repair evaluator predicts a high likelihood that the modification leads to a minimal change in the state space. Each selected atomic modification can eliminate exactly one invariant violation. After that, in the refactoring module, the atomic modifications are simplified into compound modifications. Each compound modification can eliminate one or more invariant violations. Finally, in the update module, the compound modifications are applied to the initial abstract machine, leading to an updated machine. The updated machine is forwarded to the modification module. If any invariant violations are detected, then further modifications will be required; otherwise, the workflow terminates.
Fig. 1. The overall flow chart of AMBM.
The contributions of this work include:
B-repair [10] is extended by implementing batch modifications and integrating better machine learning models, which leads to higher speed and accuracy on B model repair tasks.
A repair refactoring algorithm is introduced to generalise the code of modifications.
An empirical study is conducted to demonstrate the accuracy, generality and efficiency of the extended model repair tool.1
The rest of this article is organised as follows: Section 2 revisits the B method, supervised machine learning, and B-repair. Section 3 presents technical details of AMBM. Section 4 presents a case study on the use of AMBM to repair a faulty B model. Section 5 presents an empirical study of our approach. Section 6 discusses modifications on non-determinism and compares our study with related work. Section 7 concludes this work and outlines future directions.

2 Preliminaries

This section revisits the B method, supervised machine learning, and B-repair. Emphasised words are terminology that will be used in later discussions.

2.1 The B Method

The B method [2] is a correct-by-construction formal design modelling technique, where models are represented as abstract machines consisting of constants, variables, initialisations, operations, invariants, and properties. Constants define unchangeable values in models, and variables define changeable values that record states of models. Initialisations can assign initial values to variables, and operations can generate new states by assigning new values to variables. Invariants describe conditions that all states must satisfy, and properties describe conditions that must be satisfied by the constants. The derivation and checking of model states can be achieved using model checkers such as the ProB tool [22]. Given an abstract machine that has N variables, the ProB model checker can approximate a state space that consists of a set of state transitions of the form \([v_1,v_2,\ldots ,v_N] \xrightarrow {\alpha } [v^{\prime }_1,v^{\prime }_2,\ldots ,v^{\prime }_N]\), where \(\alpha\) is an operation, \(v_1,v_2,\ldots ,v_N\) are the values of the variables before applying the operation, and \(v^{\prime }_1,v^{\prime }_2,\ldots ,v^{\prime }_N\) are the values of the variables after applying the operation. In other words, the new state \([v^{\prime }_1,v^{\prime }_2,\ldots ,v^{\prime }_N]\) results from the application of the operation \(\alpha\) to the existing state \([v_1,v_2,\ldots ,v_N]\). When deriving such transitions, the model checker verifies whether all \(v_1,v_2,\ldots ,v_N\) and \(v^{\prime }_1,v^{\prime }_2,\ldots ,v^{\prime }_N\) satisfy the given invariants. If not, then an invariant violation will be triggered and reported. Initialisations can trigger invariant violations as well, but in this study, we do not repair faulty initialisations.
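To make the transition notation concrete, the following minimal Python sketch (ours, not ProB) represents a transition as a triple and checks a toy invariant over its pre- and post-states; the states, operation name, and invariant are invented for illustration:

```python
# A toy representation of a state transition and an invariant check.
# In practice ProB derives transitions and checks invariants itself.
transition = ((0, 2), "Add2", (2, 2))          # ([v1, v2], alpha, [v1', v2'])
invariant = lambda state: state[0] <= 4 and state[1] <= 4

pre_state, operation, post_state = transition
violated = not (invariant(pre_state) and invariant(post_state))
print(operation, "violates the invariant:", violated)  # Add2 violates the invariant: False
```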
Operations are the core components of abstract machines, as they determine the derivation of transitions. They are described using different forms of substitutions, such as pre-conditioned substitutions and conditional substitutions. A pre-conditioned substitution is of the form \({\rm PRE}~~P~~{\rm THEN}~~Q~~{\rm END}\), where P is a predicate, and Q is a substitution. P is a pre-condition that must be true for a state s when the operation is applied. If P is false for s, then the operation will not be activated. A conditional substitution is of the form \({\rm IF}~~P~~{\rm THEN}~~Q~~{\rm ELSE}~~R~~{\rm END}\), where P is a predicate, and Q and R are substitutions. If P is true for a state s, then Q will be applied. If P is false for s, then R will be applied to s. In this study, the above two types of substitution are used to construct repair operators that can generally handle other types of substitution. Additionally, a pre-state and an operation can either deterministically lead to only one post-state, or non-deterministically lead to more than one post-state. In this work, we focus on such determinism and leave non-determinism as future work.

2.2 Supervised Machine Learning

Supervised machine learning aims at constructing a function that maps given input-output pairs [29]. Although supervised machine learning and the B method are rooted in two separate domains, vectorisation techniques can bridge the gap between them. According to References [10, 35], the states of a B model can be converted into binary vectors by applying a set of pre-defined transformations to the variables in the states. If a variable is an integer, a Boolean value, a distinct element, or a first-order set, then it can be vectorised by one-hot encoding. In the vectorisation process, infinite types such as INTEGER and NATURAL are converted to finite sets by collecting all the values that occur in a state space, so such infinite types can be partially vectorised. Unseen values are ignored because supervised machine learning models usually cannot learn values that do not occur in training sets, so such values are excluded to avoid noise. Additionally, to vectorise higher-order sets, nested sets can be treated as string elements. Section 6.2 will discuss limitations and alternatives of the vectorisation method. Regardless of the number of variables, a state s can be vectorised by converting all variables in s to vectors and concatenating the vectors.
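As a rough illustration (not the actual B-repair encoder), the following sketch one-hot encodes each variable against the values observed in a state space and concatenates the results; the variable domains and states are made up:

```python
# Minimal illustration of one-hot state vectorisation: each variable is
# encoded against the set of values observed in the state space, and
# unseen values are ignored.
def build_domains(states):
    """Collect the observed values of each variable position."""
    return [sorted({s[i] for s in states}, key=str) for i in range(len(states[0]))]

def vectorise_state(state, domains):
    """Concatenate one-hot encodings of all variables of a state."""
    vec = []
    for value, domain in zip(state, domains):
        vec.extend(1 if value == v else 0 for v in domain)
    return vec

# Example: states of the form [Selected_Bus, Moving(1)]
states = [[0, "FALSE"], [1, "FALSE"], [0, "TRUE"]]
domains = build_domains(states)
print(vectorise_state([1, "TRUE"], domains))  # [0, 1, 0, 1]
```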
Using vectorisation, states of a B formal model can be represented as sequences of 0 and 1, so they can be learned using supervised learning models such as Bernoulli Naive Bayes (BNB) classifiers, Logistic Regression (LR) classifiers, Support Vector Machines (SVM), Random Forests (RF), and various neural network architectures [7, 14]. In particular, Silas is an explainable and verifiable classifier that learns patterns using random forests and applies automated reasoning techniques to explain and verify the learning results [8, 9]. Although the theories of these learning models differ from each other, their usages are similar. Each model must have a training algorithm and a prediction algorithm. The training algorithm takes as input a training set containing a list of vectors with their labels. Each label is an identifier representing exactly one class, and each vector has exactly one label. The training algorithm updates parameters of the model to map the vectors in the training set to their corresponding labels. The prediction algorithm takes as input a vector x and returns a vector \(y = (y_1,\ldots ,y_K)\) such that \(y_k~(k=1,\ldots ,K)\) is the likelihood that x is mapped to the kth label. In the next section, we will explain how to use the supervised machine learning models to learn the states of a B design model.
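A minimal usage sketch of this common train/predict interface, assuming state transitions have already been vectorised and labelled "0" (possible) or "1" (impossible); the scikit-learn random forest stands in for any of the listed classifiers, and the data are hypothetical:

```python
# Hedged sketch of the fit/predict interface described above.
# X_train and y_train are hypothetical vectorised transitions with labels.
from sklearn.ensemble import RandomForestClassifier

X_train = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
y_train = [0, 0, 1, 1]   # 0 = possible transition, 1 = impossible transition

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)                        # training algorithm
likelihoods = clf.predict_proba([[0, 1, 0, 1]])  # prediction algorithm
print(likelihoods[0][0])  # likelihood that the vector belongs to the "possible" class
```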

2.3 B-repair

B-repair [10] aims to use model checking, constraint solving and machine learning to search for repairs that solve invariant violations in B abstract machines. After the use of ProB [22] to detect a state transition that violates invariants, the constraint solver in ProB is used to suggest candidate repairs that change the state transition to satisfy the invariants. One of the core steps of B-repair is to use a repair evaluator based on binary classification to select repairs from the candidate repairs.
Model checkers can approximate an abstract machine using a set of state transitions, where each transition is of the form \(S_{pre} \xrightarrow {\alpha } S_{post}\) and can be rewritten as a triple \([S_{pre},S_{post},\alpha ]\) consisting of a pre-state \(S_{pre}\), a post-state \(S_{post}\), and an operation \(\alpha\). The operation \(\alpha\) consists of a pre-condition P and a post-condition Q (which is usually represented as a generalised substitution). The triple \([S_{pre},S_{post},\alpha ]\) means that \(S_{pre}\) is a state satisfying P, and \(S_{post}\) is a state satisfying Q. The analysis of the state space can be converted into a classification problem, i.e., the triple \([S_{pre},S_{post},\alpha ]\) can be classified into either a set of “possible” transitions \(S_{P}\) or a set of “impossible” transitions \(S_{I}\). \([S_{pre},S_{post},\alpha ]\) is in \(S_{P}\) if and only if \(S_{pre} \xrightarrow {\alpha } S_{post}\) is a possible transition with respect to the machine. \([S_{pre},S_{post},\alpha ]\) is in \(S_{I}\) if and only if \(S_{pre} \xrightarrow {\alpha } S_{post}\) is impossible with respect to the machine. B-repair can use binary classifiers to learn the mapping from state transitions to \(S_{P}\) and \(S_{I}\). The trained classifiers are considered as repair evaluators, i.e., given a repair, the classifiers use their prediction functions to output repair scores indicating the likelihood that the repair results in a state transition in \(S_{P}\).

3 Abstract Machine Batch Modification

This section gives details on how B-repair is improved by implementing AMBM. AMBM reuses the learning and update modules of B-repair and adapts the modification module to support batch modifications. Additionally, a new refactoring module is used to simplify the code of batch modifications. Algorithm 1 describes the main function of AMBM. It takes as input a source B machine that contains invariant violations and outputs a repaired machine without any invariant violations. The algorithm consists of the learning phase (Lines 1–2), the modification phase (Lines 3–18), the refactoring phase (Lines 19–23), and the update phase (Lines 24–25), which are indicated using the “\(\mathbf {\rhd }\)” symbols. The motivation and intuition of the four phases are as follows:
The learning phase is used to train a classifier that learns the state space of the B machine. The trained classifier can be used to rank repairs. Without the ranking process, it will be difficult to select appropriate repairs.
The modification phase is used to detect invariant violations and suggest repairs. As different candidate repairs are available, the classifier is used to rank the repairs.
The refactoring phase is an optional process that can simplify the code of repair when multiple repairs are applied to an operation. Without simplification, the repair still works, but the resulting B machine may have tedious code.
The update phase is used to update the B machine based on the suggested repairs.
Referring to Figure 1 in Section 1, the four phases correspond to the learning module, the modification module, the refactoring module, and the update module, which describe a single loop of modification. During the learning phase, the state space of the source machine is approximated using the ProB model checker [22] and used to train a repair evaluator. During the modification phase, invariant violations are detected and removed from the source machine. First, the model checker is used to find all faulty transitions that trigger invariant violations. Second, the constraint solver embedded in ProB is used to randomly compute a set of candidate states that satisfy all invariants in the source machine. Third, a set of candidate atomic modifications, which can repair single faulty transitions, are produced using the candidate states. (To understand how the candidate atomic modifications are generated, refer to the Atomic-Modifications function and the Update function in Section 3.1.) Their repair scores are estimated using the learned classifier. For each faulty transition, an atomic modification with the highest repair score is selected. Fourth, the source machine is updated using all selected modifications. The modification phase is repeated until no faulty transitions can be found. During the refactoring phase, atomic modifications applied to each operation are collected and rewritten using Algorithm 2, resulting in a set containing compound modifications and atomic modifications that cannot be refactored. Finally, during the update phase, the source machine is changed using the modifications, and the updated machine is returned.
Algorithm 2 is an algorithm that rewrites atomic modifications into compound modifications. It takes as input a set of atomic modifications and outputs a set of compound modifications. First, the atomic modifications are converted to atomic modification predicates (which are generated using the Modifications-To-Predicates function in Section 3.1). Second, Context-Free Grammars (CFG) are used to generate a set of relation predicates describing possible relations between the pre- and post-states. Third, candidate relation predicates that are satisfied by each atomic modification predicate are collected. According to the candidate relation predicates, the atomic modification predicates are split into two partitions. The first partition \(P_B\) includes all atomic modification predicates that satisfy a common relation predicate. The second partition \(P_A\) includes all atomic modification predicates that do not satisfy the relation predicate. Finally, the above partition process is applied to \(P_A\) iteratively until no further partitions can be produced. If such a partition cannot be produced, then the partitions will be converted to compound modifications using their relation predicates.
To help readers understand the algorithms, the following subsection provides details of core functions used in Algorithms 1 and 2. The functions are listed in order of line numbers in the algorithms.

3.1 Core Functions

Algorithm 1 includes the following functions:
\(D \leftarrow \texttt {State-Space}(M)\) (in Line 1 and Line 7) returns the state space D of a given abstract machine M. It is a function of the ProB model checker. The given abstract machine must be finite with respect to its invariant and have no deadlock states. To approximate a finite state space, ProB is run to generate all states that satisfy the invariant and freeze all states that violate the invariant. In particular, if ProB detects any states violating the invariant, then ProB will be controlled to check other normal states rather than stopping at the violation points. The resulting finite state space D is converted to a list of triples. Each triple is of the form \([S,S^{\prime },\alpha ]\), where S is a pre-state of the form \([x_1,x_2,\ldots ,x_N]\), \(S^{\prime }\) is a post-state of the form \([x^{\prime }_1,x^{\prime }_2,\ldots ,x^{\prime }_N]\), and \(\alpha\) is an operation. The triple means that \([x_1,x_2,\ldots ,x_N] \xrightarrow {\alpha } [x^{\prime }_1,x^{\prime }_2,\ldots ,x^{\prime }_N]\) is a possible transition of M, where \(x_i~(i=1,\ldots ,N)\) and \(x^{\prime }_i~(i=1,\ldots ,N)\) are values of variables in M. The above form of triples is consistently used in our functions.
\(W \leftarrow \texttt {Repair-Evaluator-Training}(D,T)\) (in Line 2) learns a binary classifier of type T for a state space D and returns the learned classifier W. The learning of the classifier is performed by the following steps: First, a set of triples Z is randomly produced such that \(|Z| = |D|\) and \(D \cap Z = \varnothing\). Second, the triples in D and Z are vectorised as features with labels. Features of D are labelled as “0” (i.e., possible transitions). Features of Z are labelled as “1” (i.e., impossible transitions). Third, the features with labels are learned using a classifier of the type T. T can be BNB, LR, SVM, RF, or Silas. Regardless of the theories of these classifiers, each classifier has a training algorithm implemented as a method, \(\texttt {fit}(X,Y)\), that takes as input a list of features X with a list of labels Y and updates parameters of the classifier to fit X and Y. The goal of fitting is to map features in X to their corresponding labels in Y as much as possible. For our repair evaluator training function, the method \(\texttt {fit}\) is used to learn the mappings between the features and the labels of the triples. Finally, the learned classifier is returned.
\(X \leftarrow \texttt {Invariant-Solutions}(M,N)\) (in Line 3) computes at most N states satisfying the invariant of an abstract machine M and returns a set X containing the computed states. The function consists of the following steps: First, the invariant of M is extracted. Second, the invariant is converted into a constraint where all variables in M are considered unknowns. Finally, the constraint solver of ProB searches for solutions satisfying the constraint in random order and returns at most N solutions, where each solution is a state satisfying the invariant of M. The purpose of this function is to find candidate components (i.e., modified states) for producing atomic modifications.
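A much-simplified sketch of the idea, with a brute-force search over small finite domains standing in for ProB's constraint solver (the domains and the invariant are invented for illustration):

```python
# Brute-force stand-in for Invariant-Solutions: enumerate states over small
# finite domains, keep those satisfying the invariant, and return up to n of
# them in random order.
from itertools import product
import random

def invariant_solutions(domains, invariant, n, seed=0):
    candidates = [state for state in product(*domains) if invariant(state)]
    random.Random(seed).shuffle(candidates)
    return candidates[:n]

domains = [[0, 1, 2], ["St1", "St2", "St3", "St4"]]   # toy variable domains
inv = lambda s: not (s[0] == 2 and s[1] == "St4")     # toy invariant
print(invariant_solutions(domains, inv, 3))
```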
\(T \leftarrow \texttt {Faulty-Transitions}(D)\) (in Line 8) returns a set T containing all faulty transitions in the state space D. D must be produced using the function \(D \leftarrow \texttt {State-Space}(M)\). Note that D is assumed to have no deadlock states. If ProB detects an invariant violation, then the computation of the state space will stop and only one faulty state can be reported. To report all invariant violations, we use a trick to control ProB to develop a whole state space and collect all invariant violations, i.e., all states that trigger invariant violations are frozen, and all states that have no outgoing transitions are reported.
\(S \leftarrow \texttt {Correct-States}(D)\) (in Line 12) returns a set S containing all correct states in the state space D. D must be produced using the function \(D \leftarrow \texttt {State-Space}(M)\). Correct states in D are collected by finding all states with at least one outgoing transition.
\(R \leftarrow \texttt {Atomic-Modifications}(T,S)\) (in Line 13) takes as input a set of faulty transitions T and a set of correct states S. Any transition \(S_{pre} \xrightarrow {\alpha } S_{post}~\in ~T\) and any state \(S_{mod} \in S\) correspond to an atomic modification \([\alpha ,S_{pre},S_{post},S_{mod}]\), which means that changing \(S_{pre} \xrightarrow {\alpha } S_{post}\) to \(S_{pre} \xrightarrow {\alpha } S_{mod}\) can eliminate the faulty state \(S_{post}\). All possible atomic modifications are collected into a set R and returned. The purpose of this function is to synthesise atomic modifications that can eliminate invariant violations.
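As a sketch (with hypothetical tuples in place of B states), the function simply pairs every faulty transition with every candidate state:

```python
# Every (faulty transition, correct state) pair yields one atomic
# modification [alpha, S_pre, S_post, S_mod].
def atomic_modifications(faulty_transitions, correct_states):
    return [
        (op, s_pre, s_post, s_mod)
        for (op, s_pre, s_post) in faulty_transitions
        for s_mod in correct_states
    ]

faulty = [("Bus_Controller", (1, "T"), (0, "F"))]   # toy (operation, pre, post) triple
candidates = [(0, "T"), (1, "F")]                   # toy candidate post-states
print(atomic_modifications(faulty, candidates))     # two candidate modifications
```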
P \(\leftarrow\) \(\texttt {Repair-Scores}(R,W)\) (in Line 14) predicts repair scores of a set of atomic modifications R via a classifier W. The classifier must be produced using the function \(W \leftarrow \texttt {Repair-Evaluator-Training}(D,T)\). The repair scores are computed via the following steps: First, each atomic modification \([\alpha ,S_{pre},S_{post},S_{mod}]\) is reduced to a triple \([S_{pre},S_{mod},\alpha ]\). Second, \([S_{pre},S_{mod},\alpha ]\) is vectorised as a binary feature \(x_0\). Third, the classifier W is used to predict the repair score of \(x_0\). Regardless of the type of W, it must have a prediction algorithm implemented as a method \(\texttt {predict}(x,y)\) that takes as input a feature x and a label y and returns the likelihood that x is mapped to y. For our repair score prediction, the repair score of the triple is the value of \(\texttt {predict}(x_0,``0^{\prime \prime })\). Finally, a list P containing the repair scores of all the modifications is returned.
\(S \leftarrow \texttt {Modification-Selection}(R,P)\) (in Line 15) selects modifications in a list of atomic modifications R with a list of repair scores P and returns a list of selected modifications S. The ith number in P is the repair score of the ith modification in R. The selection process has the following steps: First, modifications in R are sorted by their repair scores in descending order. Second, for any modification \([\alpha ,S_{pre},S_{post},S_{mod}]\) in the sorted R, if the combination of \(\alpha\), \(S_{pre}\), and \(S_{post}\) occurs for the first time, then \([\alpha ,S_{pre},S_{post},S_{mod}]\) will be considered as the best modification for the transition \(S_{pre} \xrightarrow {\alpha } S_{post}\). Finally, all the best modifications are collected into a list S and returned. The purpose of this function is to find the best atomic modifications, i.e., for each invariant violation, the trained classifier is used to estimate the repair scores of all applicable candidate modifications, and only the modification with the highest repair score will be selected.
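A small sketch of this selection step over hypothetical data; modifications are ranked by score, and the first occurrence of each faulty transition wins:

```python
# Keep, for each faulty transition (op, S_pre, S_post), only the candidate
# modification with the highest repair score.
def modification_selection(mods, scores):
    best, seen = [], set()
    for mod, _ in sorted(zip(mods, scores), key=lambda pair: pair[1], reverse=True):
        key = mod[:3]           # (op, S_pre, S_post) identifies the faulty transition
        if key not in seen:
            seen.add(key)
            best.append(mod)
    return best

mods = [("Op", (1,), (9,), (2,)), ("Op", (1,), (9,), (3,))]
print(modification_selection(mods, [0.27, 0.46]))   # keeps the candidate scored 0.46
```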
U \(\leftarrow\) \(\texttt {Update}(M,R)\) (in Lines 16 and 24) updates an abstract machine M using a list of modifications R and returns an updated machine U. A modification is applied to an operation \(\alpha\) via a pair \((P^{\prime },Y^{\prime })\), where \(P^{\prime }\) is a condition, and \(Y^{\prime }\) is a substitution. For an atomic modification \([\alpha ,S_{pre},S_{post},S_{mod}]\), \(P^{\prime }\) is the predicate form of \(S_{pre}\), and \(Y^{\prime }\) is the substitution form of \(S_{mod}\). For a compound modification \([\alpha ,P,Y]\), \(P^{\prime }\) is the predicate form of P, and \(Y^{\prime }\) is the substitution form of Y. If \(\alpha\) is a substitution T without any pre-conditions, then applying the modification to \(\alpha\) will lead to a conditional substitution as follows:
\begin{equation} {\rm IF}~~not(P^{\prime })~~{\rm THEN}~~T~~{\rm ELSE}~~Y^{\prime }~~{\rm END.} \end{equation}
If \(\alpha\) is a pre-conditioned substitution “\({\rm PRE}~~S~~{\rm THEN}~~T~~{\rm END,}\)” then applying the modification to \(\alpha\) leads to a pre-conditioned substitution as follows:
\begin{equation} \begin{aligned}{\rm PRE}&~~S~~{\rm THEN} \\ &{\rm IF}~~not(P^{\prime })~~{\rm THEN}~~T~~{\rm ELSE}~~Y^{\prime }~~{\rm END} \\ {\rm END}& . \end{aligned} \end{equation}
Moreover, when \(N~(N \ge 1)\) modifications \((P_1^{\prime },Y_1^{\prime }),\ldots ,(P_N^{\prime },Y_N^{\prime })\), where \(P_i^{\prime }~(i=1,\ldots ,N)\) is a condition and \(Y_i^{\prime }~(i=1,\ldots ,N)\) is a substitution, are applied to the same operation with a substitution T and without any pre-conditions, the following template can be used:
\begin{equation} \begin{aligned}&{\rm IF}~~P_1^{\prime }~~{\rm THEN}~~Y_1^{\prime } \\ & \ldots \\ &{\rm ELSIF}~~P_N^{\prime }~~{\rm THEN}~~Y_N^{\prime } \\ &{\rm ELSE}~~T \\ &{\rm END.} \end{aligned} \end{equation}
If the operation has a pre-condition S and a substitution T, then the following template can be used:
\begin{equation} \begin{aligned}{\rm PRE}&~~S~~{\rm THEN} \\ &{\rm IF}~~P_1^{\prime }~~{\rm THEN}~~Y_1^{\prime } \\ & \ldots \\ &{\rm ELSIF}~~P_N^{\prime }~~{\rm THEN}~~Y_N^{\prime } \\ &{\rm ELSE}~~T \\ &{\rm END} \\ {\rm END}& . \end{aligned} \end{equation}
After applying all modifications to M, the resulting machine U is returned.
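For illustration only, the following sketch renders the N-modification template above as B source text; the real Update function works on the machine representation rather than strings, and the conditions and substitutions here are invented:

```python
# Render N modifications (condition, substitution) plus the original
# substitution T as the IF/ELSIF/ELSE template shown above (string-level
# sketch only).
def render_modified_operation(mods, original_subst, pre_condition=None):
    branches = []
    for i, (cond, subst) in enumerate(mods):
        keyword = "IF" if i == 0 else "ELSIF"
        branches.append(f"{keyword} {cond} THEN {subst}")
    body = "\n".join(branches) + f"\nELSE {original_subst}\nEND"
    if pre_condition is not None:
        body = f"PRE {pre_condition} THEN\n{body}\nEND"
    return body

print(render_modified_operation(
    [("xx = 1", "xx := 2"), ("xx = 3", "xx := 0")],
    "xx := xx + 1",
    pre_condition="xx < 4"))
```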
\(S \leftarrow \texttt {Collect-Modifications}(R,\alpha)\) (in Line 21) takes as input a set of modifications R and an operation \(\alpha\) and returns a set S containing all modifications that are in R and can be applied to \(\alpha\).
\(S \leftarrow \texttt {Refactoring}(R)\) (in Line 22) takes as input a set of atomic modifications R and returns a set of compound modifications S via Algorithm 2. Each compound modification is of the form \([\alpha ,P,Y]\), where \(\alpha\) is an operation, P is a list of pre-states, and Y is a list of substitutions reflecting the relations between the pre-states and the post-states. The purpose of this function is to simplify atomic modifications.
Algorithm 2 includes the following functions:
\(\alpha \leftarrow \texttt {Get-Operation}(M)\) (in Line 1) takes as input a set of atomic modifications M. If all modifications correspond to a certain operation \(\alpha\), then \(\alpha\) is returned. If the modifications correspond to two or more operations, then an error is raised.
\(U \leftarrow \texttt {Modifications-To-Predicates}(M)\) (in Line 2) converts a set of atomic modifications M to a set of predicates U. Each atomic modification is of the form \([\alpha ,S_{pre},S_{post},S_{mod}]\), suggesting that a transition \(S_{pre} \xrightarrow {\alpha } S_{post}\) should be changed to \(S_{pre} \xrightarrow {\alpha } S_{mod}\). Only \(S_{pre}\) and \(S_{mod}\) need to be converted to predicate forms. Suppose that \(S_{pre}\) and \(S_{mod}\) are \([x_1,\ldots ,x_N]\) and \([y_1,\ldots ,y_N]\), respectively, and \(v^{pre}_i\) and \(v^{mod}_i\) \((i=1,\ldots ,N)\) are identifiers of variables in the pre-state \(S_{pre}\) and the modified state \(S_{mod}\), respectively. The predicate forms of \(S_{pre}\) and \(S_{mod}\) are \(v^{pre}_1=x_1~\wedge ~\cdots ~\wedge ~v^{pre}_N=x_N\) and \(v^{mod}_1=y_1~\wedge ~\cdots ~\wedge ~v^{mod}_N=y_N\), respectively. Thus, the predicate form of \([\alpha ,S_{pre},S_{post},S_{mod}]\) is the conjunction of the above two predicates, which is \(v^{pre}_1=x_1~\wedge ~\cdots ~\wedge ~v^{pre}_N=x_N~\wedge ~v^{mod}_1=y_1~\wedge ~\cdots ~\wedge ~v^{mod}_N=y_N\). This predicate is called a modification predicate. The purpose of producing modification predicates is to enable the constraint-solving function in ProB to find relationships between pre-states and modified states.
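A sketch of this conversion using the "Pre_"/"Mod_" naming that appears in Section 4.3 (the variable names and values below are placeholders):

```python
# Build a modification predicate from a pre-state and a modified state by
# conjoining equalities over prefixed variable identifiers.
def modification_predicate(variables, pre_state, mod_state):
    conjuncts = [f"Pre_{v} = {x}" for v, x in zip(variables, pre_state)]
    conjuncts += [f"Mod_{v} = {y}" for v, y in zip(variables, mod_state)]
    return " & ".join(conjuncts)

print(modification_predicate(["Selected_Bus", "Moving(1)"], [1, "TRUE"], [0, "FALSE"]))
# Pre_Selected_Bus = 1 & Pre_Moving(1) = TRUE & Mod_Selected_Bus = 0 & Mod_Moving(1) = FALSE
```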
\(P \leftarrow \texttt {CFG-Predicates}(M,D)\) (in Line 3) takes as input a set of atomic modifications M and a search depth D and synthesises a set of predicates P using Context-Free Grammars (CFGs). The CFGs are constructed using: (1) the identifiers of variables in the pre-states and the modified states, including \(v^{pre}_i\) and \(v^{mod}_i\) \((i=1,\ldots ,N)\), (2) values of variables that occur in M, and (3) the B operators such as arithmetic, Boolean, and set operators. The maximum depth of synthesised predicates is D. Synthesised predicates are of the form \(v^{mod}_i = F(v^{pre}_1,\ldots ,v^{pre}_N)\), where F is a function. The synthesised predicates are called CFG predicates. The purpose of synthesising CFG predicates is to find candidate predicates that represent relations between pre-states and modified states. By default, the maximum depth of the CFG predicate is set to 3, and the number of candidate predicates is set to 1,000. Each variable can match at least one candidate predicate because a variable \(v_i\) with a value \(x_i\) can be directly converted to a predicate \(v^{mod}_i = x_i\).
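A much-reduced sketch of such a generator: it only enumerates equalities between a modified variable and either a seen value or a pre-state variable, whereas the actual CFGs also cover arithmetic, Boolean, and set operators up to the given depth:

```python
# Enumerate simple candidate predicates "Mod_x = <expr>", where <expr> is a
# value seen in the modifications or a pre-state variable (a depth-1 fragment
# of the grammar described above).
def cfg_predicates(variables, seen_values):
    exprs = [str(v) for v in seen_values] + [f"Pre_{x}" for x in variables]
    return [f"Mod_{x} = {e}" for x in variables for e in exprs]

for p in cfg_predicates(["Selected_Bus", "Moving(1)"], [0, "FALSE"]):
    print(p)
```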
W \(\leftarrow\) \(\texttt {Candidate-Predicates}(X,P)\) (in Line 6) takes as input a set of modification predicates X and a set of CFG predicates P and returns a set W containing candidate pairs that are of the form \((R,S)\) such that \(R \in X\), \(S \in P\), and S is a candidate predicate of R. To obtain such pairs, for any \(R \in X\) and any \(S \in P\), the constraint-solving function in the ProB model checker is used to resolve \(R \wedge S\). If \(R \wedge S\) is true, then S will be considered as a candidate predicate of R. The pair \((R,S)\) will be a member of W and is called a candidate pair. After obtaining all candidate pairs, W is returned. The purpose of finding candidate pairs is to discover hidden relations between pre-states and modified states.
\([P_B,P_A] \leftarrow \texttt {Best-Partition}(X,W)\) (in Line 7) takes as input a set of modification predicates X and a set of candidate pairs W. Recall the explanations of Modifications-To-Predicates and CFG-Predicates. We continue to use the notation of modification predicates \(v^{pre}_1=x_1~\wedge ~\cdots ~\wedge ~v^{pre}_N=x_N~\wedge ~v^{mod}_1=y_1~\wedge ~\cdots ~\wedge ~v^{mod}_N=y_N\) and the notation of candidate predicates \(v^{mod}_i = F(v^{pre}_1,\ldots ,v^{pre}_N)\). For each \(v^{mod}_i\) \((i=1,\ldots ,N)\), a candidate predicate \(U_i\) is found in W such that:
\(U_1 \wedge \cdots \wedge U_N\) is true for all predicates in a subset \(X_1 \subseteq X\),
\(U_1 \wedge \cdots \wedge U_N\) is false for all predicates in a subset \(X_2 \subseteq X\),
\(X_1 \cup X_2 = X~\wedge ~X_1 \cap X_2 = \varnothing\), and
the cardinality of \(X_1\) is maximised.
In the above process, \(U_1 \wedge \cdots \wedge U_N\) is called a compound modification predicate, which uses a set of CFG predicates to describe atomic modifications. After finding such \(X_1\) and \(X_2\), the best partition \(P_B = [X_1,[U_1, \ldots , U_N]]\) and a partition \(P_A = X_2\) that contains all the remaining atomic modifications will be returned. The purpose of this function is to find common relations between pre-states and modified states.
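A brute-force sketch of the partitioning idea, where candidate predicates are ordinary Python functions over (pre-state, modified-state) pairs instead of B predicates solved by ProB:

```python
# Pick one candidate predicate per variable such that their conjunction holds
# for as many atomic modifications as possible, then split the modifications
# into the covered set (X1) and the rest (X2).
from itertools import product

def best_partition(modifications, candidate_predicates):
    best_combo, covered = None, []
    for combo in product(*candidate_predicates):
        satisfied = [m for m in modifications if all(p(*m) for p in combo)]
        if len(satisfied) > len(covered):
            best_combo, covered = combo, satisfied
    remaining = [m for m in modifications if m not in covered]
    return (covered, best_combo), remaining

# Toy modifications: (pre_state, modified_state) pairs over two variables.
mods = [((1, "T"), (0, "F")), ((2, "T"), (0, "F")), ((1, "T"), (1, "F"))]
preds_var1 = [lambda pre, mod: mod[0] == 0, lambda pre, mod: mod[0] == pre[0]]
preds_var2 = [lambda pre, mod: mod[1] == "F"]
(best, preds), rest = best_partition(mods, [preds_var1, preds_var2])
print(len(best), len(rest))   # 2 1
```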
Z \(\leftarrow\) \(\texttt {Compound-Modification}(P_B,\alpha)\) (in Line 8) converts the partition \(P_B = [X_1, [U_1,\ldots ,U_N]]\), which is produced using Best-Partition, to a compound modification Z for an operation \(\alpha\). Z is of the form \([\alpha ,P,[U_1,\ldots ,U_N]]\), where P is a set containing all pre-states covered by \(X_1\). The purpose of this function is to synthesise modifications using common relations between pre-states and modified states.

3.2 Implementation

AMBM is implemented by extending B-repair [10]. Its main dependencies include ProB 1.7.1 [22], scikit-learn 0.19.2 [25], and Silas Edu 0.8.5 [8, 9]. The model checker in ProB is used to approximate state spaces of abstract machines and detect invariant violations. The constraint solver in ProB is used to find candidate modifications. The constraint-solving function in ProB is used to find relations between pre-states and modified states. Regarding scikit-learn, it provides well-behaved classifiers, including BNB, LR, SVM, and RF, as well as training and prediction functions that can be directly used for the purpose of repair evaluator training. Besides, Silas Edu provides an improved implementation of random forest and corresponding training and prediction functions. In our tool, these classifiers are used as black boxes. The source code of the tool can be downloaded.2
Figure 2 shows the architecture of the implemented AMBM tool. The tool consists of four modules that correspond to the four phases of Algorithm 1. The following descriptions provide details of the four modules:
Fig. 2. The implementation of AMBM.
The learning module trains a repair evaluator for a given abstract machine. To obtain the training data, correct state transitions of the abstract machine are approximated using the ProB model checker. The state transitions are vectorised as training data using the state transition encoder in B-repair. Repair evaluators can be a Bernoulli naive Bayes classifier, a logistic regression classifier, a support vector machine, or a random forest. Training functions for these classifiers are inherited from scikit-learn and Silas.
The modification module generates atomic modifications for the abstract machine. In this module, the abstract machine is checked using the ProB model checker, and all state transitions and all invariant violations are collected. After analysing the state transitions and the invariant violations, the constraint generator and solver work out candidate atomic modifications that can remove all the invariant violations from the state transitions. Features of the candidate atomic modifications are computed using the state transition encoder in B-repair. These features and the trained repair evaluator are used to predict repair scores in the repair score predictor. The predictor makes use of the prediction functions in scikit-learn and Silas to estimate the repair scores. After that, the candidate atomic modifications are sorted by their repair scores, and those with high repair scores are selected.
The refactoring module converts the selected atomic modifications to compound modifications. First, the Mod2Pred convertor rewrites the atomic modifications to their predicate forms, and the context-free grammar predicate generator generates candidate predicates from predefined context-free grammars of B and types of variables in the atomic modifications. Then the atomic modifications are clustered into a number of partitions by the satisfiability between the predicate forms of the atomic modifications and the candidate predicates. Next, the predicate extractor collects predicates of each partition. Finally, these predicates are converted to compound modifications via the Pred2Mod convertor.
The update module uses a machine updater to apply the compound modifications to the original abstract machine and uses the ProB model checker to check the correctness of the updated abstract machine. If the model checker does not report any invariant violation, then the AMBM tool will terminate and return the abstract machine. Otherwise, a new loop of AMBM is started to eliminate invariant violations in the abstract machine.
In Sections 4 and 5, the AMBM tool is used to conduct experiments.

4 A Case Study On AMBM

This section provides a case study to explain Algorithms 1 and 2. The two algorithms are used to repair the bus control model in Figure 3. The model has the following features:
Fig. 3. A bus control model.
Buses are numbered as \(1,~2, \ldots , N\).
“Selected_Bus” is a variable that denotes the ID of a selected bus that is being controlled. When Selected_Bus = 0, no bus is selected.
“Current_Location(i)” is a variable that denotes the current location of the ith bus.
“Next_Location(i)” is a variable that denotes the next scheduled location of the ith bus.
“Moving(i)” is a variable that denotes whether or not the ith bus is moving.
St1, St2, St3, and St4 are stations, and Airport is an airport.
Initially, all buses are at St1, not moving and not scheduled.
“Bus_Selector” is an operation that selects a bus to send a signal.
“Signal_Sender” is an operation that sends a signal to the selected bus to schedule a destination.
“Bus_Controller” is an operation that moves the selected bus to the scheduled destination.
In the model, a bus controller is described by the Bus_Controller operation. It has a pre-conditioned substitution of the form \({\rm PRE}~\ldots ~{\rm THEN}~\ldots ~{\rm END}\), where the predicate between \({\rm PRE}\) and \({\rm THEN}\) is a pre-condition, meaning that if the selected bus does not reach its next scheduled location, the bus controller will start controlling it to reach the location. The substitution between the first \({\rm THEN}\) and the last \({\rm END}\) describes how to change the state of the bus. If the bus is not moving, then it starts moving to the next location. If the bus is on the way to the next location, then it will stop at the next location. The operation is required to satisfy the invariant, where variable identifiers “ii” and “jj” are used instead of “i” and “j,” because single letters are not allowed to be identifiers in some B model checkers. The last two lines of the invariant mean that St4 can hold at most one bus, but the invariant is violated, because the operation allows two buses to stop at St4. We use Algorithm 1 to solve such invariant violations.

4.1 The Learning Phase

We used the case of \(N = 2\) as an example, i.e., the bus controller controlled two buses with IDs 1 and 2. In the learning phase, the State-Space function can approximate state transitions of the model. To aid the readability, we recorded a state as [Selected_Bus, Moving(1), Moving(2), Current_Location(1), Current_Location(2), Next_Location(1), Next_Location(2)], and “TRUE” and “FALSE” are recorded as “T” and “F,” respectively. Due to the large state space, we give the following state transitions as examples, where changed values are underlined:
\([0,F,F,St1,St1,St1,St1] \xrightarrow {Bus\_Selector} [\underline{1},F,F,St1,St1,St1,St1],\)
\([1,F,F,St1,St1,St1,St1] \xrightarrow {Signal\_Sender} [\underline{0},F,F,St1,St1,\underline{St2},St1],\)
\([0,F,F,St1,St1,St2,St1] \xrightarrow {Bus\_Selector} [\underline{1},F,F,St1,St1,St2,St1],\)
\([1,F,F,St1,St1,St2,St1] \xrightarrow {Bus\_Controller} [\underline{0},\underline{T},F,St1,St1,St2,St1],\)
\([0,T,F,St1,St1,St2,St1] \xrightarrow {Bus\_Selector} [\underline{1},T,F,St1,St1,St2,St1],\)
\([1,T,F,St1,St1,St2,St1] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,\underline{St2},St1,St2,St1].\)
The above state transitions show how the first bus moves from St1 to St2, and they are considered possible state transitions.
Using the Repair-Evaluator-Training function, a set of impossible state transitions can be generated so a binary classifier can be trained to distinguish the two classes of state transitions. Due to a large number of impossible state transitions, we give the following examples:
\([0,F,F,St1,St1,St1,St1] \xrightarrow {Bus\_Selector} [0,F,F,St1,St1,\underline{St2},St1],\)
\([1,F,F,St1,St1,St1,St1] \xrightarrow {Signal\_Sender} [\underline{0},\underline{T},F,St1,St1,\underline{St2},St1],\)
\([0,F,F,St1,St1,St2,St1] \xrightarrow {Bus\_Selector} [\underline{1},\underline{T},F,St1,St1,St2,St1],\)
\([1,F,F,St1,St1,St2,St1] \xrightarrow {Bus\_Controller} [1,F,F,St1,St1,\underline{St1},St1],\)
\([0,T,F,St1,St1,St2,St1] \xrightarrow {Bus\_Selector} [0,\underline{F},F,St1,St1,\underline{St1},St1],\)
\([1,T,F,St1,St1,St2,St1] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,St1,St1,St2,St1].\)
To avoid unnecessary details, we suggest readers refer to Reference [10] for details of repair evaluator training. The trained classifier is stored for future use.

4.2 The Modification Phase

In the modification phase, to produce atomic modifications, the Invariant-Solutions function can compute a set of solutions satisfying the invariant. For example:
\([0,F,F,St3,St4,St4,St4],\)
\([0,F,F,St3,St4,St4,St5],\)
\([0,F,F,St4,St3,St4,St4],\)
\([0,F,F,St4,St3,St5,St4].\)
Next, the model is iteratively repaired until no further invariant violations can be detected. To detect invariant violations, the State-Space function is used to approximate the state space of the current model, and the Faulty-Transitions function is used to collect invariant violations in the state space. The following four faulty state transitions are found:
\([1,T,F,St3,St4,St4,St4] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,\underline{St4},St4,St4,St4],\)
\([1,T,F,St3,St4,St4,St5] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,\underline{St4},St4,St4,St5],\)
\([2,F,T,St4,St3,St4,St4] \xrightarrow {Bus\_Controller} [\underline{0},F,\underline{F},St4,\underline{St4},St4,St4],\)
\([2,F,T,St4,St3,St5,St4] \xrightarrow {Bus\_Controller} [\underline{0},F,\underline{F},St4,\underline{St4},St5,St4].\)
The above state transitions violate the invariant, because they allow the two buses to stop at St4 at the same time. To repair the above state transitions, their post-states will be replaced with other candidate post-states satisfying the invariant. All correct states in the state space, which are collected using the Correct-States function, and the states computed using the Invariant-Solutions function are considered as candidate post-states. For each faulty state transition, the Atomic-Modifications function can use the candidate post-states to generate a set of candidate modifications, and the Repair-Scores function can estimate their repair scores. For example, to repair \([1,T,F,St3,St4,St4,St4] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,\underline{St4},St4,St4,St4]\) using the four solutions, the following candidate modifications with repair scores (\(\sigma\)) are generated:
\([1,T,F,St3,St4,St4,St4] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,St3,St4,St4,St4]\) (\(\sigma = 0.458\)),
\([1,T,F,St3,St4,St4,St4] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,St3,St4,St4,\underline{St5}]\) (\(\sigma = 0.274\)),
\([1,T,F,St3,St4,St4,St4] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,\underline{St4},\underline{St3},St4,St4]\) (\(\sigma = 0.403\)),
\([1,T,F,St3,St4,St4,St4] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,\underline{St4},\underline{St3},\underline{St5},St4]\) (\(\sigma = 0.159\)).
As the first modification has the highest \(\sigma\) value, it is selected to repair the model. Similarly, for all the faulty state transitions, the Modification-Selection function can select the following modifications, which are of the form “pre-state \(\xrightarrow {\text{operation}}\) faulty post-state \(\hookrightarrow\) modified post-state”:
\([1,T,F,St3,St4,St4,St4] \xrightarrow {Bus\_Controller}\)
\([\underline{0},\underline{F},F,\underline{St4},St4,St4,St4] \hookrightarrow [\underline{0},\underline{F},F,St3,St4,St4,St4],\)
\([1,T,F,St3,St4,St4,St5] \xrightarrow {Bus\_Controller}\)
\([\underline{0},\underline{F},F,\underline{St4},St4,St4,St5] \hookrightarrow [\underline{0},\underline{F},F,St3,St4,St4,St5],\)
\([2,F,T,St4,St3,St4,St4] \xrightarrow {Bus\_Controller}\)
\([\underline{0},F,\underline{F},St4,\underline{St4},St4,St4] \hookrightarrow [\underline{0},F,\underline{F},St4,St3,St4,St4],\)
\([2,F,T,St4,St3,St5,St4] \xrightarrow {Bus\_Controller}\)
\([\underline{0},F,\underline{F},St4,\underline{St4},St5,St4] \hookrightarrow [\underline{0},F,\underline{F},St4,St3,St5,St4].\)
The Update function can use the above atomic modifications to repair the Bus_Controller operation, leading to the repaired operation in Figure 4. As a result, the repaired operation no longer triggers any invariant violations. A problem is that each atomic modification can only remove one invariant violation, resulting in tedious code. The code can be simplified in the refactoring phase.
Fig. 4. The repaired bus controller operation before refactoring.

4.3 The Refactoring Phase

The atomic modifications are refactored using the Refactoring function, i.e., Algorithm 2. We use \([1,T,F,St3,St4,St4,St4] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,\underline{St4},St4,St4,St4] \hookrightarrow [\underline{0},\underline{F},F,St3,St4,St4,St4]\) as an example. The Modifications-To-Predicates function converts the pre-state \([1,T,F,St3,St4,St4,St4]\) and the modified state \([0,F,F,St3,St4,St4,St4]\) to the predicate “Pre_Selected_Bus = 1 & Pre_Moving(1) = TRUE & Pre_Moving(2) = FALSE & Pre_Current_Location(1) = St3 & Pre_Current_Location(2) = St4 & Pre_Next_Location(1) = St4 & Pre_Next_Location(2) = St4 & Mod_Selected_Bus = 0 & Mod_Moving(1) = FALSE & Mod_Moving(2) = FALSE & Mod_Current_Location(1) = St3 & Mod_Current_Location(2) = St4 & Mod_Next_Location(1) = St4 & Mod_Next_Location(2) = St4.”
Next, the CFG-Predicates function generates a set of predicates using context-free grammars, e.g., Mod_Selected_Bus = 0, Mod_Selected_Bus = 1, Mod_Selected_Bus = Pre_Selected_Bus, Mod_Selected_Bus = 1 - Pre_Selected_Bus, Mod_Moving(1) = TRUE, Mod_Moving(1) = FALSE, Mod_Next_Location(1) = St4, Mod_Next_Location(1) = St5, and Mod_Next_Location(1) = Pre_Next_Location(2). Based on the predicates, the Candidate-Predicates function will find candidate predicates that describe the modifications. For example, the modification \([1,T,F,St3,St4,St4,St4] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,\underline{St4},St4,St4,St4] \hookrightarrow [\underline{0},\underline{F},F,St3,St4,St4,St4]\) can be described using \(Mod\_Selected\_Bus = 0\), \(Mod\_Selected\_Bus = 1 - Pre\_Selected\_Bus\), \(Mod\_Moving(1) = FALSE\), \(Mod\_Next\_Location(1) = St4\), \(Mod\_Next\_Location(1) = Pre\_Next\_Location(2)\), and so on.
The Best-Partition function can produce the best partitions of modification predicates using the following predicates:
Mod_Selected_Bus = 0,
Mod_Moving(1) = FALSE,
Mod_Moving(2) = FALSE,
Mod_Current_Location(1) = Pre_Current_Location(1),
Mod_Current_Location(2) = Pre_Current_Location(2),
Mod_Next_Location(1) = Pre_Next_Location(1),
Mod_Next_Location(2) = Pre_Next_Location(2).
The above predicates can lead to the best partition, because they are satisfied by all the modifications, leading to a partition that includes all the modifications and an empty partition. As the second partition is empty, the current partitions are the best partitions, and no further partitions will be produced. As a counterexample, if another candidate predicate “Mod_Next_Location(1) = St4” is used to split the modification predicates, then the resulting two partitions will include three modifications and one modification, respectively, which are not the best partitions.
Finally, the Compound-Modification function can convert the partition to a compound modification that covers the following pre-states:
\([1,T,F,St3,St4,St4,St4],\)
\([1,T,F,St3,St4,St4,St5],\)
\([2,F,T,St4,St3,St4,St4],\)
\([2,F,T,St4,St3,St5,St4].\)
The compound modification uses the following refactored substitutions to generate post-states:
Selected_Bus := 0,
Moving(1) := FALSE,
Moving(2) := FALSE.
The above compound modification means that if a bus is already at St4, and the other bus is currently on the way to St4, then the latter should stop at the current location. After returning the compound modification, Algorithm 2 terminates.

4.4 The Update Phase

During the update phase, the compound modification is applied to the original Bus_Controller operation, leading to the repaired operation in Figure 5. Due to the existence of the extra IF-THEN-ELSE-END construct, the compound modification can disable the faulty post-states while maintaining other correct behaviours of the original operation. For any pre-states that are covered by the compound modification, their post-states are determined using the refactored substitution. For any other pre-states, their post-states are determined using the original substitution.
Fig. 5. The refactored bus controller operation.

5 Evaluation

This section presents an empirical study of the AMBM tool. The experiments consist of two parts: Part I includes experiments on repair evaluator training, and Part II includes experiments on the entire abstract machine modification process.

5.1 Purpose of the Evaluation

The objective of Part I was to demonstrate that the classifiers can distinguish between possible state transitions and impossible state transitions. Part I evaluated five types of classifiers, including Bernoulli Naive Bayes (BNB) classifiers, Logistic Regression (LR) classifiers, Support Vector Machines (SVM) with radial basis function kernels, Random Forests (RF), and Silas, on repair evaluator training tasks. To compare this work with our previous work [10], we conducted the same evaluation for the traditional B-repair with Classification And Regression Trees (CART) and ResNet. We used a set of correct abstract machines (in Table 1, which will be explained in Section 5.2) to generate training and test sets. First, for each subject, the model checker was used to approximate a state space of the abstract machine, and possible transitions were extracted from the state space. Second, impossible transitions were randomly generated. This process did not randomly generate any new states but used existing states to randomly generate impossible state transitions. For example, if we have a variable \(x = 0\) and an operation \(Add2 = {\rm PRE}~x \lt 3~{\rm THEN}~x := x + 2~{\rm END}\), then possible state transitions will include \(x = 0 \xrightarrow {Add2} x = 2\) and \(x = 2 \xrightarrow {Add2} x = 4\), and existing states will be \(x = 0\), \(x = 2\) and \(x = 4\). Impossible state transitions will include \(x = 0 \xrightarrow {Add2} x = 0\), \(x = 0 \xrightarrow {Add2} x = 4\), \(x = 2 \xrightarrow {Add2} x = 0\), \(x = 2 \xrightarrow {Add2} x = 2\), \(x = 4 \xrightarrow {Add2} x = 0\), \(x = 4 \xrightarrow {Add2} x = 2\), and \(x = 4 \xrightarrow {Add2} x = 4\). In our experiments, to avoid combinatorial explosion, existing states were randomly selected to generate impossible state transitions, and the generation process would terminate if the number of generated impossible state transitions reached the number of possible state transitions. As a result, 50% of the transitions were of the “possible” class, and the remaining 50% of the transitions were of the “impossible” class. All transitions were shuffled and split into a training set and a test set that contained 80% and 20% of the transitions, respectively. Third, the five classifiers were trained using the training set, and consistent hyper-parameters were used during the training.3 Finally, the trained classifiers were evaluated on the test set, and evaluation metrics included the classification accuracy and the area under the receiver operating characteristic curve (ROC-AUC) [12].
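A sketch of the negative-sampling step just described, using the Add2 example; the real implementation differs in details such as vectorisation and shuffling:

```python
# Generate as many impossible transitions as there are possible ones by
# recombining existing states and operations, skipping collisions with the
# possible set.
import random

def impossible_transitions(possible, seed=0):
    rng = random.Random(seed)
    states = sorted({s for (pre, _, post) in possible for s in (pre, post)})
    ops = sorted({op for (_, op, _) in possible})
    impossible = set()
    while len(impossible) < len(possible):
        candidate = (rng.choice(states), rng.choice(ops), rng.choice(states))
        if candidate not in possible:
            impossible.add(candidate)
    return impossible

possible = {("x=0", "Add2", "x=2"), ("x=2", "Add2", "x=4")}
print(impossible_transitions(possible))
```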
Table 1. Dataset of Abstract Machines

Subject | Source File (.mch) | # LOC | # Var. | # Inv. | # Ope.

ProB Public Examples
M01 | Paperround_simple | 40 | 2 sets | 2 | 3
M02 | Sortarray | 41 | 1 array of 4 integers | 1 | 3
M03 | ParallelModelCheckTest | 47 | 1 array of 5 integers; an integer | 4 | 3
M04 | POR_TwoThreads_WithSync | 48 | 4 integers | 4 | 3
M05 | progress | 55 | 3 sets | 6 | 4
M06 | monitor2 | 56 | 3 sets | 6 | 4
M07 | BridgeTransitions | 57 | 3 integers | 4 | 5
M08 | InvolvedSequences2 | 60 | 4 integers | 4 | 5
M09 | club | 75 | 2 sets | 4 | 7
M10 | ADD4 | 85 | 5 integers; 1 bool | 7 | 9
M11 | TestBZTT3 | 85 | 1 array of 10 elements; 1 element; 1 set | 3 | 8
M12 | scheduler6 | 90 | 3 sets | 10 | 5
M13 | BinomialCoefficientConcurrent | 109 | 5 integers | 5 | 8
M14 | Lift2 | 119 | 2 integers; 2 bools; 1 set | 5 | 8
M15 | Mikrowelle | 203 | an integer; 4 bools; 1 element | 8 | 12
M16 | CSM | 211 | 14 integers | 15 | 13
M17 | GSM_revue | 215 | two arrays of 4 bools and 4 elements, etc. | 20 | 6
M18 | Cruise_finite1 | 482 | 1 integer; 2 sets; 12 bools | 39 | 26

Wireless Network Protocol Models
M19 | CXCC | 124 | 3 functions | 7 | 6
M20 | BarRel | 166 | 4 functions; 1 set | 6 | 9

ABZ Automotive Models
M21 | GenericTimersMC | 79 | 1 function | 1 | 5
M22 | BlinkLamps | 151 | 3 integers; 1 set; 1 bool | 11 | 7
M23 | PitmanController | 373 | 1 integer; 1 set; 5 elements; 2 bools | 21 | 10
M24 | PitmanController_TIME_MC | 489 | 1 function; 3 integers; 1 set; 3 elements; 2 bools | 22 | 11
Part II evaluated the whole abstract machine modification process with the five classifiers and the modification refactoring function. We used fault seeding and removal to evaluate the algorithms. First, for each subject, faults were randomly seeded into 100 deterministic transitions of the correct abstract machine, leading to a faulty machine that could trigger 100 invariant violations and a set of standard answers indicating correct modifications. For instance, if \(S_{pre} \xrightarrow {\alpha } S_{\top }\) is a correct transition, and \(S_{\bot }\) is a state that triggers an invariant violation, then a fault will be seeded by replacing \(S_{pre} \xrightarrow {\alpha } S_{\top }\) with \(S_{pre} \xrightarrow {\alpha } S_{\bot }\), and the corresponding standard answer is \([S_{pre},\alpha ,S_{\bot },S_{\top }]\), which means that \(S_{pre} \xrightarrow {\alpha } S_{\bot }\) is a faulty transition and should be repaired by replacing \(S_{\bot }\) with \(S_{\top }\). Faulty state transitions were randomly injected into a correct abstract machine using the following steps:
(1) The type constraints of the variables are extracted from the model.
(2) A large number of states satisfying the type constraints are randomly generated. The states are verified against the invariant of the abstract machine, and faulty states that violate the invariant are collected.
(3) A faulty state \(S_{\bot }\) is randomly selected, and a correct state transition \(S_{pre} \xrightarrow {\alpha } S_{\top }\) generated by the abstract machine is randomly selected as the position at which to inject \(S_{\bot }\). To inject \(S_{\bot }\) into \(S_{pre} \xrightarrow {\alpha } S_{\top }\), we produce an atomic modification \([\alpha ,S_{pre},S_{\top },S_{\bot }]\) and use the Update function in Section 3.1 to apply the modification. (A sketch of a seeded fault, based on the Add2 example above, is given after this list.)
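For illustration, consider again the assumed Counter machine sketched in the previous subsection. If the correct transition \(x = 2 \xrightarrow {Add2} x = 4\) is selected and the faulty state \(x = 5\) (which violates the assumed invariant \(x : 0..4\)) is injected, the standard answer is \([x = 2, Add2, x = 5, x = 4]\). One possible shape of the seeded operation, assuming that the Update function realises the atomic modification as a conditional substitution, is:

    Add2 =
      PRE x < 3 THEN
        IF x = 2 THEN
          x := 5        /* seeded fault: the post-state x = 5 violates the assumed invariant */
        ELSE
          x := x + 2    /* the original behaviour is kept for all other pre-states */
        END
      END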
The above method was used to make 10 faulty machines from each correct machine in Table 1, yielding \(10 \times 24 = 240\) faulty machines containing 24,000 faulty transitions in total. After using AMBM to repair all faulty machines, the suggested modifications were compared with the standard answers and evaluated using the following metrics:
modification accuracy \(MA = N^{val}_{cor}/N^{val}_{tot}\), where \(N^{val}_{cor}\) denotes the number of correctly modified values with reference to the standard answers, and \(N^{val}_{tot}\) denotes the total number of values;
refactoring generality \(RG = 1 - N_{rec}/N_{0}\), where \(N_{0}\) denotes the number of modifications before refactoring, and \(N_{rec}\) denotes the number of modifications after refactoring;
average repair time \(ART = T/N_{F}\), where T denotes running time and \(N_{F}\) denotes the number of faults.
The intuition of MA is that the suggested atomic modifications are expected to coincide with standard answers. The intuition of RG is that the suggested atomic modifications are expected to be generalised as fewer compound modifications. When computing RG, both correct and incorrect atomic modifications are taken into account. Generalisation is the outcome of refactoring. Before refactoring, all modifications are atomic modifications. After refactoring, a number of atomic modifications have been replaced with fewer compound modifications. As a single compound modification can cover the functions of multiple atomic modifications, it can reduce the total number of modifications and make the modifications more expressive. In the best case, a compound modification covers all atomic modifications, leading to \(N_{rec} = 1\) and \(RG = 1 - 1/N_{0}\) (i.e., the maximum RG). In the worst case, no compound modification is produced, leading to \(N_{rec} = N_{0}\) and \(RG = 0\) (i.e., the minimum RG). With regard to ART, the intuition is that the average time for eliminating a fault should be reasonable.
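As a small worked example with assumed counts (not taken from the experiments): if refactoring merges \(N_{0} = 100\) atomic modifications into \(N_{rec} = 25\) compound modifications, and the run eliminates \(N_{F} = 100\) faults in \(T = 300\) seconds, then
\begin{equation*} RG = 1 - 25/100 = 0.75, \qquad ART = 300/100 = 3~\mathrm{seconds~per~fault}. \end{equation*}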

5.2 Experimental Settings

All experiments were run on a machine equipped with an Intel(R) Core(TM) i5-4670 CPU (4 cores, 3.40 GHz) and 8 GB of memory. The operating system was Ubuntu Desktop 16.04.⁴ A specific dataset to evaluate our solution was constructed using materials from the ProB Public Examples repository.⁵ We used the following filters to select machines:
Filter #1 selects machines that are syntactically correct.
Filter #2 selects machines that have variables, invariants, and operations.
Filter #3 selects machines that approximate at most 30K state transitions (to avoid memory exhaustion on our equipment).
Filter #4 selects machines that approximate at least 500 deterministic state transitions. Note that the selected machines may include non-determinism as well. To obtain more machines, the scales of small machines could be expanded by adjusting their set cardinalities and integer scopes.
Filter #5 selects machines that pass model checking.
Filter #6 selects machines that only have Boolean values, integers, distinct elements, and first-order sets as single variables or arrays.
After using the above filters, we manually removed redundant machines and machines without actual meanings. Consequently, we obtained 18 well-formed and error-free machines for evaluation, and the size of their state space (i.e., the number of states plus the number of state transitions) ranged from 1K to 27K. Table 1 provides information on the 18 machines, i.e., M01–M18, including the source file of each machine, the number of lines of code (# LOC), the number of variables (# Var.), the number of invariants (# Inv.), and the number of operations (# Ope.). A subset of the abstract machines and their essential information (e.g., source files and # LOC) were used by Reference [10], where # LOC was counted after using ProB [22] to convert the source files into the pretty-printed format.
Additionally, we added M19–M24 to the evaluation dataset. M19 and M20 are based on the wireless network protocols studied in References [30, 31]. The original B models of M19 and M20 were obtained from the repository shared by Reference [19].⁶ M21–M24 were originally part of the automotive models developed by Reference [23].⁷ We adapted M19–M24 to suit the capabilities of AMBM. For example, partial functions were replaced by total functions, as partial functions could not be processed by the repair evaluators. In addition, composite models were flattened into single models, as AMBM could not process the INCLUDES mechanism.

5.3 Results and Discussions

5.3.1 Results of Part I.

Table 2 shows the results of the repair evaluator training experiments, including the number of training and test examples (i.e., # Examples), the ROC-AUC of the classifiers for each subject, the total ROC-AUC, and the total classification accuracy (CA). We observed that, with regard to ROC-AUC and CA, RF and Silas obtained the best results, i.e., over 99% ROC-AUC and over 98% CA, and the difference between Silas and RF on these metrics was insignificant. In addition, SVM performed well and obtained over 96% ROC-AUC and over 91% CA, whereas LR and BNB fell behind the others. Furthermore, both RF and Silas achieved better ROC-AUC and CA than the two traditional repair evaluators of B-repair, i.e., ResNet and CART. In summary, RF and Silas showed high predictive performance on the repair evaluator training tasks.
Table 2. Results of Repair Evaluator Training
(Entries are ROC-AUC values; the final row gives the overall classification accuracy.)

Subject | # Examples | BNB | LR | SVM | RF | Silas | ResNet | CART
M01 | 24,530 | 0.615 | 0.619 | 1.000 | 1.000 | 1.000 | 0.961 | 0.973
M02 | 22,500 | 0.722 | 0.736 | 0.984 | 0.996 | 0.990 | 0.986 | 0.986
M03 | 4,608 | 0.729 | 0.691 | 0.965 | 1.000 | 1.000 | 0.976 | 0.953
M04 | 10,202 | 0.660 | 0.668 | 0.936 | 1.000 | 1.000 | 0.955 | 0.955
M05 | 15,360 | 0.640 | 0.645 | 0.997 | 0.999 | 0.999 | 0.990 | 0.961
M06 | 14,160 | 0.584 | 0.593 | 0.999 | 0.999 | 0.999 | 0.979 | 0.967
M07 | 4,550 | 0.663 | 0.654 | 0.725 | 0.965 | 0.995 | 0.819 | 0.840
M08 | 2,024 | 0.673 | 0.709 | 0.905 | 0.998 | 0.988 | 0.925 | 0.931
M09 | 14,580 | 0.536 | 0.539 | 0.994 | 0.999 | 0.999 | 0.992 | 0.999
M10 | 6,592 | 0.625 | 0.636 | 0.993 | 1.000 | 0.998 | 0.962 | 0.998
M11 | 15,860 | 0.848 | 0.868 | 0.930 | 0.997 | 0.995 | 0.948 | 0.968
M12 | 20,976 | 0.566 | 0.563 | 0.996 | 1.000 | 1.000 | 0.996 | 0.963
M13 | 10,210 | 0.536 | 0.532 | 0.621 | 0.999 | 1.000 | 0.794 | 0.813
M14 | 11,918 | 0.802 | 0.814 | 0.997 | 1.000 | 0.997 | 0.994 | 0.990
M15 | 1,944 | 0.777 | 0.837 | 0.844 | 0.999 | 0.988 | 0.886 | 0.908
M16 | 2,456 | 0.642 | 0.666 | 0.850 | 0.999 | 0.999 | 0.905 | 0.912
M17 | 9,056 | 0.679 | 0.735 | 0.956 | 0.998 | 0.996 | 0.946 | 0.937
M18 | 51,384 | 0.783 | 0.822 | 0.995 | 1.000 | 0.998 | 0.975 | 0.996
M19 | 11,378 | 0.736 | 0.759 | 1.000 | 0.999 | 0.999 | 0.938 | 0.966
M20 | 5,244 | 0.771 | 0.786 | 0.999 | 1.000 | 1.000 | 0.953 | 0.992
M21 | 15,662 | 0.746 | 0.753 | 0.982 | 0.979 | 0.981 | 0.924 | 0.978
M22 | 5,002 | 0.884 | 0.890 | 0.983 | 0.996 | 0.993 | 0.912 | 0.968
M23 | 1,024 | 0.615 | 0.718 | 0.910 | 0.977 | 0.937 | 0.833 | 0.941
M24 | 33,944 | 0.706 | 0.732 | 0.999 | 1.000 | 0.999 | 0.952 | 0.988
All | 315,164 | 0.694 | 0.711 | 0.969 | 0.998 | 0.997 | 0.958 | 0.970
Classification Accuracy | – | 0.646 | 0.665 | 0.917 | 0.980 | 0.980 | 0.908 | 0.941

5.3.2 Results of Part II.

Table 3 shows the modification accuracy (MA), refactoring generality (RG), and average repair time (ART) of the abstract machine batch modification experiments. We observed that, with regard to MA, RF was the leading classifier with over 88% MA, followed by Silas with over 83% MA. The use of BNB and LR resulted in significantly lower accuracy, because the two models were too simple to approximate the encodings of state transitions. SVM obtained better MA, because its kernel functions can form more complex transformations than BNB and LR. Moreover, Silas showed better generalisation capability than RF. The use of RF resulted in significantly lower generality, probably because its strong fitting ability caused overfitting. By contrast, BNB and LR had significantly lower accuracy and higher generality, because their fitting ability was too weak to cause overfitting.
Table 3. Results of Abstract Machine Batch Modification
(Each cell lists the values for BNB / LR / SVM / RF / Silas.)

Subject | Modification Accuracy | Refactoring Generality | Average Repair Time (s)
M01 | 0.388 / 0.388 / 0.611 / 0.943 / 0.834 | 0.959 / 0.959 / 0.980 / 0.965 / 0.965 | 2.965 / 2.965 / 6.696 / 3.033 / 3.157
M02 | 0.460 / 0.456 / 0.398 / 0.690 / 0.659 | 0.990 / 0.990 / 0.495 / 0.744 / 0.868 | 2.069 / 2.066 / 4.982 / 2.142 / 2.138
M03 | 0.297 / 0.249 / 0.844 / 0.878 / 0.854 | 0.952 / 0.970 / 0.359 / 0.215 / 0.545 | 3.135 / 3.111 / 6.433 / 3.161 / 3.619
M04 | 0.018 / 0.013 / 0.187 / 0.975 / 0.948 | 0.980 / 0.978 / 0.583 / 0.137 / 0.331 | 10.694 / 9.161 / 43.670 / 3.168 / 3.094
M05 | 0.313 / 0.313 / 0.457 / 0.969 / 0.866 | 0.960 / 0.960 / 0.960 / 0.895 / 0.878 | 2.018 / 2.023 / 3.222 / 1.758 / 1.828
M06 | 0.141 / 0.138 / 0.333 / 0.649 / 0.383 | 0.960 / 0.960 / 0.960 / 0.841 / 0.929 | 1.417 / 1.398 / 2.878 / 1.498 / 1.530
M07 | 0.271 / 0.284 / 0.518 / 0.709 / 0.902 | 0.960 / 0.960 / 0.920 / 0.361 / 0.529 | 0.716 / 0.715 / 2.749 / 0.760 / 0.748
M08 | 0.146 / 0.155 / 0.456 / 0.970 / 0.929 | 0.961 / 0.961 / 0.532 / 0.342 / 0.773 | 2.682 / 2.584 / 2.992 / 3.010 / 3.350
M09 | 0.214 / 0.214 / 0.399 / 0.702 / 0.428 | 0.876 / 0.876 / 0.950 / 0.861 / 0.920 | 1.534 / 1.531 / 2.715 / 1.597 / 1.684
M10 | 0.305 / 0.314 / 0.943 / 0.999 / 0.999 | 0.920 / 0.920 / 0.340 / 0.359 / 0.872 | 0.649 / 0.648 / 0.830 / 0.668 / 0.688
M11 | 0.453 / 0.652 / 0.656 / 0.969 / 0.952 | 0.953 / 0.953 / 0.912 / 0.695 / 0.793 | 4.525 / 4.538 / 6.629 / 4.578 / 4.646
M12 | 0.142 / 0.148 / 0.402 / 0.946 / 0.916 | 0.950 / 0.952 / 0.959 / 0.845 / 0.841 | 2.421 / 2.390 / 6.280 / 2.384 / 2.451
M13 | 0.108 / 0.080 / 0.148 / 0.900 / 0.976 | 0.930 / 0.930 / 0.885 / 0.159 / 0.466 | 1.629 / 1.616 / 16.160 / 1.703 / 1.660
M14 | 0.365 / 0.362 / 0.799 / 0.997 / 0.981 | 0.920 / 0.920 / 0.591 / 0.499 / 0.715 | 3.345 / 3.504 / 4.456 / 3.244 / 3.444
M15 | 0.657 / 0.652 / 0.671 / 0.909 / 0.733 | 0.890 / 0.890 / 0.882 / 0.496 / 0.817 | 3.389 / 3.353 / 4.697 / 3.277 / 3.558
M16 | 0.541 / 0.449 / 0.714 / 0.973 / 0.975 | 0.871 / 0.871 / 0.367 / 0.229 / 0.602 | 2.019 / 1.995 / 4.820 / 2.049 / 2.267
M17 | 0.758 / 0.758 / 0.917 / 0.995 / 0.990 | 0.980 / 0.980 / 0.839 / 0.792 / 0.805 | 2.159 / 2.152 / 2.741 / 2.233 / 2.283
M18 | 0.425 / 0.409 / 0.811 / 0.999 / 0.990 | 0.802 / 0.802 / 0.719 / 0.795 / 0.776 | 10.034 / 9.970 / 26.024 / 9.518 / 9.803
M19 | 0.544 / 0.551 / 0.857 / 0.955 / 0.905 | 0.958 / 0.958 / 0.837 / 0.746 / 0.763 | 4.998 / 5.093 / 5.389 / 5.005 / 5.004
M20 | 0.518 / 0.551 / 0.927 / 0.979 / 0.977 | 0.913 / 0.908 / 0.673 / 0.700 / 0.740 | 3.517 / 3.502 / 4.305 / 3.862 / 3.694
M21 | 0.290 / 0.290 / 0.510 / 0.484 / 0.445 | 0.980 / 0.980 / 0.928 / 0.773 / 0.873 | 0.952 / 0.955 / 1.266 / 0.958 / 0.957
M22 | 0.458 / 0.474 / 0.712 / 0.916 / 0.763 | 0.951 / 0.951 / 0.904 / 0.859 / 0.919 | 1.189 / 1.201 / 1.300 / 1.255 / 1.251
M23 | 0.510 / 0.481 / 0.711 / 0.788 / 0.746 | 0.891 / 0.901 / 0.580 / 0.531 / 0.537 | 1.685 / 1.607 / 1.600 / 1.591 / 1.591
M24 | 0.411 / 0.493 / 0.893 / 0.990 / 0.986 | 0.890 / 0.890 / 0.733 / 0.559 / 0.576 | 3.984 / 4.204 / 11.129 / 4.037 / 4.028
Average | 0.364 / 0.370 / 0.620 / 0.887 / 0.839 | 0.933 / 0.934 / 0.745 / 0.600 / 0.743 | 3.072 / 3.012 / 7.248 / 2.770 / 2.853
Regarding the performance of AMBM, all classifiers required a similar ART of approximately 3 seconds, with the exception of SVM, which required considerably longer (7.248 seconds on average). These results demonstrate the feasibility and practicality of the AMBM approach. More specifically, they suggest that, among the classifiers considered in the experiments, Silas is the most suitable for approximating the repair scores of abstract machines, as it has both high accuracy and high generalisation capability.
The above experiments have demonstrated that the modification accuracy of B model repair can reach a high level by means of a well-trained classifier, while the refactoring generality of the modifications depends on the classifier. Considering both modification accuracy and refactoring generality, Silas has the best performance on the model repair tasks. Moreover, this finding implies that model repair processes can benefit from the repair evaluators. With repair evaluator training and constraint solving, machines are able to automatically produce repairs and select well-behaved repairs that eliminate faults in the models and preserve the state spaces of the models as much as possible. As the modification accuracy and the refactoring generality reached approximately 0.8 and 0.7, respectively, we infer that a large number of the suggested modifications are correct and simple. Furthermore, the study has shown the effectiveness of the AMBM algorithm. AMBM is able to automatically eliminate hundreds of faults within a reasonable time, while manually eliminating these faults may require a considerable amount of human effort. Thus, our study suggests that AMBM can assist programmers in designing abstract machines and possibly realise automated model generation via step-by-step incremental design repairs.
Finally, Figure 6 compares AMBM and the traditional B-repair with respect to accuracy and performance. Figure 6(a) indicates that AMBM with RF and Silas achieved better accuracy than the traditional B-repair with CART and ResNet. Moreover, Silas had better accuracy than CART, because Silas’s internal decision trees can model more data types than CART. Additionally, ResNet failed to suggest accurate modifications, probably because such a neural network architecture is not appropriate for learning from small datasets. Figure 6(b) indicates that AMBM’s performance was significantly better than that of B-repair. B-repair’s performance bottleneck was mainly caused by successive repairs, i.e., after eliminating a single invariant violation, a model-checking process was started to detect the next invariant violation. Consequently, B-repair was slowed down when the model repair process was run repeatedly. AMBM solves this problem by detecting multiple invariant violations during a single model-checking process, so the number of required model-checking runs is reduced. Overall, the above results show that AMBM achieves higher accuracy and better performance than the traditional B-repair.
Fig. 6.
Fig. 6. Comparing AMBM with the traditional B-repair [10].

6 Discussions

In this section, we discuss a feasible extension for repairing non-determinism in general. Additionally, we discuss the limitations of our approach and compare this study with existing work.

6.1 Modifications on Non-determinism

As previously mentioned, the modification operator eliminates invariant violations triggered by deterministic state transitions. To repair non-determinism, we suggest a new modification operator. For an operation, given a pre-state p and an erroneous post-state q, a modification operator modifies q to a correct post-state r. The modification operator can be described using a triple \([u, \textit{``modification''}, r]\), where u is the concatenation of p and q. The triple means that the faulty state pair \((p,q)\) is rewritten as the correct state pair \((p,r)\) using the modification operator. To control the application domain of the modification, u is converted to the following condition:
\begin{equation} v^{pre}_1 = p[v_1]~\wedge ~v^{post}_1 = q[v_1]~\wedge ~\cdots ~\wedge ~v^{pre}_n = p[v_n]~\wedge ~v^{post}_n = q[v_n], \end{equation}
(5)
where \(v^{pre}_i~(i=1,2,\ldots ,n)\) is a pre-variable identifier of \(v_i\), \(v^{post}_i~(i=1,2,\ldots ,n)\) is a post-variable identifier of \(v_i\), \(p[v_i]\) is the value of \(v_i\) in p, and \(q[v_i]\) is the value of \(v_i\) in q. Moreover, r can be converted to the following substitution:
\begin{equation} v^{post}_1 := r[v_1]~;~\ldots ~;~v^{post}_n := r[v_n], \end{equation}
(6)
where \(v^{post}_i~(i=1,2,\ldots ,n)\) is a post-variable identifier of \(v_i\), and \(r[v_i]\) is the value of \(v_i\) in r. The Non-Determinism Modification (NDM) operator, which repairs non-determinism, is defined as follows: Given M modification operators described using M triples \([u_j, \textit{``modification''}, r_j]~(j=1,\ldots ,M)\), we use Equation (5) to convert each \(u_j\) to a condition \(U_j\) and use Equation (6) to convert each \(r_j\) to a substitution \(R_j\). Given an operation \(\alpha = {\rm {\bf PRE}}~~P~~{\rm {\bf THEN}}~~S~~{\rm {\bf END}}\), where P is a pre-condition, and S is a substitution, an NDM can be applied to \(\alpha\) using the following template:
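The original template listing is not reproduced in this version; the following schematic B-style sketch is our reconstruction of the construction described in the next paragraph, where ⟨state-variables⟩, ⟨pre-variables⟩, and ⟨post-variables⟩ are placeholders for the machine's variable lists:

    repaired_alpha =
      PRE P THEN
        VAR ⟨pre-variables⟩, ⟨post-variables⟩ IN
          ⟨pre-variables⟩ := ⟨state-variables⟩ ;    /* record the pre-values */
          S ;                                       /* apply the original substitution */
          ⟨post-variables⟩ := ⟨state-variables⟩ ;   /* record the post-values produced by S */
          IF U_1 THEN R_1 END ;                     /* conditional modification 1 */
          ...
          IF U_M THEN R_M END ;                     /* conditional modification M */
          ⟨state-variables⟩ := ⟨post-variables⟩     /* commit the (possibly modified) post-values */
        END
      END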
In particular, if the original operation does not have a pre-condition, then the repaired operation does not have the outer “PRE-THEN-END” statement. The above template uses the “VAR-IN-END” construct with two blocks of temporary variables, i.e., “⟨pre-variables⟩” and “⟨post-variables⟩”, to store pre-values and post-values. Before applying the substitution S, the values of the state variables are the pre-values, so “⟨pre-variables⟩ := ⟨state-variables⟩” can assign the pre-values to the pre-variables. After applying S, the values of the state variables become the post-values, so “⟨post-variables⟩ := ⟨state-variables⟩” can assign the post-values to the post-variables. As a result, the pre-values and the post-values are obtained. After using the conditions \(U_j~(j=1,\ldots ,M)\) and the substitutions \(R_j~(j=1,\ldots ,M)\) to modify the values of the post-variables, the modified values are assigned to state variables, leading to a new post-state.
The use of NDM requires a slight change to Algorithm 1, i.e., the Update function is adapted to use the NDM template to repair faulty operations. The following example of an autonomous vehicle control (AVC) model demonstrates the use of NDM. The model describes how to control a bus in a city. Figure 7 visualises the city map, which has a set Location containing five bus stations, i.e., S0, S1, S2, S3, and S4, and five other locations, i.e., C0, C1, C2, C3, and C4. The locations are linked by a set Street containing bidirectional edges. The model has a variable “loc” recording the location of the bus. The value of loc is changed using the following “move” operation:
Fig. 7.
Fig. 7. The road map of the AVC model.
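The original listing of the operation is not reproduced here; a B sketch consistent with the description (non-deterministically moving the bus to any location adjacent to loc via a street) might be:

    move =
      ANY newloc WHERE
        newloc : Location &
        (loc |-> newloc : Street or newloc |-> loc : Street)   /* streets are bidirectional */
      THEN
        loc := newloc
      END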
Suppose that the bus station S4 is temporarily unavailable. The following invariant is used to specify the constraint:
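The invariant listing is not reproduced here; a formulation consistent with the stated constraint (the bus must never be at S4) would be the following additional conjunct in the INVARIANT clause:

    loc /= S4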
The invariant is violated, because the move operation can change the value of loc from S3, C3, or C4 to S4. The corresponding faulty state transitions and feasible modifications (indicated by “\(\hookrightarrow\)”) are listed below.
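\(loc = S3 \xrightarrow {move} loc = S4~~\hookrightarrow ~~loc = S3 \xrightarrow {move} loc = S2,\)
\(loc = C3 \xrightarrow {move} loc = S4~~\hookrightarrow ~~loc = C3 \xrightarrow {move} loc = C4,\)
\(loc = C4 \xrightarrow {move} loc = S4~~\hookrightarrow ~~loc = C4 \xrightarrow {move} loc = C3.\)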
The meaning of the modifications is that if the bus is scheduled to move from S3, C3, or C4 to S4, then the bus will be rescheduled to move from S3 to S2, from C3 to C4, or from C4 to C3, respectively. The modifications are applied to the move operation using the NDM operator, leading to the following repaired operation:
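The repaired operation listing is not reproduced here; applying the NDM template sketched above to the assumed move operation yields something of the following shape, where loc_pre and loc_post are the temporary pre- and post-variables (since the original operation has no pre-condition, there is no outer PRE-THEN-END):

    move =
      VAR loc_pre, loc_post IN
        loc_pre := loc ;
        ANY newloc WHERE
          newloc : Location &
          (loc |-> newloc : Street or newloc |-> loc : Street)
        THEN
          loc := newloc                                        /* original substitution S */
        END ;
        loc_post := loc ;
        IF loc_pre = S3 & loc_post = S4 THEN loc_post := S2 END ;
        IF loc_pre = C3 & loc_post = S4 THEN loc_post := C4 END ;
        IF loc_pre = C4 & loc_post = S4 THEN loc_post := C3 END ;
        loc := loc_post                                        /* commit the (possibly rerouted) post-value */
      END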
The above example demonstrates that even though multiple state transitions can share a common post-state that triggers an invariant violation, the NDM can repair each state transition separately. This is because the conditions of NDM can ensure that each repair acts on one and only one state transition.

6.2 Limitations

As discussed in Section 2.2, the repair evaluators cannot deal with unseen elements of infinite sets. When training the repair evaluators, unseen elements are avoided by reducing infinite sets to finite sets that contain all elements occurring in a state space. For example, if a state space only includes the integers 0, 1, 2, and 4, then the finite set {0, 1, 2, 4} is used to generate one-hot encodings, e.g., {1, 4} is encoded as [0, 1, 0, 1]. However, unseen elements such as 3 and 5 are not preserved in the one-hot encodings, e.g., {1, 3, 4, 5} is treated as {1, 4} and encoded as [0, 1, 0, 1]. Consequently, the repair evaluators cannot distinguish between {1, 4} and {1, 3, 4, 5}. Moreover, to avoid unseen elements, AMBM can restrict the constraint solver to modifications that only contain elements occurring in a given state space, but such a restriction may rule out some required modifications. For example, suppose that a faulty operation Inc attempts to increment every element of a set of integers by 1. The operation yields correct state transitions such as \(\lbrace 0\rbrace \xrightarrow {Inc} \lbrace 1\rbrace\), \(\lbrace 1\rbrace \xrightarrow {Inc} \lbrace 2\rbrace\), and \(\lbrace 0, 1\rbrace \xrightarrow {Inc} \lbrace 1, 2\rbrace\), and faulty state transitions such as \(\lbrace 2\rbrace \xrightarrow {Inc} \lbrace \rbrace\) and \(\lbrace 1, 2\rbrace \xrightarrow {Inc} \lbrace 2\rbrace\). The faulty state transitions should be repaired as \(\lbrace 2\rbrace \xrightarrow {Inc} \lbrace 3\rbrace\) and \(\lbrace 1, 2\rbrace \xrightarrow {Inc} \lbrace 2, 3\rbrace\), respectively. Unfortunately, AMBM cannot suggest the required repairs, because 3 does not occur in the given state transitions. A possible approach to solving this problem is model repair based on inductive programming [33]. Given a component library containing essential symbols such as 0, 1, +, \(-\), *, and =, inductive programming can synthesise the function \(x^{\prime } = x + 1\), where x and \(x^{\prime }\) are integers, to generalise the correct state transitions of Inc. When \(x = 2\), the new element \(x^{\prime } = 3\) can be inferred by the function, so the required modifications can be constructed. This discussion indicates that a merger of inductive programming and AMBM could be a way to lift the restrictions on infinite sets.
The repair of higher-order sets, e.g., sets of sets, is another limitation on the applicability of AMBM. The number of candidate modifications involving higher-order sets can be huge due to combinatorial explosion. For example, if a set of sets is constructed from the integers \(1, 2, \ldots , n\), then its value will be a member of \(\mathcal {P}^2(\lbrace 1, 2, \ldots , n\rbrace)\), where \(\mathcal {P}\) represents the power set. As the cardinality of \(\mathcal {P}^2(\lbrace 1, 2, \ldots , n\rbrace)\) is \(2^{2^n}\), it is unrealistic to enumerate the members when n is large. Instead of enumeration, AMBM only uses components that occur in a given state space to generate modifications. As a result, a required modification will be missed if it includes a component that is not in the state space. For example, suppose that a faulty operation Inc_Send attempts to take a set of integers from a sender, increment each of its elements by 1, and transfer the result to a receiver. With two sets of sets \([Sender,~Receiver]\) representing states, the operation yields correct state transitions such as:
\([\lbrace \lbrace 0\rbrace ,~\lbrace 1\rbrace ,~\lbrace 0,~1\rbrace \rbrace ,~\lbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \lbrace 1\rbrace ,~\lbrace 0,~1\rbrace \rbrace ,~\lbrace \lbrace 1\rbrace \rbrace ],\)
\([\lbrace \lbrace 1\rbrace ,~\lbrace 0,~1\rbrace \rbrace ,~\lbrace \lbrace 1\rbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \lbrace 0,~1\rbrace \rbrace ,~\lbrace \lbrace 1\rbrace ,~\lbrace 2\rbrace \rbrace ],\)
\([\lbrace \lbrace 0,~1\rbrace \rbrace ,~\lbrace \lbrace 1\rbrace ,~\lbrace 2\rbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \rbrace ,~\lbrace \lbrace 1\rbrace ,~\lbrace 2\rbrace ,~\lbrace 1,~2\rbrace \rbrace ],\)
\([\lbrace \lbrace 1\rbrace ,~\lbrace 2\rbrace ,~\lbrace 1,~2\rbrace \rbrace ,~\lbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \lbrace 2\rbrace ,~\lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace \rbrace ],\)
and faulty state transitions such as:
\([\lbrace \lbrace 2\rbrace ,~\lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace ,~\lbrace \rbrace \rbrace ],\)
\([\lbrace \lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace ,~\lbrace \rbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \rbrace ,~\lbrace \lbrace 2\rbrace ,~\lbrace \rbrace \rbrace ].\)
The faulty state transitions should be repaired as follows:
\([\lbrace \lbrace 2\rbrace ,~\lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace ,~\lbrace 3\rbrace \rbrace ],\)
\([\lbrace \lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace ,~\lbrace 3\rbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \rbrace ,~\lbrace \lbrace 2\rbrace ,~\lbrace 3\rbrace ,~\lbrace 2,~3\rbrace \rbrace ].\)
Unfortunately, AMBM cannot suggest the required repairs, because {3} and {2, 3} are unseen in the given state space. To suggest repairs with such unseen components, inductive programming is probably feasible, because it can construct functions to infer unseen components, as discussed above.
Moreover, as the repair evaluators treat sets nested in sets as string elements, unseen nested sets cannot be encoded appropriately. To encode such unseen components, a feasible method is to introduce repair evaluators based on graphs. Figure 8 shows an example of the conversion from sets of sets (or higher-order sets) to a graph. Compared to the string representation, the advantage of the graph representation is that graphs can link two components at multiple levels, so unseen components can be represented by linking them with their sub-components. For example, both {{1}, {2}, {1, 2}} and {{2}, {3}, {2, 3}} have a direct link from the set {2} and two indirect links from the element 2. Even though {{2}, {3}, {2, 3}} is unseen in the state space, its encoding can still be partially inferred from the encoding of {{1}, {2}, {1, 2}} using graph learning models [36]. As a large number of graph learning models are available, their application to B model repair can be considered future work.
Fig. 8.
Fig. 8. Graph representations of member relationships in sets of sets.

6.3 Comparisons to Related Work

Automated B model repair originates from two studies [32, 33]. To eliminate faults in abstract machines, they have proposed four methods, including the strengthening of pre-conditions, the relaxation of pre-conditions, the relaxation of invariants, and the synthesis of new operations. When relaxing pre-conditions and invariants and synthesising new operations, users are required to manually produce positive and negative I/O examples for program synthesis. The difference between their work and our work is that we focus on repairing substitutions, while they focus on repairing pre-conditions. Moreover, our AMBM algorithm uses repair evaluators and constraint solving to automatically synthesise repairs, while their methods require users to manually produce I/O examples for repair synthesis.
AMBM is an improved implementation of our previous work, B-repair [10]. When revising faulty abstract machines, B-repair uses a constraint solver to generate candidate repairs and uses learned quality estimation functions to rank repairs. As B-repair eliminates one and only one fault during each loop of repair, it takes considerably longer to repair models with a large number of faults. In this work, as multiple faults are eliminated using fewer compound repairs during each loop of repair, AMBM is significantly more efficient than the previous B-repair, and the resulting corrections are simpler in terms of the predicate structure.
With regard to previous work on automatic imperative program repair, such as RSRepair [28], GenProg [21], CASC [37], and SearchRepair [17], automated B model repair is conceptually different, because B is a design modelling language for constructing formal specifications at the abstract design level, while imperative programs are at the concrete implementation level [2]. Functions in B design models are usually represented as operations with pre-conditions and substitutions (which play the role of post-conditions) that relate the before and after values (states) of the variables used in the system. Consequently, operation executions are decided by the current states of the variables rather than by the control flow of the program. Thus, repair evaluators can analyse each operation separately, and repairs can be applied to faulty operations asynchronously without considering the execution order of the operations. In automatic imperative program repair, however, execution orders decided by control flows are important factors to be considered.
There are a number of similarities between the repair of B models and imperative program repair. First, both have fault localisation functions that reduce the search space of the repair. B model repair can rely on a model checker to detect faulty operations, while imperative program repair can rely on Spectrum-based Fault Localisation (SFL) [1] functions that find suspicious code blocks by counting successful and failed paths of program executions. Second, the concept of inductive programming can be used to synthesise patches of imperative programs [18] and to refactor atomic modifications into compound modifications (this work). Third, conditional statements are often used to avoid the side effects of repair. For example, Staged Program Repair (SPR) [24] can use a number of instances to generate conditions that distinguish between correct and failed executions, so side effects on the correct executions can be minimised. Similarly, our AMBM uses conditional substitutions to avoid the side effects of modifications. Finally, as repairs are not definitive, evaluation functions seem indispensable for estimating the appropriateness of candidate repairs and selecting the best repairs. For example, GenProg [21] evaluates candidate repairs by observing successful and failed executions related to the repairs, while AMBM uses classification models and repair scores to select repairs. These similarities between imperative program repair and B model repair indicate that techniques from the two fields may be adopted by each other in the future.

7 Conclusion

We have extended B-repair by implementing abstract machine batch modification, an automated method for repairing erroneous B formal models during correct-by-construction development. We have demonstrated that the state spaces of abstract machines can be accurately learned using classic classifiers such as random forests. The learned classifiers can be used to select atomic modifications produced by solving invariant constraints. Moreover, atomic modifications can be merged into compound modifications using predicate refactoring. The explainable and verifiable classifier has yielded high modification accuracy and improved the generality of compound modifications. Consequently, we suggest that automated abstract machine modification has the potential to increase the efficiency and productivity of software development.
In the future, AMBM may be improved as described below:
It is possible to develop repair evaluators based on unsupervised machine learning algorithms, where the generation of negative training examples is no longer needed, because the characteristics of state spaces can be directly learned using the unsupervised methods. We may develop new unsupervised methods similar to random forests and appropriate decision functions to achieve relatively high modification accuracies.
It is possible to extend AMBM using refinement checking techniques. First, a model checker is used to find faulty state transitions that violate refinement conditions. Next, pre- and post-conditions of a refined abstract machine are rewritten as constraints, so a set of candidate modifications can be generated by solving the constraints. After that, a repair evaluator is used to select the best modification to repair each faulty state transition. Finally, the original model is updated using the modifications and checked against the refinement conditions.
It is possible to design an algorithm that can eliminate invariant violations by either weakening invariants or modifying existing state transitions. The problem is how to make a choice between the above two options. A possible solution is to use an evaluator to decide whether a faulty state transition should be kept or removed, i.e., if the state transition gains a relatively high score, then the state transition will be kept, and the corresponding invariant will be weakened. If the state transition does not gain a high score, then the state transition will be modified using atomic modifications, and the corresponding invariant will not be changed.
To use AMBM in industry, we may try to improve the scalability of AMBM by integrating it with more development tools and speeding up the refactoring process using probabilistic techniques.

Footnotes

1
The source code is available via https://github.com/cchrewrite/ambm.
2
Silas Edu is developed by us and can be downloaded from https://www.depintel.com/silas_download.html.
3
For BNB, LR, SVM, and RF, the default settings in scikit-learn were used. The default settings of Silas are intended for datasets containing over 1 million entries. We therefore used another set of settings for smaller datasets, where the number of decision trees in Silas is equal to its counterpart in RF.
4
Scripts and datasets of our experiments can be found at https://github.com/cchrewrite/ambm.

References

[1]
Rui Abreu, Peter Zoeteweij, Rob Golsteijn, and Arjan J. C. van Gemund. 2009. A practical evaluation of spectrum-based fault localization. J. Syst. Softw. 82, 11 (2009), 1780–1792.
[2]
Jean-Raymond Abrial. 2005. The B-book - Assigning Programs to Meanings. Cambridge University Press.
[3]
Thomas Ackling, Bradley Alexander, and Ian Grunert. 2011. Evolving patches for software repair. In Proceedings of the 13th Annual Genetic and Evolutionary Computation Conference (GECCO). 1427–1434.
[4]
Dalal Alrajeh, Jeff Kramer, Alessandra Russo, and Sebastián Uchitel. 2015. Automated support for diagnosis and repair. Commun. ACM 58, 2 (2015), 65–72.
[5]
Haniel Barbosa and David Déharbe. 2012. Formal verification of PLC programs using the B method. In Proceedings of the 3rd International Conference: on Abstract State Machines, Alloy, B, VDM, and Z. 353–356.
[6]
Nazim Benaïssa, David Bonvoisin, Abderrahmane Feliachi, and Julien Ordioni. 2016. The PERF approach for formal verification. In Proceedings of the 1st International Conference on Reliability, Safety, and Security of Railway Systems. Modelling, Analysis, Verification, and Certification. 203–214.
[7]
Christopher M. Bishop. 2007. Pattern Recognition and Machine Learning, 5th Edition. Springer.
[8]
Hadrien Bride, Cheng-Hao Cai, Jie Dong, Jin Song Dong, Zhé Hóu, Seyedali Mirjalili, and Jing Sun. 2021. Silas: A high-performance machine learning foundation for logical reasoning and verification. Exp. Syst. Applic. 176 (2021), 114806.
[9]
Hadrien Bride, Jie Dong, Jin Song Dong, and Zhé Hóu. 2018. Towards dependable and explainable machine learning using automated reasoning. In Proceedings of the 20th International Conference on Formal Engineering Methods (ICFEM’18). 412–416.
[10]
Cheng-Hao Cai, Jing Sun, and Gillian Dobbie. 2019. Automatic B-model repair using model checking and machine learning. Autom. Softw. Eng. 26, 3 (2019), 653–704.
[11]
Johan de Kleer and Brian C. Williams. 1989. Diagnosis with behavioral modes. In Proceedings of the 11th International Joint Conference on Artificial Intelligence (IJCAI’89). 1324–1330.
[12]
Tom Fawcett. 2006. An introduction to ROC analysis. Patt. Recog. Lett. 27, 8 (2006), 861–874.
[13]
Luca Gazzola, Daniela Micucci, and Leonardo Mariani. 2019. Automatic software repair: A survey. IEEE Trans. Softw. Eng. 45, 1 (2019), 34–67.
[14]
Tin Kam Ho. 1995. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition (ICDAR’95). 278–282.
[15]
Sarah Hoffmann, Germain Haugou, Sophie Gabriele, and Lilian Burdy. 2007. The B-method for the construction of microkernel-based systems. In Proceedings of the 7th International Conference of B Users. 257–259.
[16]
Tao Ji, Liqian Chen, Xiaoguang Mao, and Xin Yi. 2016. Automated program repair by using similar code containing fix ingredients. In Proceedings of the 40th IEEE Annual Computer Software and Applications Conference (COMPSAC’16). 197–202.
[17]
Yalin Ke, Kathryn T. Stolee, Claire Le Goues, and Yuriy Brun. 2015. Repairing programs with semantic code search (T). In Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE’15). 295–306.
[18]
Emanuel Kitzelmann. 2009. Inductive programming: A survey of program synthesis techniques. In Proceedings of the 3rd International Workshop on Approaches and Applications of Inductive Programming. 50–73.
[19]
Philipp Körner, Michael Leuschel, and Jannik Dunkelau. 2020. Towards a shared specification repository. In Proceedings of the 7th International Conference on Rigorous State-based Methods. 266–271.
[20]
Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, and Westley Weimer. 2012. A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each. In Proceedings of the 34th International Conference on Software Engineering (ICSE’12). 3–13.
[21]
Claire Le Goues, ThanhVu Nguyen, Stephanie Forrest, and Westley Weimer. 2012. GenProg: A generic method for automatic software repair. IEEE Trans. Softw. Eng. 38, 1 (2012), 54–72.
[22]
Michael Leuschel and Michael J. Butler. 2008. ProB: An automated analysis toolset for the B method. Int. J. Softw. Tools Technol. Transf. 10, 2 (2008), 185–203.
[23]
Michael Leuschel, Mareike Mutz, and Michelle Werth. 2020. Modelling and validating an automotive system in classical B and Event-B. In Proceedings of the 7th International Conference on Rigorous State-based Methods. 335–350.
[24]
Fan Long and Martin Rinard. 2015. Staged program repair with condition synthesis. In Proceedings of the 10th Joint Meeting on Foundations of Software Engineering. 166–178.
[25]
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake VanderPlas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Edouard Duchesnay. 2011. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12 (2011), 2825–2830.
[26]
Yu Pei, Carlo A. Furia, Martin Nordio, Yi Wei, Bertrand Meyer, and Andreas Zeller. 2014. Automated fixing of programs with contracts. IEEE Trans. Softw. Eng. 40, 5 (2014), 427–449.
[27]
Ingo Pill and Thomas Quaritsch. 2013. Behavioral diagnosis of LTL specifications at operator level. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI). 1053–1059.
[28]
Yuhua Qi, Xiaoguang Mao, Yan Lei, Ziying Dai, and Chengsong Wang. 2014. The strength of random search on automated program repair. In Proceedings of the 36th International Conference on Software Engineering (ICSE’14). 254–265.
[29]
Stuart J. Russell and Peter Norvig. 2010. Artificial Intelligence - A Modern Approach (3rd International ed.). Pearson Education.
[30]
Björn Scheuermann. 2007. Reading between the Packets—Implicit Feedback in Wireless Multihop Networks. Ph. D. Dissertation. University of Düsseldorf, Germany.
[31]
Björn Scheuermann, Christian Lochert, and Martin Mauve. 2008. Implicit hop-by-hop congestion control in wireless multihop networks. Ad Hoc Netw. 6, 2 (2008), 260–286.
[32]
Joshua Schmidt, Sebastian Krings, and Michael Leuschel. 2016. Interactive model repair by synthesis. In Proceedings of the 5th International Conference on Abstract State Machines, Alloy, B, TLA, VDM, and Z. 303–307.
[33]
Joshua Schmidt, Sebastian Krings, and Michael Leuschel. 2018. Repair and generation of formal models using synthesis. In Proceedings of the 14th International Conference on Integrated Formal Methods. 346–366.
[34]
Alexey Smirnov and Tzi-cker Chiueh. 2007. Automatic patch generation for buffer overflow attacks. In Proceedings of the 3rd International Symposium on Information Assurance and Security (IAS’07). 165–170.
[35]
Joseph P. Turian, Lev-Arie Ratinov, and Yoshua Bengio. 2010. Word representations: A simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL). 384–394.
[36]
Shoujin Wang, Liang Hu, Yan Wang, Xiangnan He, Quan Z. Sheng, Mehmet A. Orgun, Longbing Cao, Francesco Ricci, and Philip S. Yu. 2021. Graph learning based recommender systems: A review. In Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI’21). 4644–4652.
[37]
Josh L. Wilkerson and Daniel R. Tauritz. 2010. Coevolutionary automated software correction. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO’10). 1391–1392.
