
Fast Automated Abstract Machine Repair Using Simultaneous Modifications and Refactoring

Published: 20 September 2022

Abstract

Automated model repair techniques enable machines to synthesise patches that ensure models meet given requirements. B-repair, an existing model repair approach, assists users in repairing erroneous models in the B formal method, but repairing large models is inefficient due to successive applications of repair. In this work, we improve the performance of B-repair using simultaneous modifications, repair refactoring, and better classifiers. The simultaneous modifications can eliminate multiple invariant violations at a time, so the average time to repair each fault is reduced. Further, the modifications can be refactored to shorten the resulting repairs. The purpose of using better classifiers is to perform more accurate and general repairs and avoid inefficient brute-force searches. We conducted an empirical study to demonstrate that the improved implementation leads to the entire model repair process achieving higher accuracy, generality, and efficiency.

1 Introduction

Automatic Software Repair (ASR) [4, 13] aims to use verification, testing, and program synthesis techniques to assist humans in repairing erroneous programs. In general, repairing software requires machines to locate faults before generating repairs. Spectrum-based fault localisation has been heavily used to locate faults in imperative programs [1, 20]. It uses a set of I/O pairs to test a program and records traces of successful and failed cases. According to the occurrences of operations in the traces, suspicious code that causes the failed cases can be found, and candidate repairs can be generated using various techniques. For example, GenProg [21] and SCRepair [16] can generate mutation repairs; PASAN [34], AutoFix-E [26], and SPR [24] can generate repairs from pre-defined templates; CASC [37], pyEDB [3], and GenProg [21] can use genetic programming to generate repairs. However, these ASR tools only focus on traditional test-based software development at the concrete code level. Little work has been done on ASR for correct-by-construction software development at the abstract design level, as much work in this field has focused on the computer-assisted diagnosis of faulty models. For example, a theory for identifying consistent behavioural modes in abstract models has been proposed by Reference [11]. Moreover, Linear Temporal Logic (LTL) specifications, which are used to specify sequential properties of programs at the abstract design level, can be diagnosed using SAT encodings and reasoning [27]. However, model repair after diagnosis requires more investigation. In this work, we study automatic model repair techniques based on the B method.
The B method [2] is a formal software development method at the abstract design level, where design specifications are represented as abstract machines (called “models”). B has been used to develop a number of automatic railway control systems in France, Sweden, and the USA [6], formalise the security properties of the L4 microkernel [15], and verify industrial PLC controllers [5]. The idea of B model repair was proposed by Schmidt et al. [32], and the goal of B model repair is to automatically (or semi-automatically) eliminate invariant violations and deadlocks in abstract machines. Moreover, Schmidt et al. [33] have developed a model repair approach that eliminates invariant violations by strengthening pre-conditions and relaxing invariants and eliminates deadlocks by weakening pre-conditions and generating new operations. This approach is semi-automatic because users are required to manually give I/O examples to synthesise new operations, and the code of new operations is constructed using a pre-defined program component library. Another automated B model repair approach is B-repair [10], which uses machine learning techniques to learn the state spaces of abstract machines and select well-behaved repairs that preserve the original state spaces as much as possible. However, B-repair eliminates only one fault during each loop of repair, which means that repairing a large number of faults is time-consuming.
In this article, we improve B-repair by implementing Abstract Machine Batch Modification (AMBM), which is more automatic and efficient than the previous B-repair. AMBM can repair multiple invariant violations at a time using simultaneous modifications, repair refactoring, and better classifiers. AMBM inherits the concept of B-repair, which aims to repair design models at a high level of abstraction. The design models consist of operations describing changes in model states and invariants describing model properties, and the operations are expected to satisfy the properties with respect to the system requirements [2]. If the design models are logically verified to be correct, then they can be further developed into executable software via different techniques, e.g., they can be rewritten as concrete models by refinement and finally converted to concrete programs. If faults exist in a design model, then succeeding concrete models and final concrete programs can be faulty. Thus, repairing faulty design models is of great importance to the secure software development process.
The workflow of AMBM is shown in Figure 1. First, an initial abstract machine that specifies the design model, which possibly has invariant violations, is constructed by the user. The machine consists of an initialisation and a number of operations. In the learning module, the state space of the machine is analysed using a model checker and learned using a classifier, leading to a repair evaluator. We make the following two assumptions: First, high-quality repairs are expected to retain the state space of the original model as much as possible. Second, instead of manual estimation, the quality of a repair can be automatically estimated using the repair evaluator. Based on the above assumptions, the repair evaluator is used to maximise the similarity of the state spaces before and after repair. Next, in the modification module, the model checker is used to detect invariant violations in the machine. If invariant violations exist, then a set of atomic modifications that can eliminate the invariant violations will be synthesised. These modifications are found via a constraint solver and selected using the repair evaluator. A modification will be selected if the repair evaluator predicts a high likelihood that the modification leads to a minimal change in the state space. Each selected atomic modification can eliminate exactly one invariant violation. After that, in the refactoring module, the atomic modifications are simplified into compound modifications. Each compound modification can eliminate one or more invariant violations. Finally, in the update module, the compound modifications are applied to the initial abstract machine, leading to an updated machine. The updated machine is forwarded to the modification module. If any invariant violations are detected, then further modifications will be required; otherwise, the workflow terminates.
Fig. 1. The overall flow chart of AMBM.
The contributions of this work include:
B-repair [10] is extended by implementing batch modifications and integrating better machine learning models, which leads to higher speed and accuracy on B model repair tasks.
A repair refactoring algorithm is introduced to generalise the code of modifications.
An empirical study is conducted to demonstrate the accuracy, generality and efficiency of the extended model repair tool.1
The rest of this article is organised as follows: Section 2 revisits the B method, supervised machine learning, and B-repair. Section 3 presents technical details of AMBM. Section 4 presents a case study on the use of AMBM to repair a faulty B model. Section 5 presents an empirical study of our approach. Section 6 discusses modifications on non-determinism and compares our study with related work. Section 7 concludes this work and outlines future directions.

2 Preliminaries

This section revisits the B method, supervised machine learning, and B-repair. Emphasised words are terminology that will be used in later discussions.

2.1 The B Method

The B method [2] is a correct-by-construction formal design modelling technique, where models are represented as abstract machines consisting of constants, variables, initialisations, operations, invariants, and properties. Constants define unchangeable values in models, and variables define changeable values that record states of models. Initialisations can assign initial values to variables, and operations can generate new states by assigning new values to variables. Invariants describe conditions that all states must satisfy, and properties describe conditions that must be satisfied by the constants. The derivation and checking of model states can be achieved using model checkers such as the ProB tool [22]. Given an abstract machine that has N variables, the ProB model checker can approximate a state space that consists of a set of state transitions of the form \([v_1,v_2,\ldots ,v_N] \xrightarrow {\alpha } [v^{\prime }_1,v^{\prime }_2,\ldots ,v^{\prime }_N]\), where \(\alpha\) is an operation, \(v_1,v_2,\ldots ,v_N\) are the values of the variables before applying the operation, and \(v^{\prime }_1,v^{\prime }_2,\ldots ,v^{\prime }_N\) are the values of the variables after applying the operation. In other words, the new state \([v^{\prime }_1,v^{\prime }_2,\ldots ,v^{\prime }_N]\) results from the application of the operation \(\alpha\) to the existing state \([v_1,v_2,\ldots ,v_N]\). When deriving such transitions, the model checker verifies whether all \(v_1,v_2,\ldots ,v_N\) and \(v^{\prime }_1,v^{\prime }_2,\ldots ,v^{\prime }_N\) satisfy the given invariants. If not, then an invariant violation will be triggered and reported. Initialisations can trigger invariant violations as well, but in this study, we do not repair faulty initialisations.
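To make the transition notation concrete, the following minimal Python sketch (ours, not ProB) represents a transition as a triple and checks a toy invariant over its pre- and post-states; the states, operation name, and invariant are invented for illustration:

```python
# A toy representation of a state transition and an invariant check.
# In practice ProB derives transitions and checks invariants itself.
transition = ((0, 2), "Add2", (2, 2))          # ([v1, v2], alpha, [v1', v2'])
invariant = lambda state: state[0] <= 4 and state[1] <= 4

pre_state, operation, post_state = transition
violated = not (invariant(pre_state) and invariant(post_state))
print(operation, "violates the invariant:", violated)  # Add2 violates the invariant: False
```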
Operations are the core components of abstract machines, as they determine the derivation of transitions. They are described using different forms of substitutions, such as pre-conditioned substitutions and conditional substitutions. A pre-conditioned substitution is of the form \({\rm PRE}~~P~~{\rm THEN}~~Q~~{\rm END}\), where P is a predicate, and Q is a substitution. P is a pre-condition that must be true for a state s when the operation is applied. If P is false for s, then the operation will not be activated. A conditional substitution is of the form \({\rm IF}~~P~~{\rm THEN}~~Q~~{\rm ELSE}~~R~~{\rm END}\), where P is a predicate, and Q and R are substitutions. If P is true for a state s, then Q will be applied. If P is false for s, then R will be applied to s. In this study, the above two types of substitution are used to construct repair operators that can generally handle other types of substitution. Additionally, a pre-state and an operation can either deterministically lead to only one post-state, or non-deterministically lead to more than one post-state. In this work, we focus on such determinism and leave non-determinism as future work.

2.2 Supervised Machine Learning

Supervised machine learning aims at constructing a function that maps given input-output pairs [29]. Although supervised machine learning and the B method are rooted in two separate domains, vectorisation techniques can bridge the gap between them. According to References [10, 35], the states of a B model can be converted into binary vectors by applying a set of pre-defined transformations to the variables in the states. If a variable is an integer, a Boolean value, a distinct element, or a first-order set, then it can be vectorised by one-hot encoding. In the vectorisation process, infinite types such as INTEGER and NATURAL are converted to finite sets by collecting all the values that occur in a state space, so such infinite types can be partially vectorised. Unseen values are ignored because supervised machine learning models usually cannot learn values that do not occur in training sets, so such values are excluded to avoid noise. Additionally, to vectorise higher-order sets, nested sets can be treated as string elements. Section 6.2 will discuss limitations and alternatives of the vectorisation method. Regardless of the number of variables, a state s can be vectorised by converting all variables in s to vectors and concatenating the vectors.
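As a rough illustration (not the actual B-repair encoder), the following sketch one-hot encodes each variable against the values observed in a state space and concatenates the results; the variable domains and states are made up:

```python
# Minimal illustration of one-hot state vectorisation: each variable is
# encoded against the set of values observed in the state space, and
# unseen values are ignored.
def build_domains(states):
    """Collect the observed values of each variable position."""
    return [sorted({s[i] for s in states}, key=str) for i in range(len(states[0]))]

def vectorise_state(state, domains):
    """Concatenate one-hot encodings of all variables of a state."""
    vec = []
    for value, domain in zip(state, domains):
        vec.extend(1 if value == v else 0 for v in domain)
    return vec

# Example: states of the form [Selected_Bus, Moving(1)]
states = [[0, "FALSE"], [1, "FALSE"], [0, "TRUE"]]
domains = build_domains(states)
print(vectorise_state([1, "TRUE"], domains))  # [0, 1, 0, 1]
```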
Using vectorisation, states of a B formal model can be represented as sequences of 0 and 1, so they can be learned using supervised learning models such as Bernoulli Naive Bayes (BNB) classifiers, Logistic Regression (LR) classifiers, Support Vector Machines (SVM), Random Forests (RF), and various neural network architectures [7, 14]. In particular, Silas is an explainable and verifiable classifier that learns patterns using random forests and applies automated reasoning techniques to explain and verify the learning results [8, 9]. Although the theories of these learning models differ from each other, their usages are similar. Each model must have a training algorithm and a prediction algorithm. The training algorithm takes as input a training set containing a list of vectors with their labels. Each label is an identifier representing exactly one class, and each vector has exactly one label. The training algorithm updates parameters of the model to map the vectors in the training set to their corresponding labels. The prediction algorithm takes as input a vector x and returns a vector \(y = (y_1,\ldots ,y_K)\) such that \(y_k~(k=1,\ldots ,K)\) is the likelihood that x is mapped to the kth label. In the next section, we will explain how to use the supervised machine learning models to learn the states of a B design model.
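A minimal usage sketch of this common train/predict interface, assuming state transitions have already been vectorised and labelled "0" (possible) or "1" (impossible); the scikit-learn random forest stands in for any of the listed classifiers, and the data are hypothetical:

```python
# Hedged sketch of the fit/predict interface described above.
# X_train and y_train are hypothetical vectorised transitions with labels.
from sklearn.ensemble import RandomForestClassifier

X_train = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
y_train = [0, 0, 1, 1]   # 0 = possible transition, 1 = impossible transition

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)                        # training algorithm
likelihoods = clf.predict_proba([[0, 1, 0, 1]])  # prediction algorithm
print(likelihoods[0][0])  # likelihood that the vector belongs to the "possible" class
```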

2.3 B-repair

B-repair [10] aims to use model checking, constraint solving and machine learning to search for repairs that solve invariant violations in B abstract machines. After the use of ProB [22] to detect a state transition that violates invariants, the constraint solver in ProB is used to suggest candidate repairs that change the state transition to satisfy the invariants. One of the core steps of B-repair is to use a repair evaluator based on binary classification to select repairs from the candidate repairs.
Model checkers can approximate an abstract machine using a set of state transitions, where each transition is of the form \(S_{pre} \xrightarrow {\alpha } S_{post}\) and can be rewritten as a triple \([S_{pre},S_{post},\alpha ]\) consisting of a pre-state \(S_{pre}\), a post-state \(S_{post}\), and an operation \(\alpha\). The operation \(\alpha\) consists of a pre-condition P and a post-condition Q (which is usually represented as a generalised substitution). The triple \([S_{pre},S_{post},\alpha ]\) means that \(S_{pre}\) is a state satisfying P, and \(S_{post}\) is a state satisfying Q. The analysis of the state space can be converted into a classification problem, i.e., the triple \([S_{pre},S_{post},\alpha ]\) can be classified into either a set of “possible” transitions \(S_{P}\) or a set of “impossible” transitions \(S_{I}\). \([S_{pre},S_{post},\alpha ]\) is in \(S_{P}\) if and only if \(S_{pre} \xrightarrow {\alpha } S_{post}\) is a possible transition with respect to the machine. \([S_{pre},S_{post},\alpha ]\) is in \(S_{I}\) if and only if \(S_{pre} \xrightarrow {\alpha } S_{post}\) is impossible with respect to the machine. B-repair can use binary classifiers to learn the mapping from state transitions to \(S_{P}\) and \(S_{I}\). The trained classifiers are considered as repair evaluators, i.e., given a repair, the classifiers use their prediction functions to output repair scores indicating the likelihood that the repair results in a state transition in \(S_{P}\).

3 Abstract Machine Batch Modification

This section gives details on how B-repair is improved by implementing AMBM. AMBM reuses the learning and update modules of B-repair and adapts the modification module to support batch modifications. Additionally, a new refactoring module is used to simplify the code of batch modifications. Algorithm 1 describes the main function of AMBM. It takes as input a source B machine that contains invariant violations and outputs a repaired machine without any invariant violations. The algorithm consists of the learning phase (Lines 1–2), the modification phase (Lines 3–18), the refactoring phase (Lines 19–23), and the update phase (Lines 24–25), which are indicated using the “\(\mathbf {\rhd }\)” symbols. The motivation and intuition of the four phases are as follows:
The learning phase is used to train a classifier that learns the state space of the B machine. The trained classifier can be used to rank repairs. Without the ranking process, it will be difficult to select appropriate repairs.
The modification phase is used to detect invariant violations and suggest repairs. As different candidate repairs are available, the classifier is used to rank the repairs.
The refactoring phase is an optional process that can simplify the code of repair when multiple repairs are applied to an operation. Without simplification, the repair still works, but the resulting B machine may have tedious code.
The update phase is used to update the B machine based on the suggested repairs.
Referring to Figure 1 in Section 1, the four phases correspond to the learning module, the modification module, the refactoring module, and the update module, which describe a single loop of modification. During the learning phase, the state space of the source machine is approximated using the ProB model checker [22] and used to train a repair evaluator. During the modification phase, invariant violations are detected and removed from the source machine. First, the model checker is used to find all faulty transitions that trigger invariant violations. Second, the constraint solver embedded in ProB is used to randomly compute a set of candidate states that satisfy all invariants in the source machine. Third, a set of candidate atomic modifications, which can repair single faulty transitions, are produced using the candidate states. (To understand how the candidate atomic modifications are generated, refer to the Atomic-Modifications function and the Update function in Section 3.1.) Their repair scores are estimated using the learned classifier. For each faulty transition, an atomic modification with the highest repair score is selected. Fourth, the source machine is updated using all selected modifications. The modification phase is repeated until no faulty transitions can be found. During the refactoring phase, atomic modifications applied to each operation are collected and rewritten using Algorithm 2, resulting in a set containing compound modifications and atomic modifications that cannot be refactored. Finally, during the update phase, the source machine is changed using the modifications, and the updated machine is returned.
Algorithm 2 is an algorithm that rewrites atomic modifications into compound modifications. It takes as input a set of atomic modifications and outputs a set of compound modifications. First, the atomic modifications are converted to atomic modification predicates (which are generated using the Modifications-To-Predicates function in Section 3.1). Second, Context-Free Grammars (CFG) are used to generate a set of relation predicates describing possible relations between the pre- and post-states. Third, candidate relation predicates that are satisfied by each atomic modification predicate are collected. According to the candidate relation predicates, the atomic modification predicates are split into two partitions. The first partition \(P_B\) includes all atomic modification predicates that satisfy a common relation predicate. The second partition \(P_A\) includes all atomic modification predicates that do not satisfy the relation predicate. Finally, the above partition process is applied to \(P_A\) iteratively until no further partitions can be produced. If such a partition cannot be produced, then the partitions will be converted to compound modifications using their relation predicates.
To help readers understand the algorithms, the following subsection provides details of core functions used in Algorithms 1 and 2. The functions are listed in order of line numbers in the algorithms.

3.1 Core Functions

Algorithm 1 includes the following functions:
\(D \leftarrow \texttt {State-Space}(M)\) (in Line 1 and Line 7) returns the state space D of a given abstract machine M. It is a function of the ProB model checker. The given abstract machine must be finite with respect to its invariant and have no deadlock states. To approximate a finite state space, ProB is run to generate all states that satisfy the invariant and freeze all states that violate the invariant. In particular, if ProB detects any states violating the invariant, then ProB will be controlled to check other normal states rather than stopping at the violation points. The resulting finite state space D is converted to a list of triples. Each triple is of the form \([S,S^{\prime },\alpha ]\), where S is a pre-state of the form \([x_1,x_2,\ldots ,x_N]\), \(S^{\prime }\) is a post-state of the form \([x^{\prime }_1,x^{\prime }_2,\ldots ,x^{\prime }_N]\), and \(\alpha\) is an operation. The triple means that \([x_1,x_2,\ldots ,x_N] \xrightarrow {\alpha } [x^{\prime }_1,x^{\prime }_2,\ldots ,x^{\prime }_N]\) is a possible transition of M, where \(x_i~(i=1,\ldots ,N)\) and \(x^{\prime }_i~(i=1,\ldots ,N)\) are values of variables in M. The above form of triples is consistently used in our functions.
\(W \leftarrow \texttt {Repair-Evaluator-Training}(D,T)\) (in Line 2) learns a binary classifier of type T for a state space D and returns the learned classifier W. The learning of the classifier is performed by the following steps: First, a set of triples Z is randomly produced such that \(|Z| = |D|\) and \(D \cap Z = \varnothing\). Second, the triples in D and Z are vectorised as features with labels. Features of D are labelled as “0” (i.e., possible transitions). Features of Z are labelled as “1” (i.e., impossible transitions). Third, the features with labels are learned using a classifier of the type T. T can be BNB, LR, SVM, RF, or Silas. Regardless of the theories of these classifiers, each classifier has a training algorithm implemented as a method, \(\texttt {fit}(X,Y)\), that takes as input a list of features X with a list of labels Y and updates parameters of the classifier to fit X and Y. The goal of fitting is to map features in X to their corresponding labels in Y as much as possible. For our repair evaluator training function, the method \(\texttt {fit}\) is used to learn the mappings between the features and the labels of the triples. Finally, the learned classifier is returned.
\(X \leftarrow \texttt {Invariant-Solutions}(M,N)\) (in Line 3) computes at most N states satisfying the invariant of an abstract machine M and returns a set X containing the computed states. The function consists of the following steps: First, the invariant of M is extracted. Second, the invariant is converted into a constraint where all variables in M are considered unknowns. Finally, the constraint solver of ProB searches for solutions satisfying the constraint in random order and returns at most N solutions, where each solution is a state satisfying the invariant of M. The purpose of this function is to find candidate components (i.e., modified states) for producing atomic modifications.
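A much-simplified sketch of the idea, with a brute-force search over small finite domains standing in for ProB's constraint solver (the domains and the invariant are invented for illustration):

```python
# Brute-force stand-in for Invariant-Solutions: enumerate states over small
# finite domains, keep those satisfying the invariant, and return up to n of
# them in random order.
from itertools import product
import random

def invariant_solutions(domains, invariant, n, seed=0):
    candidates = [state for state in product(*domains) if invariant(state)]
    random.Random(seed).shuffle(candidates)
    return candidates[:n]

domains = [[0, 1, 2], ["St1", "St2", "St3", "St4"]]   # toy variable domains
inv = lambda s: not (s[0] == 2 and s[1] == "St4")     # toy invariant
print(invariant_solutions(domains, inv, 3))
```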
\(T \leftarrow \texttt {Faulty-Transitions}(D)\) (in Line 8) returns a set T containing all faulty transitions in the state space D. D must be produced using the function \(D \leftarrow \texttt {State-Space}(M)\). Note that D is assumed to have no deadlock states. If ProB detects an invariant violation, then the computation of the state space will stop and only one faulty state can be reported. To report all invariant violations, we use a trick to control ProB to develop a whole state space and collect all invariant violations, i.e., all states that trigger invariant violations are frozen, and all states that have no outgoing transitions are reported.
\(S \leftarrow \texttt {Correct-States}(D)\) (in Line 12) returns a set S containing all correct states in the state space D. D must be produced using the function \(D \leftarrow \texttt {State-Space}(M)\). Correct states in D are collected by finding all states with at least one outgoing transition.
\(R \leftarrow \texttt {Atomic-Modifications}(T,S)\) (in Line 13) takes as input a set of faulty transitions T and a set of correct states S. Any transition \(S_{pre} \xrightarrow {\alpha } S_{post}~\in ~T\) and any state \(S_{mod} \in S\) correspond to an atomic modification \([\alpha ,S_{pre},S_{post},S_{mod}]\), which means that changing \(S_{pre} \xrightarrow {\alpha } S_{post}\) to \(S_{pre} \xrightarrow {\alpha } S_{mod}\) can eliminate the faulty state \(S_{post}\). All possible atomic modifications are collected into a set R and returned. The purpose of this function is to synthesise atomic modifications that can eliminate invariant violations.
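As a sketch (with hypothetical tuples in place of B states), the function simply pairs every faulty transition with every candidate state:

```python
# Every (faulty transition, correct state) pair yields one atomic
# modification [alpha, S_pre, S_post, S_mod].
def atomic_modifications(faulty_transitions, correct_states):
    return [
        (op, s_pre, s_post, s_mod)
        for (op, s_pre, s_post) in faulty_transitions
        for s_mod in correct_states
    ]

faulty = [("Bus_Controller", (1, "T"), (0, "F"))]   # toy (operation, pre, post) triple
candidates = [(0, "T"), (1, "F")]                   # toy candidate post-states
print(atomic_modifications(faulty, candidates))     # two candidate modifications
```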
P \(\leftarrow\) \(\texttt {Repair-Scores}(R,W)\) (in Line 14) predicts repair scores of a set of atomic modifications R via a classifier W. The classifier must be produced using the function \(W \leftarrow \texttt {Repair-Evaluator-Training}(D,T)\). The repair scores are computed via the following steps: First, each atomic modification \([\alpha ,S_{pre},S_{post},S_{mod}]\) is reduced to a triple \([S_{pre},S_{mod},\alpha ]\). Second, \([S_{pre},S_{mod},\alpha ]\) is vectorised as a binary feature \(x_0\). Third, the classifier W is used to predict the repair score of \(x_0\). Regardless of the type of W, it must have a prediction algorithm implemented as a method \(\texttt {predict}(x,y)\) that takes as input a feature x and a label y and returns the likelihood that x is mapped to y. For our repair score prediction, the repair score of the triple is the value of \(\texttt {predict}(x_0,``0^{\prime \prime })\). Finally, a list P containing the repair scores of all the modifications is returned.
\(S \leftarrow \texttt {Modification-Selection}(R,P)\) (in Line 15) selects modifications in a list of atomic modifications R with a list of repair scores P and returns a list of selected modifications S. The ith number in P is the repair score of the ith modification in R. The selection process has the following steps: First, modifications in R are sorted by their repair scores in descending order. Second, for any modification \([\alpha ,S_{pre},S_{post},S_{mod}]\) in the sorted R, if the combination of \(\alpha\), \(S_{pre}\), and \(S_{post}\) occurs for the first time, then \([\alpha ,S_{pre},S_{post},S_{mod}]\) will be considered as the best modification for the transition \(S_{pre} \xrightarrow {\alpha } S_{post}\). Finally, all the best modifications are collected into a list S and returned. The purpose of this function is to find the best atomic modifications, i.e., for each invariant violation, the trained classifier is used to estimate the repair scores of all applicable candidate modifications, and only the modification with the highest repair score will be selected.
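A small sketch of this selection step over hypothetical data; modifications are ranked by score, and the first occurrence of each faulty transition wins:

```python
# Keep, for each faulty transition (op, S_pre, S_post), only the candidate
# modification with the highest repair score.
def modification_selection(mods, scores):
    best, seen = [], set()
    for mod, _ in sorted(zip(mods, scores), key=lambda pair: pair[1], reverse=True):
        key = mod[:3]           # (op, S_pre, S_post) identifies the faulty transition
        if key not in seen:
            seen.add(key)
            best.append(mod)
    return best

mods = [("Op", (1,), (9,), (2,)), ("Op", (1,), (9,), (3,))]
print(modification_selection(mods, [0.27, 0.46]))   # keeps the candidate scored 0.46
```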
U \(\leftarrow\) \(\texttt {Update}(M,R)\) (in Lines 16 and 24) updates an abstract machine M using a list of modifications R and returns an updated machine U. A modification is applied to an operation \(\alpha\) via a pair \((P^{\prime },Y^{\prime })\), where \(P^{\prime }\) is a condition, and \(Y^{\prime }\) is a substitution. For an atomic modification \([\alpha ,S_{pre},S_{post},S_{mod}]\), \(P^{\prime }\) is the predicate form of \(S_{pre}\), and \(Y^{\prime }\) is the substitution form of \(S_{mod}\). For a compound modification \([\alpha ,P,Y]\), \(P^{\prime }\) is the predicate form of P, and \(Y^{\prime }\) is the substitution form of Y. If \(\alpha\) is a substitution T without any pre-conditions, then applying the modification to \(\alpha\) will lead to a conditional substitution as follows:
\begin{equation} {\rm IF}~~not(P^{\prime })~~{\rm THEN}~~T~~{\rm ELSE}~~Y^{\prime }~~{\rm END.} \end{equation}
If \(\alpha\) is a pre-conditioned substitution “\({\rm PRE}~~S~~{\rm THEN}~~T~~{\rm END,}\)” then applying the modification to \(\alpha\) leads to a pre-conditioned substitution as follows:
\begin{equation} \begin{aligned}{\rm PRE}&~~S~~{\rm THEN} \\ &{\rm IF}~~not(P^{\prime })~~{\rm THEN}~~T~~{\rm ELSE}~~Y^{\prime }~~{\rm END} \\ {\rm END}& . \end{aligned} \end{equation}
Moreover, when \(N~(N \ge 1)\) modifications \((P_1^{\prime },Y_1^{\prime }),\ldots ,(P_N^{\prime },Y_N^{\prime })\), where \(P_i^{\prime }~(i=1,\ldots ,N)\) is a condition and \(Y_i^{\prime }~(i=1,\ldots ,N)\) is a substitution, are applied to the same operation with a substitution T and without any pre-conditions, the following template can be used:
\begin{equation} \begin{aligned}&{\rm IF}~~P_1^{\prime }~~{\rm THEN}~~Y_1^{\prime } \\ & \ldots \\ &{\rm ELSIF}~~P_N^{\prime }~~{\rm THEN}~~Y_N^{\prime } \\ &{\rm ELSE}~~T \\ &{\rm END.} \end{aligned} \end{equation}
If the operation has a pre-condition S and a substitution T, then the following template can be used:
\begin{equation} \begin{aligned}{\rm PRE}&~~S~~{\rm THEN} \\ &{\rm IF}~~P_1^{\prime }~~{\rm THEN}~~Y_1^{\prime } \\ & \ldots \\ &{\rm ELSIF}~~P_N^{\prime }~~{\rm THEN}~~Y_N^{\prime } \\ &{\rm ELSE}~~T \\ &{\rm END} \\ {\rm END}& . \end{aligned} \end{equation}
After applying all modifications to M, the resulting machine U is returned.
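For illustration only, the following sketch renders the N-modification template above as B source text; the real Update function works on the machine representation rather than strings, and the conditions and substitutions here are invented:

```python
# Render N modifications (condition, substitution) plus the original
# substitution T as the IF/ELSIF/ELSE template shown above (string-level
# sketch only).
def render_modified_operation(mods, original_subst, pre_condition=None):
    branches = []
    for i, (cond, subst) in enumerate(mods):
        keyword = "IF" if i == 0 else "ELSIF"
        branches.append(f"{keyword} {cond} THEN {subst}")
    body = "\n".join(branches) + f"\nELSE {original_subst}\nEND"
    if pre_condition is not None:
        body = f"PRE {pre_condition} THEN\n{body}\nEND"
    return body

print(render_modified_operation(
    [("xx = 1", "xx := 2"), ("xx = 3", "xx := 0")],
    "xx := xx + 1",
    pre_condition="xx < 4"))
```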
\(S \leftarrow \texttt {Collect-Modifications}(R,\alpha)\) (in Line 21) takes as input a set of modifications R and an operation \(\alpha\) and returns a set S containing all modifications that are in R and can be applied to \(\alpha\).
\(S \leftarrow \texttt {Refactoring}(R)\) (in Line 22) takes as input a set of atomic modifications R and returns a set of compound modifications S via Algorithm 2. Each compound modification is of the form \([\alpha ,P,Y]\), where \(\alpha\) is an operation, P is a list of pre-states, and Y is a list of substitutions reflecting the relations between the pre-states and the post-states. The purpose of this function is to simplify atomic modifications.
Algorithm 2 includes the following functions:
\(\alpha \leftarrow \texttt {Get-Operation}(M)\) (in Line 1) takes as input a set of atomic modifications M. If all modifications correspond to a certain operation \(\alpha\), then \(\alpha\) is returned. If the modifications correspond to two or more operations, then an error is raised.
\(U \leftarrow \texttt {Modifications-To-Predicates}(M)\) (in Line 2) converts a set of atomic modifications M to a set of predicates U. Each atomic modification is of the form \([\alpha ,S_{pre},S_{post},S_{mod}]\), suggesting that a transition \(S_{pre} \xrightarrow {\alpha } S_{post}\) should be changed to \(S_{pre} \xrightarrow {\alpha } S_{mod}\). Only \(S_{pre}\) and \(S_{mod}\) need to be converted to predicate forms. Suppose that \(S_{pre}\) and \(S_{mod}\) are \([x_1,\ldots ,x_N]\) and \([y_1,\ldots ,y_N]\), respectively, and \(v^{pre}_i\) and \(v^{mod}_i\) \((i=1,\ldots ,N)\) are identifiers of variables in the pre-state \(S_{pre}\) and the modified state \(S_{mod}\), respectively. The predicate forms of \(S_{pre}\) and \(S_{mod}\) are \(v^{pre}_1=x_1~\wedge ~\cdots ~\wedge ~v^{pre}_N=x_N\) and \(v^{mod}_1=y_1~\wedge ~\cdots ~\wedge ~v^{mod}_N=y_N\), respectively. Thus, the predicate form of \([\alpha ,S_{pre},S_{post},S_{mod}]\) is the conjunction of the above two predicates, which is \(v^{pre}_1=x_1~\wedge ~\cdots ~\wedge ~v^{pre}_N=x_N~\wedge ~v^{mod}_1=y_1~\wedge ~\cdots ~\wedge ~v^{mod}_N=y_N\). This predicate is called a modification predicate. The purpose of producing modification predicates is to enable the constraint-solving function in ProB to find relationships between pre-states and modified states.
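A sketch of this conversion using the "Pre_"/"Mod_" naming that appears in Section 4.3 (the variable names and values below are placeholders):

```python
# Build a modification predicate from a pre-state and a modified state by
# conjoining equalities over prefixed variable identifiers.
def modification_predicate(variables, pre_state, mod_state):
    conjuncts = [f"Pre_{v} = {x}" for v, x in zip(variables, pre_state)]
    conjuncts += [f"Mod_{v} = {y}" for v, y in zip(variables, mod_state)]
    return " & ".join(conjuncts)

print(modification_predicate(["Selected_Bus", "Moving(1)"], [1, "TRUE"], [0, "FALSE"]))
# Pre_Selected_Bus = 1 & Pre_Moving(1) = TRUE & Mod_Selected_Bus = 0 & Mod_Moving(1) = FALSE
```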
\(P \leftarrow \texttt {CFG-Predicates}(M,D)\) (in Line 3) takes as input a set of atomic modifications M and a search depth D and synthesises a set of predicates P using Context-Free Grammars (CFGs). The CFGs are constructed using: (1) the identifiers of variables in the pre-states and the modified states, including \(v^{pre}_i\) and \(v^{mod}_i\) \((i=1,\ldots ,N)\), (2) values of variables that occur in M, and (3) the B operators such as arithmetic, Boolean, and set operators. The maximum depth of synthesised predicates is D. Synthesised predicates are of the form \(v^{mod}_i = F(v^{pre}_1,\ldots ,v^{pre}_N)\), where F is a function. The synthesised predicates are called CFG predicates. The purpose of synthesising CFG predicates is to find candidate predicates that represent relations between pre-states and modified states. By default, the maximum depth of the CFG predicate is set to 3, and the number of candidate predicates is set to 1,000. Each variable can match at least one candidate predicate because a variable \(v_i\) with a value \(x_i\) can be directly converted to a predicate \(v^{mod}_i = x_i\).
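A much-reduced sketch of such a generator: it only enumerates equalities between a modified variable and either a seen value or a pre-state variable, whereas the actual CFGs also cover arithmetic, Boolean, and set operators up to the given depth:

```python
# Enumerate simple candidate predicates "Mod_x = <expr>", where <expr> is a
# value seen in the modifications or a pre-state variable (a depth-1 fragment
# of the grammar described above).
def cfg_predicates(variables, seen_values):
    exprs = [str(v) for v in seen_values] + [f"Pre_{x}" for x in variables]
    return [f"Mod_{x} = {e}" for x in variables for e in exprs]

for p in cfg_predicates(["Selected_Bus", "Moving(1)"], [0, "FALSE"]):
    print(p)
```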
W \(\leftarrow\) \(\texttt {Candidate-Predicates}(X,P)\) (in Line 6) takes as input a set of modification predicates X and a set of CFG predicates P and returns a set W containing candidate pairs that are of the form \((R,S)\) such that \(R \in X\), \(S \in P\), and S is a candidate predicate of R. To obtain such pairs, for any \(R \in X\) and any \(S \in P\), the constraint-solving function in the ProB model checker is used to resolve \(R \wedge S\). If \(R \wedge S\) is true, then S will be considered as a candidate predicate of R. The pair \((R,S)\) will be a member of W and is called a candidate pair. After obtaining all candidate pairs, W is returned. The purpose of finding candidate pairs is to discover hidden relations between pre-states and modified states.
\([P_B,P_A] \leftarrow \texttt {Best-Partition}(X,W)\) (in Line 7) takes as input a set of modification predicates X and a set of candidate pairs W. Recall the explanations of Modifications-To-Predicates and CFG-Predicates. We continue to use the notation of modification predicates \(v^{pre}_1=x_1~\wedge ~\cdots ~\wedge ~v^{pre}_N=x_N~\wedge ~v^{mod}_1=y_1~\wedge ~\cdots ~\wedge ~v^{mod}_N=y_N\) and the notation of candidate predicates \(v^{mod}_i = F(v^{pre}_1,\ldots ,v^{pre}_N)\). For each \(v^{mod}_i\) \((i=1,\ldots ,N)\), a candidate predicate \(U_i\) is found in W such that:
\(U_1 \wedge \cdots \wedge U_N\) is true for all predicates in a subset \(X_1 \subseteq X\),
\(U_1 \wedge \cdots \wedge U_N\) is false for all predicates in a subset \(X_2 \subseteq X\),
\(X_1 \cup X_2 = X~\wedge ~X_1 \cap X_2 = \varnothing\), and
the cardinality of \(X_1\) is maximised.
In the above process, \(U_1 \wedge \cdots \wedge U_N\) is called a compound modification predicate, which uses a set of CFG predicates to describe atomic modifications. After finding such \(X_1\) and \(X_2\), the best partition \(P_B = [X_1,[U_1, \ldots , U_N]]\) and a partition \(P_A = X_2\) that contains all the remaining atomic modifications will be returned. The purpose of this function is to find common relations between pre-states and modified states.
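A brute-force sketch of the partitioning idea, where candidate predicates are ordinary Python functions over (pre-state, modified-state) pairs instead of B predicates solved by ProB:

```python
# Pick one candidate predicate per variable such that their conjunction holds
# for as many atomic modifications as possible, then split the modifications
# into the covered set (X1) and the rest (X2).
from itertools import product

def best_partition(modifications, candidate_predicates):
    best_combo, covered = None, []
    for combo in product(*candidate_predicates):
        satisfied = [m for m in modifications if all(p(*m) for p in combo)]
        if len(satisfied) > len(covered):
            best_combo, covered = combo, satisfied
    remaining = [m for m in modifications if m not in covered]
    return (covered, best_combo), remaining

# Toy modifications: (pre_state, modified_state) pairs over two variables.
mods = [((1, "T"), (0, "F")), ((2, "T"), (0, "F")), ((1, "T"), (1, "F"))]
preds_var1 = [lambda pre, mod: mod[0] == 0, lambda pre, mod: mod[0] == pre[0]]
preds_var2 = [lambda pre, mod: mod[1] == "F"]
(best, preds), rest = best_partition(mods, [preds_var1, preds_var2])
print(len(best), len(rest))   # 2 1
```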
Z \(\leftarrow\) \(\texttt {Compound-Modification}(P_B,\alpha)\) (in Line 8) converts the partition \(P_B = [X_1, [U_1,\ldots ,U_N]]\), which is produced using Best-Partition, to a compound modification Z for an operation \(\alpha\). Z is of the form \([\alpha ,P,[U_1,\ldots ,U_N]]\), where P is a set containing all pre-states covered by \(X_1\). The purpose of this function is to synthesise modifications using common relations between pre-states and modified states.

3.2 Implementation

AMBM is implemented by extending B-repair [10]. Its main dependencies include ProB 1.7.1 [22], scikit-learn 0.19.2 [25], and Silas Edu 0.8.5 [8, 9]. The model checker in ProB is used to approximate state spaces of abstract machines and detect invariant violations. The constraint solver in ProB is used to find candidate modifications. The constraint-solving function in ProB is used to find relations between pre-states and modified states. Regarding scikit-learn, it provides well-behaved classifiers, including BNB, LR, SVM, and RF, as well as training and prediction functions that can be directly used for the purpose of repair evaluator training. Besides, Silas Edu provides an improved implementation of random forest and corresponding training and prediction functions. In our tool, these classifiers are used as black boxes. The source code of the tool can be downloaded.2
Figure 2 shows the architecture of the implemented AMBM tool. The tool consists of four modules that correspond to the four phases of Algorithm 1. The following descriptions provide details of the four modules:
Fig. 2. The implementation of AMBM.
The learning module trains a repair evaluator for a given abstract machine. To obtain the training data, correct state transitions of the abstract machine are approximated using the ProB model checker. The state transitions are vectorised as training data using the state transition encoder in B-repair. Repair evaluators can be a Bernoulli naive Bayes classifier, a logistic regression classifier, a support vector machine, or a random forest. Training functions for these classifiers are inherited from scikit-learn and Silas.
The modification module generates atomic modifications for the abstract machine. In this module, the abstract machine is checked using the ProB model checker, and all state transitions and all invariant violations are collected. After analysing the state transitions and the invariant violations, the constraint generator and solver work out candidate atomic modifications that can remove all the invariant violations from the state transitions. Features of the candidate atomic modifications are computed using the state transition encoder in B-repair. These features and the trained repair evaluator are used to predict repair scores in the repair score predictor. The predictor makes use of the prediction functions in scikit-learn and Silas to estimate the repair scores. After that, the candidate atomic modifications are sorted by their repair scores, and those with high repair scores are selected.
The refactoring module converts the selected atomic modifications to compound modifications. First, the Mod2Pred convertor rewrites the atomic modifications to their predicate forms, and the context-free grammar predicate generator generates candidate predicates from predefined context-free grammars of B and types of variables in the atomic modifications. Then the atomic modifications are clustered into a number of partitions by the satisfiability between the predicate forms of the atomic modifications and the candidate predicates. Next, the predicate extractor collects predicates of each partition. Finally, these predicates are converted to compound modifications via the Pred2Mod convertor.
The update module uses a machine updater to apply the compound modifications to the original abstract machine and uses the ProB model checker to check the correctness of the updated abstract machine. If the model checker does not report any invariant violation, then the AMBM tool will terminate and return the abstract machine. Otherwise, a new loop of AMBM is started to eliminate invariant violations in the abstract machine.
In Sections 4 and 5, the AMBM tool is used to conduct experiments.

4 A Case Study On AMBM

This section provides a case study to explain Algorithms 1 and 2. The two algorithms are used to repair the bus control model in Figure 3. The model has the following features:
Fig. 3. A bus control model.
Buses are numbered as \(1,~2, \ldots , N\).
“Selected_Bus” is a variable that denotes the ID of a selected bus that is being controlled. When Selected_Bus = 0, no bus is selected.
“Current_Location(i)” is a variable that denotes the current location of the ith bus.
“Next_Location(i)” is a variable that denotes the next scheduled location of the ith bus.
“Moving(i)” is a variable that denotes whether or not the ith bus is moving.
St1, St2, St3, and St4 are stations, and Airport is an airport.
Initially, all buses are at St1, not moving and not scheduled.
“Bus_Selector” is an operation that selects a bus to send a signal.
“Signal_Sender” is an operation that sends a signal to the selected bus to schedule a destination.
“Bus_Controller” is an operation that moves the selected bus to the scheduled destination.
In the model, a bus controller is described by the Bus_Controller operation. It has a pre-conditioned substitution of the form \({\rm PRE}~\ldots ~{\rm THEN}~\ldots ~{\rm END}\), where the predicate between \({\rm PRE}\) and \({\rm THEN}\) is a pre-condition, meaning that if the selected bus does not reach its next scheduled location, the bus controller will start controlling it to reach the location. The substitution between the first \({\rm THEN}\) and the last \({\rm END}\) describes how to change the state of the bus. If the bus is not moving, then it starts moving to the next location. If the bus is on the way to the next location, then it will stop at the next location. The operation is required to satisfy the invariant, where variable identifiers “ii” and “jj” are used instead of “i” and “j,” because single letters are not allowed to be identifiers in some B model checkers. The last two lines of the invariant mean that St4 can hold at most one bus, but the invariant is violated, because the operation allows two buses to stop at St4. We use Algorithm 1 to solve such invariant violations.

4.1 The Learning Phase

We used the case of \(N = 2\) as an example, i.e., the bus controller controlled two buses with IDs 1 and 2. In the learning phase, the State-Space function can approximate state transitions of the model. To aid the readability, we recorded a state as [Selected_Bus, Moving(1), Moving(2), Current_Location(1), Current_Location(2), Next_Location(1), Next_Location(2)], and “TRUE” and “FALSE” are recorded as “T” and “F,” respectively. Due to the large state space, we give the following state transitions as examples, where changed values are underlined:
\([0,F,F,St1,St1,St1,St1] \xrightarrow {Bus\_Selector} [\underline{1},F,F,St1,St1,St1,St1],\)
\([1,F,F,St1,St1,St1,St1] \xrightarrow {Signal\_Sender} [\underline{0},F,F,St1,St1,\underline{St2},St1],\)
\([0,F,F,St1,St1,St2,St1] \xrightarrow {Bus\_Selector} [\underline{1},F,F,St1,St1,St2,St1],\)
\([1,F,F,St1,St1,St2,St1] \xrightarrow {Bus\_Controller} [\underline{0},\underline{T},F,St1,St1,St2,St1],\)
\([0,T,F,St1,St1,St2,St1] \xrightarrow {Bus\_Selector} [\underline{1},T,F,St1,St1,St2,St1],\)
\([1,T,F,St1,St1,St2,St1] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,\underline{St2},St1,St2,St1].\)
The above state transitions show how the first bus moves from St1 to St2, and they are considered possible state transitions.
Using the Repair-Evaluator-Training function, a set of impossible state transitions can be generated so a binary classifier can be trained to distinguish the two classes of state transitions. Due to a large number of impossible state transitions, we give the following examples:
\([0,F,F,St1,St1,St1,St1] \xrightarrow {Bus\_Selector} [0,F,F,St1,St1,\underline{St2},St1],\)
\([1,F,F,St1,St1,St1,St1] \xrightarrow {Signal\_Sender} [\underline{0},\underline{T},F,St1,St1,\underline{St2},St1],\)
\([0,F,F,St1,St1,St2,St1] \xrightarrow {Bus\_Selector} [\underline{1},\underline{T},F,St1,St1,St2,St1],\)
\([1,F,F,St1,St1,St2,St1] \xrightarrow {Bus\_Controller} [1,F,F,St1,St1,\underline{St1},St1],\)
\([0,T,F,St1,St1,St2,St1] \xrightarrow {Bus\_Selector} [0,\underline{F},F,St1,St1,\underline{St1},St1],\)
\([1,T,F,St1,St1,St2,St1] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,St1,St1,St2,St1].\)
To avoid unnecessary details, we suggest readers refer to Reference [10] for details of repair evaluator training. The trained classifier is stored for future use.

4.2 The Modification Phase

In the modification phase, to produce atomic modifications, the Invariant-Solutions function can compute a set of solutions satisfying the invariant. For example:
\([0,F,F,St3,St4,St4,St4],\)
\([0,F,F,St3,St4,St4,St5],\)
\([0,F,F,St4,St3,St4,St4],\)
\([0,F,F,St4,St3,St5,St4].\)
Next, the model is iteratively repaired until no further invariant violations can be detected. To detect invariant violations, the State-Space function is used to approximate the state space of the current model, and the Faulty-Transitions function is used to collect invariant violations in the state space. The following four faulty state transitions are found:
\([1,T,F,St3,St4,St4,St4] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,\underline{St4},St4,St4,St4],\)
\([1,T,F,St3,St4,St4,St5] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,\underline{St4},St4,St4,St5],\)
\([2,F,T,St4,St3,St4,St4] \xrightarrow {Bus\_Controller} [\underline{0},F,\underline{F},St4,\underline{St4},St4,St4],\)
\([2,F,T,St4,St3,St5,St4] \xrightarrow {Bus\_Controller} [\underline{0},F,\underline{F},St4,\underline{St4},St5,St4].\)
The above state transitions violate the invariant, because they allow the two buses to stop at St4 at the same time. To repair the above state transitions, their post-states will be replaced with other candidate post-states satisfying the invariant. All correct states in the state space, which are collected using the Correct-States function, and the states computed using the Invariant-Solutions function are considered as candidate post-states. For each faulty state transition, the Atomic-Modifications function can use the candidate post-states to generate a set of candidate modifications, and the Repair-Scores function can estimate their repair scores. For example, to repair \([1,T,F,St3,St4,St4,St4] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,\underline{St4},St4,St4,St4]\) using the four solutions, the following candidate modifications with repair scores (\(\sigma\)) are generated:
\([1,T,F,St3,St4,St4,St4] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,St3,St4,St4,St4]\) (\(\sigma = 0.458\)),
\([1,T,F,St3,St4,St4,St4] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,St3,St4,St4,\underline{St5}]\) (\(\sigma = 0.274\)),
\([1,T,F,St3,St4,St4,St4] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,\underline{St4},\underline{St3},St4,St4]\) (\(\sigma = 0.403\)),
\([1,T,F,St3,St4,St4,St4] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,\underline{St4},\underline{St3},\underline{St5},St4]\) (\(\sigma = 0.159\)).
As the first modification has the highest \(\sigma\) value, it is selected to repair the model. Similarly, for all the faulty state transitions, the Modification-Selection function can select the following modifications, which are of the form “pre-state \(\xrightarrow {\text{operation}}\) faulty post-state \(\hookrightarrow\) modified post-state”:
\([1,T,F,St3,St4,St4,St4] \xrightarrow {Bus\_Controller}\)
\([\underline{0},\underline{F},F,\underline{St4},St4,St4,St4] \hookrightarrow [\underline{0},\underline{F},F,St3,St4,St4,St4],\)
\([1,T,F,St3,St4,St4,St5] \xrightarrow {Bus\_Controller}\)
\([\underline{0},\underline{F},F,\underline{St4},St4,St4,St5] \hookrightarrow [\underline{0},\underline{F},F,St3,St4,St4,St5],\)
\([2,F,T,St4,St3,St4,St4] \xrightarrow {Bus\_Controller}\)
\([\underline{0},F,\underline{F},St4,\underline{St4},St4,St4] \hookrightarrow [\underline{0},F,\underline{F},St4,St3,St4,St4],\)
\([2,F,T,St4,St3,St5,St4] \xrightarrow {Bus\_Controller}\)
\([\underline{0},F,\underline{F},St4,\underline{St4},St5,St4] \hookrightarrow [\underline{0},F,\underline{F},St4,St3,St5,St4].\)
The Update function can use the above atomic modifications to repair the Bus_Controller operation, leading to the repaired operation in Figure 4. As a result, the repaired operation no longer triggers any invariant violations. A problem is that each atomic modification can only remove one invariant violation, resulting in tedious code. The code can be simplified in the refactoring phase.
Fig. 4. The repaired bus controller operation before refactoring.

4.3 The Refactoring Phase

The atomic modifications are refactored using the Refactoring function, i.e., Algorithm 2. We use \([1,T,F,St3,St4,St4,St4] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,\underline{St4},St4,St4,St4] \hookrightarrow [\underline{0},\underline{F},F,St3,St4,St4,St4]\) as an example. The Modifications-To-Predicates function converts the pre-state \([1,T,F,St3,St4,St4,St4]\) and the modified state \([0,F,F,St3,St4,St4,St4]\) to the predicate “Pre_Selected_Bus = 1 & Pre_Moving(1) = TRUE & Pre_Moving(2) = FALSE & Pre_Current_Location(1) = St3 & Pre_Current_Location(2) = St4 & Pre_Next_Location(1) = St4 & Pre_Next_Location(2) = St4 & Mod_Selected_Bus = 0 & Mod_Moving(1) = FALSE & Mod_Moving(2) = FALSE & Mod_Current_Location(1) = St3 & Mod_Current_Location(2) = St4 & Mod_Next_Location(1) = St4 & Mod_Next_Location(2) = St4.”
Next, the CFG-Predicates function generates a set of predicates using context-free grammars, e.g., Mod_Selected_Bus = 0, Mod_Selected_Bus = 1, Mod_Selected_Bus = Pre_Selected_Bus, Mod_Selected_Bus = 1 - Pre_Selected_Bus, Mod_Moving(1) = TRUE, Mod_Moving(1) = FALSE, Mod_Next_Location(1) = St4, Mod_Next_Location(1) = St5, and Mod_Next_Location(1) = Pre_Next_Location(2). Based on the predicates, the Candidate-Predicates function will find candidate predicates that describe the modifications. For example, the modification \([1,T,F,St3,St4,St4,St4] \xrightarrow {Bus\_Controller} [\underline{0},\underline{F},F,\underline{St4},St4,St4,St4] \hookrightarrow [\underline{0},\underline{F},F,St3,St4,St4,St4]\) can be described using \(Mod\_Selected\_Bus = 0\), \(Mod\_Selected\_Bus = 1 - Pre\_Selected\_Bus\), \(Mod\_Moving(1) = FALSE\), \(Mod\_Next\_Location(1) = St4\), \(Mod\_Next\_Location(1) = Pre\_Next\_Location(2)\), and so on.
The Best-Partition function can produce the best partitions of modification predicates using the following predicates:
Mod_Selected_Bus = 0,
Mod_Moving(1) = FALSE,
Mod_Moving(2) = FALSE,
Mod_Current_Location(1) = Pre_Current_Location(1),
Mod_Current_Location(2) = Pre_Current_Location(2),
Mod_Next_Location(1) = Pre_Next_Location(1),
Mod_Next_Location(2) = Pre_Next_Location(2).
The above predicates can lead to the best partition, because they are satisfied by all the modifications, leading to a partition that includes all the modifications and an empty partition. As the second partition is empty, the current partitions are the best partitions, and no further partitions will be produced. As a counterexample, if another candidate predicate “Mod_Next_Location(1) = St4” is used to split the modification predicates, then the resulting two partitions will include three modifications and one modification, respectively, which are not the best partitions.
Finally, the Compound-Modification function can convert the partition to a compound modification that covers the following pre-states:
\([1,T,F,St3,St4,St4,St4],\)
\([1,T,F,St3,St4,St4,St5],\)
\([2,F,T,St4,St3,St4,St4],\)
\([2,F,T,St4,St3,St5,St4].\)
The compound modification uses the following refactored substitutions to generate post-states:
Selected_Bus := 0,
Moving(1) := FALSE,
Moving(2) := FALSE.
The above compound modification means that if a bus is already at St4, and the other bus is currently on the way to St4, then the latter should stop at the current location. After returning the compound modification, Algorithm 2 terminates.

4.4 The Update Phase

During the update phase, the compound modification is applied to the original Bus_Controller operation, leading to the repaired operation in Figure 5. Due to the existence of the extra IF-THEN-ELSE-END construct, the compound modification can disable the faulty post-states while maintaining other correct behaviours of the original operation. For any pre-states that are covered by the compound modification, their post-states are determined using the refactored substitution. For any other pre-states, their post-states are determined using the original substitution.
Fig. 5. The refactored bus controller operation.

5 Evaluation

This section presents an empirical study of the AMBM tool. The experiments consist of two parts: Part I includes experiments on repair evaluator training, and Part II includes experiments on the entire abstract machine modification process.

5.1 Purpose of the Evaluation

The objective of Part I was to demonstrate that the classifiers can distinguish between possible state transitions and impossible state transitions. Part I evaluated five types of classifiers, including Bernoulli Naive Bayes (BNB) classifiers, Logistic Regression (LR) classifiers, Support Vector Machines (SVM) with radial basis function kernels, Random Forests (RF), and Silas, on repair evaluator training tasks. To compare this work with our previous work [10], we conducted the same evaluation for the traditional B-repair with Classification And Regression Trees (CART) and ResNet. We used a set of correct abstract machines (in Table 1, which will be explained in Section 5.2) to generate training and test sets. First, for each subject, the model checker was used to approximate a state space of the abstract machine, and possible transitions were extracted from the state space. Second, impossible transitions were randomly generated. This process did not randomly generate any new states but used existing states to randomly generate impossible state transitions. For example, if we have a variable \(x = 0\) and an operation \(Add2 = {\rm PRE}~x \lt 3~{\rm THEN}~x := x + 2~{\rm END}\), then possible state transitions will include \(x = 0 \xrightarrow {Add2} x = 2\) and \(x = 2 \xrightarrow {Add2} x = 4\), and existing states will be \(x = 0\), \(x = 2\) and \(x = 4\). Impossible state transitions will include \(x = 0 \xrightarrow {Add2} x = 0\), \(x = 0 \xrightarrow {Add2} x = 4\), \(x = 2 \xrightarrow {Add2} x = 0\), \(x = 2 \xrightarrow {Add2} x = 2\), \(x = 4 \xrightarrow {Add2} x = 0\), \(x = 4 \xrightarrow {Add2} x = 2\), and \(x = 4 \xrightarrow {Add2} x = 4\). In our experiments, to avoid combinatorial explosion, existing states were randomly selected to generate impossible state transitions, and the generation process would terminate if the number of generated impossible state transitions reached the number of possible state transitions. As a result, 50% of the transitions were of the “possible” class, and the remaining 50% of the transitions were of the “impossible” class. All transitions were shuffled and split into a training set and a test set that contained 80% and 20% of the transitions, respectively. Third, the five classifiers were trained using the training set, and consistent hyper-parameters were used during the training.3 Finally, the trained classifiers were evaluated on the test set, and evaluation metrics included the classification accuracy and the area under the receiver operating characteristic curve (ROC-AUC) [12].
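A sketch of the negative-sampling step just described, using the Add2 example; the real implementation differs in details such as vectorisation and shuffling:

```python
# Generate as many impossible transitions as there are possible ones by
# recombining existing states and operations, skipping collisions with the
# possible set.
import random

def impossible_transitions(possible, seed=0):
    rng = random.Random(seed)
    states = sorted({s for (pre, _, post) in possible for s in (pre, post)})
    ops = sorted({op for (_, op, _) in possible})
    impossible = set()
    while len(impossible) < len(possible):
        candidate = (rng.choice(states), rng.choice(ops), rng.choice(states))
        if candidate not in possible:
            impossible.add(candidate)
    return impossible

possible = {("x=0", "Add2", "x=2"), ("x=2", "Add2", "x=4")}
print(impossible_transitions(possible))
```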
Table 1. Dataset of Abstract Machines

Subject | Source File (.mch) | # LOC | # Var. | # Inv. | # Ope.

ProB Public Examples
M01 | Paperround_simple | 40 | 2 sets | 2 | 3
M02 | Sortarray | 41 | 1 array of 4 integers | 1 | 3
M03 | ParallelModelCheckTest | 47 | 1 array of 5 integers; an integer | 4 | 3
M04 | POR_TwoThreads_WithSync | 48 | 4 integers | 4 | 3
M05 | progress | 55 | 3 sets | 6 | 4
M06 | monitor2 | 56 | 3 sets | 6 | 4
M07 | BridgeTransitions | 57 | 3 integers | 4 | 5
M08 | InvolvedSequences2 | 60 | 4 integers | 4 | 5
M09 | club | 75 | 2 sets | 4 | 7
M10 | ADD4 | 85 | 5 integers; 1 bool | 7 | 9
M11 | TestBZTT3 | 85 | 1 array of 10 elements; 1 element; 1 set | 3 | 8
M12 | scheduler6 | 90 | 3 sets | 10 | 5
M13 | BinomialCoefficientConcurrent | 109 | 5 integers | 5 | 8
M14 | Lift2 | 119 | 2 integers; 2 bools; 1 set | 5 | 8
M15 | Mikrowelle | 203 | an integer; 4 bools; 1 element | 8 | 12
M16 | CSM | 211 | 14 integers | 15 | 13
M17 | GSM_revue | 215 | two arrays of 4 bools and 4 elements, etc. | 20 | 6
M18 | Cruise_finite1 | 482 | 1 integer; 2 sets; 12 bools | 39 | 26

Wireless Network Protocol Models
M19 | CXCC | 124 | 3 functions | 7 | 6
M20 | BarRel | 166 | 4 functions; 1 set | 6 | 9

ABZ Automotive Models
M21 | GenericTimersMC | 79 | 1 function | 1 | 5
M22 | BlinkLamps | 151 | 3 integers; 1 set; 1 bool | 11 | 7
M23 | PitmanController | 373 | 1 integer; 1 set; 5 elements; 2 bools | 21 | 10
M24 | PitmanController_TIME_MC | 489 | 1 function; 3 integers; 1 set; 3 elements; 2 bools | 22 | 11
Part II evaluated the whole abstract machine modification process with the five classifiers and the modification refactoring function. We used fault seeding and removal to evaluate the algorithms. First, for each subject, faults were randomly seeded into 100 deterministic transitions of the correct abstract machine, leading to a faulty machine that could trigger 100 invariant violations and a set of standard answers indicating correct modifications. For instance, if \(S_{pre} \xrightarrow {\alpha } S_{\top }\) is a correct transition, and \(S_{\bot }\) is a state that triggers an invariant violation, then a fault will be seeded by replacing \(S_{pre} \xrightarrow {\alpha } S_{\top }\) with \(S_{pre} \xrightarrow {\alpha } S_{\bot }\), and the corresponding standard answer is \([S_{pre},\alpha ,S_{\bot },S_{\top }]\), which means that \(S_{pre} \xrightarrow {\alpha } S_{\bot }\) is a faulty transition and should be repaired by replacing \(S_{\bot }\) with \(S_{\top }\). Faulty state transitions were randomly injected into a correct abstract machine using the following steps:
(1) The type constraints of the variables are extracted from the model.
(2) A large number of states satisfying the type constraints are randomly generated. The states are verified against the invariant of the abstract machine, and faulty states that violate the invariant are collected.
(3) A faulty state \(S_{\bot }\) is randomly selected, and a correct state transition \(S_{pre} \xrightarrow {\alpha } S_{\top }\) generated by the abstract machine is randomly selected as the position at which to inject \(S_{\bot }\). To inject \(S_{\bot }\) into \(S_{pre} \xrightarrow {\alpha } S_{\top }\), we produce an atomic modification \([\alpha ,S_{pre},S_{\top },S_{\bot }]\) and use the Update function in Section 3.1 to apply the modification. (A sketch of a seeded fault, based on the Add2 example above, is given after this list.)
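For illustration, consider again the assumed Counter machine sketched in the previous subsection. If the correct transition \(x = 2 \xrightarrow {Add2} x = 4\) is selected and the faulty state \(x = 5\) (which violates the assumed invariant \(x : 0..4\)) is injected, the standard answer is \([x = 2, Add2, x = 5, x = 4]\). One possible shape of the seeded operation, assuming that the Update function realises the atomic modification as a conditional substitution, is:

    Add2 =
      PRE x < 3 THEN
        IF x = 2 THEN
          x := 5        /* seeded fault: the post-state x = 5 violates the assumed invariant */
        ELSE
          x := x + 2    /* the original behaviour is kept for all other pre-states */
        END
      END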
The above method was used to make 10 faulty machines from each correct machine in Table 1, yielding \(10 \times 24 = 240\) faulty machines containing 24,000 faulty transitions in total. After using AMBM to repair all faulty machines, the suggested modifications were compared with the standard answers and evaluated using the following metrics:
modification accuracy \(MA = N^{val}_{cor}/N^{val}_{tot}\), where \(N^{val}_{cor}\) denotes the number of correctly modified values with reference to the standard answers, and \(N^{val}_{tot}\) denotes the total number of values;
refactoring generality \(RG = 1 - N_{rec}/N_{0}\), where \(N_{0}\) denotes the number of modifications before refactoring, and \(N_{rec}\) denotes the number of modifications after refactoring;
average repair time \(ART = T/N_{F}\), where T denotes running time and \(N_{F}\) denotes the number of faults.
The intuition of MA is that the suggested atomic modifications are expected to coincide with standard answers. The intuition of RG is that the suggested atomic modifications are expected to be generalised as fewer compound modifications. When computing RG, both correct and incorrect atomic modifications are taken into account. Generalisation is the outcome of refactoring. Before refactoring, all modifications are atomic modifications. After refactoring, a number of atomic modifications have been replaced with fewer compound modifications. As a single compound modification can cover the functions of multiple atomic modifications, it can reduce the total number of modifications and make the modifications more expressive. In the best case, a compound modification covers all atomic modifications, leading to \(N_{rec} = 1\) and \(RG = 1 - 1/N_{0}\) (i.e., the maximum RG). In the worst case, no compound modification is produced, leading to \(N_{rec} = N_{0}\) and \(RG = 0\) (i.e., the minimum RG). With regard to ART, the intuition is that the average time for eliminating a fault should be reasonable.
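As a small worked example with assumed counts (not taken from the experiments): if refactoring merges \(N_{0} = 100\) atomic modifications into \(N_{rec} = 25\) compound modifications, and the run eliminates \(N_{F} = 100\) faults in \(T = 300\) seconds, then
\begin{equation*} RG = 1 - 25/100 = 0.75, \qquad ART = 300/100 = 3~\mathrm{seconds~per~fault}. \end{equation*}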

5.2 Experimental Settings

All experiments were run on a machine equipped with an Intel(R) Core(TM) i5-4670 CPU (4 cores, 3.40 GHz) and 8 GB of memory. The operating system was Ubuntu Desktop 16.04.⁴ A specific dataset to evaluate our solution was constructed using materials from the ProB Public Examples repository.⁵ We used the following filters to select machines:
Filter #1 selects machines that are syntactically correct.
Filter #2 selects machines that have variables, invariants, and operations.
Filter #3 selects machines that approximate at most 30K state transitions (to avoid memory exhaustion on our equipment).
Filter #4 selects machines that approximate at least 500 deterministic state transitions. Note that the selected machines may include non-determinism as well. To obtain more machines, the scales of small machines could be expanded by adjusting their set cardinalities and integer scopes.
Filter #5 selects machines that pass model checking.
Filter #6 selects machines that only have Boolean values, integers, distinct elements, and first-order sets as single variables or arrays.
After using the above filters, we manually removed redundant machines and machines without actual meanings. Consequently, we obtained 18 well-formed and error-free machines for evaluation, and the size of their state space (i.e., the number of states plus the number of state transitions) ranged from 1K to 27K. Table 1 provides information on the 18 machines, i.e., M01–M18, including the source file of each machine, the number of lines of code (# LOC), the number of variables (# Var.), the number of invariants (# Inv.), and the number of operations (# Ope.). A subset of the abstract machines and their essential information (e.g., source files and # LOC) were used by Reference [10], where # LOC was counted after using ProB [22] to convert the source files into the pretty-printed format.
Additionally, we added M19–M24 to the evaluation dataset. M19 and M20 are based on the wireless network protocols studied in References [30, 31]. The original B models of M19 and M20 were obtained from the repository shared by Reference [19].⁶ M21–M24 were originally part of the automotive models developed by Reference [23].⁷ We adapted M19–M24 to suit the capabilities of AMBM. For example, partial functions were replaced by total functions, as partial functions could not be processed by the repair evaluators. In addition, composite models were flattened into single models, as AMBM could not process the INCLUDES mechanism.

5.3 Results and Discussions

5.3.1 Results of Part I.

Table 2 shows the results of the repair evaluator training experiments, including the number of training and test examples (i.e., # Examples), the ROC-AUC of the classifiers for each subject, the total ROC-AUC, and the total classification accuracy (CA). We observed that, with regard to ROC-AUC and CA, RF and Silas obtained the best results, i.e., over 99% ROC-AUC and over 98% CA, and the difference between Silas and RF on these metrics was insignificant. In addition, SVM performed well and obtained over 96% ROC-AUC and over 91% CA, whereas LR and BNB fell behind the others. Furthermore, both RF and Silas achieved better ROC-AUC and CA than the two traditional repair evaluators of B-repair, i.e., ResNet and CART. In summary, RF and Silas showed high predictive performance on the repair evaluator training tasks.
Table 2. Results of Repair Evaluator Training
(Entries are ROC-AUC values; the final row gives the overall classification accuracy.)

Subject | # Examples | BNB | LR | SVM | RF | Silas | ResNet | CART
M01 | 24,530 | 0.615 | 0.619 | 1.000 | 1.000 | 1.000 | 0.961 | 0.973
M02 | 22,500 | 0.722 | 0.736 | 0.984 | 0.996 | 0.990 | 0.986 | 0.986
M03 | 4,608 | 0.729 | 0.691 | 0.965 | 1.000 | 1.000 | 0.976 | 0.953
M04 | 10,202 | 0.660 | 0.668 | 0.936 | 1.000 | 1.000 | 0.955 | 0.955
M05 | 15,360 | 0.640 | 0.645 | 0.997 | 0.999 | 0.999 | 0.990 | 0.961
M06 | 14,160 | 0.584 | 0.593 | 0.999 | 0.999 | 0.999 | 0.979 | 0.967
M07 | 4,550 | 0.663 | 0.654 | 0.725 | 0.965 | 0.995 | 0.819 | 0.840
M08 | 2,024 | 0.673 | 0.709 | 0.905 | 0.998 | 0.988 | 0.925 | 0.931
M09 | 14,580 | 0.536 | 0.539 | 0.994 | 0.999 | 0.999 | 0.992 | 0.999
M10 | 6,592 | 0.625 | 0.636 | 0.993 | 1.000 | 0.998 | 0.962 | 0.998
M11 | 15,860 | 0.848 | 0.868 | 0.930 | 0.997 | 0.995 | 0.948 | 0.968
M12 | 20,976 | 0.566 | 0.563 | 0.996 | 1.000 | 1.000 | 0.996 | 0.963
M13 | 10,210 | 0.536 | 0.532 | 0.621 | 0.999 | 1.000 | 0.794 | 0.813
M14 | 11,918 | 0.802 | 0.814 | 0.997 | 1.000 | 0.997 | 0.994 | 0.990
M15 | 1,944 | 0.777 | 0.837 | 0.844 | 0.999 | 0.988 | 0.886 | 0.908
M16 | 2,456 | 0.642 | 0.666 | 0.850 | 0.999 | 0.999 | 0.905 | 0.912
M17 | 9,056 | 0.679 | 0.735 | 0.956 | 0.998 | 0.996 | 0.946 | 0.937
M18 | 51,384 | 0.783 | 0.822 | 0.995 | 1.000 | 0.998 | 0.975 | 0.996
M19 | 11,378 | 0.736 | 0.759 | 1.000 | 0.999 | 0.999 | 0.938 | 0.966
M20 | 5,244 | 0.771 | 0.786 | 0.999 | 1.000 | 1.000 | 0.953 | 0.992
M21 | 15,662 | 0.746 | 0.753 | 0.982 | 0.979 | 0.981 | 0.924 | 0.978
M22 | 5,002 | 0.884 | 0.890 | 0.983 | 0.996 | 0.993 | 0.912 | 0.968
M23 | 1,024 | 0.615 | 0.718 | 0.910 | 0.977 | 0.937 | 0.833 | 0.941
M24 | 33,944 | 0.706 | 0.732 | 0.999 | 1.000 | 0.999 | 0.952 | 0.988
All | 315,164 | 0.694 | 0.711 | 0.969 | 0.998 | 0.997 | 0.958 | 0.970
Classification Accuracy | – | 0.646 | 0.665 | 0.917 | 0.980 | 0.980 | 0.908 | 0.941

5.3.2 Results of Part II.

Table 3 shows the modification accuracy (MA), refactoring generality (RG), and average repair time (ART) of the abstract machine batch modification experiments. We observed that, with regard to MA, RF was the leading classifier with over 88% MA, followed by Silas with over 83% MA. The use of BNB and LR resulted in significantly lower accuracy, because the two models were too simple to approximate the encodings of state transitions. SVM obtained better MA, because its kernel functions can form more complex transformations than BNB and LR. Moreover, Silas showed better generalisation capability than RF. The use of RF resulted in significantly lower generality, probably because its strong fitting ability caused overfitting. By contrast, BNB and LR had significantly lower accuracy and higher generality, because their fitting ability was too weak to cause overfitting.
Table 3. Results of Abstract Machine Batch Modification
(Each cell lists the values for BNB / LR / SVM / RF / Silas.)

Subject | Modification Accuracy | Refactoring Generality | Average Repair Time (s)
M01 | 0.388 / 0.388 / 0.611 / 0.943 / 0.834 | 0.959 / 0.959 / 0.980 / 0.965 / 0.965 | 2.965 / 2.965 / 6.696 / 3.033 / 3.157
M02 | 0.460 / 0.456 / 0.398 / 0.690 / 0.659 | 0.990 / 0.990 / 0.495 / 0.744 / 0.868 | 2.069 / 2.066 / 4.982 / 2.142 / 2.138
M03 | 0.297 / 0.249 / 0.844 / 0.878 / 0.854 | 0.952 / 0.970 / 0.359 / 0.215 / 0.545 | 3.135 / 3.111 / 6.433 / 3.161 / 3.619
M04 | 0.018 / 0.013 / 0.187 / 0.975 / 0.948 | 0.980 / 0.978 / 0.583 / 0.137 / 0.331 | 10.694 / 9.161 / 43.670 / 3.168 / 3.094
M05 | 0.313 / 0.313 / 0.457 / 0.969 / 0.866 | 0.960 / 0.960 / 0.960 / 0.895 / 0.878 | 2.018 / 2.023 / 3.222 / 1.758 / 1.828
M06 | 0.141 / 0.138 / 0.333 / 0.649 / 0.383 | 0.960 / 0.960 / 0.960 / 0.841 / 0.929 | 1.417 / 1.398 / 2.878 / 1.498 / 1.530
M07 | 0.271 / 0.284 / 0.518 / 0.709 / 0.902 | 0.960 / 0.960 / 0.920 / 0.361 / 0.529 | 0.716 / 0.715 / 2.749 / 0.760 / 0.748
M08 | 0.146 / 0.155 / 0.456 / 0.970 / 0.929 | 0.961 / 0.961 / 0.532 / 0.342 / 0.773 | 2.682 / 2.584 / 2.992 / 3.010 / 3.350
M09 | 0.214 / 0.214 / 0.399 / 0.702 / 0.428 | 0.876 / 0.876 / 0.950 / 0.861 / 0.920 | 1.534 / 1.531 / 2.715 / 1.597 / 1.684
M10 | 0.305 / 0.314 / 0.943 / 0.999 / 0.999 | 0.920 / 0.920 / 0.340 / 0.359 / 0.872 | 0.649 / 0.648 / 0.830 / 0.668 / 0.688
M11 | 0.453 / 0.652 / 0.656 / 0.969 / 0.952 | 0.953 / 0.953 / 0.912 / 0.695 / 0.793 | 4.525 / 4.538 / 6.629 / 4.578 / 4.646
M12 | 0.142 / 0.148 / 0.402 / 0.946 / 0.916 | 0.950 / 0.952 / 0.959 / 0.845 / 0.841 | 2.421 / 2.390 / 6.280 / 2.384 / 2.451
M13 | 0.108 / 0.080 / 0.148 / 0.900 / 0.976 | 0.930 / 0.930 / 0.885 / 0.159 / 0.466 | 1.629 / 1.616 / 16.160 / 1.703 / 1.660
M14 | 0.365 / 0.362 / 0.799 / 0.997 / 0.981 | 0.920 / 0.920 / 0.591 / 0.499 / 0.715 | 3.345 / 3.504 / 4.456 / 3.244 / 3.444
M15 | 0.657 / 0.652 / 0.671 / 0.909 / 0.733 | 0.890 / 0.890 / 0.882 / 0.496 / 0.817 | 3.389 / 3.353 / 4.697 / 3.277 / 3.558
M16 | 0.541 / 0.449 / 0.714 / 0.973 / 0.975 | 0.871 / 0.871 / 0.367 / 0.229 / 0.602 | 2.019 / 1.995 / 4.820 / 2.049 / 2.267
M17 | 0.758 / 0.758 / 0.917 / 0.995 / 0.990 | 0.980 / 0.980 / 0.839 / 0.792 / 0.805 | 2.159 / 2.152 / 2.741 / 2.233 / 2.283
M18 | 0.425 / 0.409 / 0.811 / 0.999 / 0.990 | 0.802 / 0.802 / 0.719 / 0.795 / 0.776 | 10.034 / 9.970 / 26.024 / 9.518 / 9.803
M19 | 0.544 / 0.551 / 0.857 / 0.955 / 0.905 | 0.958 / 0.958 / 0.837 / 0.746 / 0.763 | 4.998 / 5.093 / 5.389 / 5.005 / 5.004
M20 | 0.518 / 0.551 / 0.927 / 0.979 / 0.977 | 0.913 / 0.908 / 0.673 / 0.700 / 0.740 | 3.517 / 3.502 / 4.305 / 3.862 / 3.694
M21 | 0.290 / 0.290 / 0.510 / 0.484 / 0.445 | 0.980 / 0.980 / 0.928 / 0.773 / 0.873 | 0.952 / 0.955 / 1.266 / 0.958 / 0.957
M22 | 0.458 / 0.474 / 0.712 / 0.916 / 0.763 | 0.951 / 0.951 / 0.904 / 0.859 / 0.919 | 1.189 / 1.201 / 1.300 / 1.255 / 1.251
M23 | 0.510 / 0.481 / 0.711 / 0.788 / 0.746 | 0.891 / 0.901 / 0.580 / 0.531 / 0.537 | 1.685 / 1.607 / 1.600 / 1.591 / 1.591
M24 | 0.411 / 0.493 / 0.893 / 0.990 / 0.986 | 0.890 / 0.890 / 0.733 / 0.559 / 0.576 | 3.984 / 4.204 / 11.129 / 4.037 / 4.028
Average | 0.364 / 0.370 / 0.620 / 0.887 / 0.839 | 0.933 / 0.934 / 0.745 / 0.600 / 0.743 | 3.072 / 3.012 / 7.248 / 2.770 / 2.853
Regarding the performance of AMBM, all classifiers required a similar ART of approximately 3 seconds, with the exception of SVM, which required considerably longer (7.248 seconds on average). These results demonstrate the feasibility and practicality of the AMBM approach. More specifically, they suggest that, among the classifiers considered in the experiments, Silas is the most suitable for approximating the repair scores of abstract machines, as it has both high accuracy and high generalisation capability.
The above experiments have demonstrated that the modification accuracy of B model repair can reach a high level by means of a well-trained classifier, while the refactoring generality of the modifications depends on the classifier. Considering both modification accuracy and refactoring generality, Silas has the best performance on the model repair tasks. Moreover, this finding implies that model repair processes can benefit from the repair evaluators. With repair evaluator training and constraint solving, machines are able to automatically produce repairs and select well-behaved repairs that eliminate faults in the models and preserve the state spaces of the models as much as possible. As the modification accuracy and the refactoring generality reached approximately 0.8 and 0.7, respectively, we infer that a large number of the suggested modifications are correct and simple. Furthermore, the study has shown the effectiveness of the AMBM algorithm. AMBM is able to automatically eliminate hundreds of faults within a reasonable time, while manually eliminating these faults may require a considerable amount of human effort. Thus, our study suggests that AMBM can assist programmers in designing abstract machines and possibly realise automated model generation via step-by-step incremental design repairs.
Finally, Figure 6 compares AMBM and the traditional B-repair with respect to accuracy and performance. Figure 6(a) indicates that AMBM with RF and Silas achieved better accuracy than the traditional B-repair with CART and ResNet. Moreover, Silas had better accuracy than CART, because Silas’s internal decision trees can model more data types than CART. Additionally, ResNet failed to suggest accurate modifications, probably because such a neural network architecture is not appropriate for learning from small datasets. Figure 6(b) indicates that AMBM’s performance was significantly better than that of B-repair. B-repair’s performance bottleneck was mainly caused by successive repairs, i.e., after eliminating a single invariant violation, a model-checking process was started to detect the next invariant violation. Consequently, B-repair was slowed down when the model repair process was run repeatedly. AMBM solves this problem by detecting multiple invariant violations during a single model-checking process, so the number of required model-checking runs is reduced. Overall, the above results show that AMBM achieves higher accuracy and better performance than the traditional B-repair.
Fig. 6.
Fig. 6. Comparing AMBM with the traditional B-repair [10].

6 Discussions

In this section, we discuss a feasible extension for repairing non-determinism in general. Additionally, we discuss the limitations of our approach and compare this study with existing work.

6.1 Modifications on Non-determinism

As previously mentioned, the modification operator eliminates invariant violations triggered by deterministic state transitions. To repair non-determinism, we suggest a new modification operator. For an operation, given a pre-state p and an erroneous post-state q, a modification operator modifies q to a correct post-state r. The modification operator can be described using a triple \([u, \textit{``modification''}, r]\), where u is the concatenation of p and q. The triple means that the faulty state pair \((p,q)\) is rewritten as the correct state pair \((p,r)\) using the modification operator. To control the application domain of the modification, u is converted to the following condition:
\begin{equation} v^{pre}_1 = p[v_1]~\wedge ~v^{post}_1 = q[v_1]~\wedge ~\cdots ~\wedge ~v^{pre}_n = p[v_n]~\wedge ~v^{post}_n = q[v_n], \end{equation}
(5)
where \(v^{pre}_i~(i=1,2,\ldots ,n)\) is a pre-variable identifier of \(v_i\), \(v^{post}_i~(i=1,2,\ldots ,n)\) is a post-variable identifier of \(v_i\), \(p[v_i]\) is the value of \(v_i\) in p, and \(q[v_i]\) is the value of \(v_i\) in q. Moreover, r can be converted to the following substitution:
\begin{equation} v^{post}_1 := r[v_1]~;~\ldots ~;~v^{post}_n := r[v_n], \end{equation}
(6)
where \(v^{post}_i~(i=1,2,\ldots ,n)\) is a post-variable identifier of \(v_i\), and \(r[v_i]\) is the value of \(v_i\) in r. The Non-Determinism Modification (NDM) operator, which repairs non-determinism, is defined as follows: Given M modification operators described using M triples \([u_j, \textit{``modification''}, r_j]~(j=1,\ldots ,M)\), we use Equation (5) to convert each \(u_j\) to a condition \(U_j\) and use Equation (6) to convert each \(r_j\) to a substitution \(R_j\). Given an operation \(\alpha = {\rm {\bf PRE}}~~P~~{\rm {\bf THEN}}~~S~~{\rm {\bf END}}\), where P is a pre-condition, and S is a substitution, an NDM can be applied to \(\alpha\) using the following template:
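The original template listing is not reproduced in this version; the following schematic B-style sketch is our reconstruction of the construction described in the next paragraph, where ⟨state-variables⟩, ⟨pre-variables⟩, and ⟨post-variables⟩ are placeholders for the machine's variable lists:

    repaired_alpha =
      PRE P THEN
        VAR ⟨pre-variables⟩, ⟨post-variables⟩ IN
          ⟨pre-variables⟩ := ⟨state-variables⟩ ;    /* record the pre-values */
          S ;                                       /* apply the original substitution */
          ⟨post-variables⟩ := ⟨state-variables⟩ ;   /* record the post-values produced by S */
          IF U_1 THEN R_1 END ;                     /* conditional modification 1 */
          ...
          IF U_M THEN R_M END ;                     /* conditional modification M */
          ⟨state-variables⟩ := ⟨post-variables⟩     /* commit the (possibly modified) post-values */
        END
      END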
In particular, if the original operation does not have a pre-condition, then the repaired operation does not have the outer “PRE-THEN-END” statement. The above template uses the “VAR-IN-END” construct with two blocks of temporary variables, i.e., “⟨pre-variables⟩” and “⟨post-variables⟩”, to store pre-values and post-values. Before applying the substitution S, the values of the state variables are the pre-values, so “⟨pre-variables⟩ := ⟨state-variables⟩” can assign the pre-values to the pre-variables. After applying S, the values of the state variables become the post-values, so “⟨post-variables⟩ := ⟨state-variables⟩” can assign the post-values to the post-variables. As a result, the pre-values and the post-values are obtained. After using the conditions \(U_j~(j=1,\ldots ,M)\) and the substitutions \(R_j~(j=1,\ldots ,M)\) to modify the values of the post-variables, the modified values are assigned to state variables, leading to a new post-state.
The use of NDM requires a slight change to Algorithm 1, i.e., the Update function is adapted to use the NDM template to repair faulty operations. The following example of an autonomous vehicle control (AVC) model demonstrates the use of NDM. The model describes how to control a bus in a city. Figure 7 visualises the city map, which has a set Location containing five bus stations, i.e., S0, S1, S2, S3, and S4, and five other locations, i.e., C0, C1, C2, C3, and C4. The locations are linked by a set Street containing bidirectional edges. The model has a variable “loc” recording the location of the bus. The value of loc is changed using the following “move” operation:
Fig. 7.
Fig. 7. The road map of the AVC model.
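The original listing of the operation is not reproduced here; a B sketch consistent with the description (non-deterministically moving the bus to any location adjacent to loc via a street) might be:

    move =
      ANY newloc WHERE
        newloc : Location &
        (loc |-> newloc : Street or newloc |-> loc : Street)   /* streets are bidirectional */
      THEN
        loc := newloc
      END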
Suppose that the bus station S4 is temporarily unavailable. The following invariant is used to specify the constraint:
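The invariant listing is not reproduced here; a formulation consistent with the stated constraint (the bus must never be at S4) would be the following additional conjunct in the INVARIANT clause:

    loc /= S4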
The invariant is violated, because the move operation can change the value of loc from S3, C3, or C4 to S4. The corresponding faulty state transitions and feasible modifications (indicated by “\(\hookrightarrow\)”) are listed below.
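\(loc = S3 \xrightarrow {move} loc = S4~~\hookrightarrow ~~loc = S3 \xrightarrow {move} loc = S2,\)
\(loc = C3 \xrightarrow {move} loc = S4~~\hookrightarrow ~~loc = C3 \xrightarrow {move} loc = C4,\)
\(loc = C4 \xrightarrow {move} loc = S4~~\hookrightarrow ~~loc = C4 \xrightarrow {move} loc = C3.\)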
The meaning of the modifications is that if the bus is scheduled to move from S3, C3, or C4 to S4, then the bus will be rescheduled to move from S3 to S2, from C3 to C4, or from C4 to C3, respectively. The modifications are applied to the move operation using the NDM operator, leading to the following repaired operation:
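The repaired operation listing is not reproduced here; applying the NDM template sketched above to the assumed move operation yields something of the following shape, where loc_pre and loc_post are the temporary pre- and post-variables (since the original operation has no pre-condition, there is no outer PRE-THEN-END):

    move =
      VAR loc_pre, loc_post IN
        loc_pre := loc ;
        ANY newloc WHERE
          newloc : Location &
          (loc |-> newloc : Street or newloc |-> loc : Street)
        THEN
          loc := newloc                                        /* original substitution S */
        END ;
        loc_post := loc ;
        IF loc_pre = S3 & loc_post = S4 THEN loc_post := S2 END ;
        IF loc_pre = C3 & loc_post = S4 THEN loc_post := C4 END ;
        IF loc_pre = C4 & loc_post = S4 THEN loc_post := C3 END ;
        loc := loc_post                                        /* commit the (possibly rerouted) post-value */
      END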
The above example demonstrates that even though multiple state transitions can share a common post-state that triggers an invariant violation, the NDM can repair each state transition separately. This is because the conditions of NDM can ensure that each repair acts on one and only one state transition.

6.2 Limitations

As discussed in Section 2.2, the repair evaluators cannot deal with unseen elements of infinite sets. When training the repair evaluators, unseen elements are avoided by reducing infinite sets to finite sets that contain all elements occurring in a state space. For example, if a state space only includes the integers 0, 1, 2, and 4, then the finite set {0, 1, 2, 4} is used to generate one-hot encodings, e.g., {1, 4} is encoded as [0, 1, 0, 1]. However, unseen elements such as 3 and 5 are not preserved in the one-hot encodings, e.g., {1, 3, 4, 5} is treated as {1, 4} and encoded as [0, 1, 0, 1]. Consequently, the repair evaluators cannot distinguish between {1, 4} and {1, 3, 4, 5}. Moreover, to avoid unseen elements, AMBM can restrict the constraint solver to modifications that only contain elements occurring in a given state space, but such a restriction may rule out some required modifications. For example, suppose that a faulty operation Inc attempts to increment every element of a set of integers by 1. The operation yields correct state transitions such as \(\lbrace 0\rbrace \xrightarrow {Inc} \lbrace 1\rbrace\), \(\lbrace 1\rbrace \xrightarrow {Inc} \lbrace 2\rbrace\), and \(\lbrace 0, 1\rbrace \xrightarrow {Inc} \lbrace 1, 2\rbrace\), and faulty state transitions such as \(\lbrace 2\rbrace \xrightarrow {Inc} \lbrace \rbrace\) and \(\lbrace 1, 2\rbrace \xrightarrow {Inc} \lbrace 2\rbrace\). The faulty state transitions should be repaired as \(\lbrace 2\rbrace \xrightarrow {Inc} \lbrace 3\rbrace\) and \(\lbrace 1, 2\rbrace \xrightarrow {Inc} \lbrace 2, 3\rbrace\), respectively. Unfortunately, AMBM cannot suggest the required repairs, because 3 does not occur in the given state transitions. A possible approach to solving this problem is model repair based on inductive programming [33]. Given a component library containing essential symbols such as 0, 1, +, \(-\), *, and =, inductive programming can synthesise the function \(x^{\prime } = x + 1\), where x and \(x^{\prime }\) are integers, to generalise the correct state transitions of Inc. When \(x = 2\), the new element \(x^{\prime } = 3\) can be inferred by the function, so the required modifications can be constructed. This discussion indicates that a merger of inductive programming and AMBM could be a way to lift the restrictions on infinite sets.
The repair of higher-order sets, e.g., sets of sets, is another limitation on the applicability of AMBM. The number of candidate modifications involving higher-order sets can be huge due to combinatorial explosion. For example, if a set of sets is constructed from the integers \(1, 2, \ldots , n\), then its value will be a member of \(\mathcal {P}^2(\lbrace 1, 2, \ldots , n\rbrace)\), where \(\mathcal {P}\) represents the power set. As the cardinality of \(\mathcal {P}^2(\lbrace 1, 2, \ldots , n\rbrace)\) is \(2^{2^n}\), it is unrealistic to enumerate the members when n is large. Instead of enumeration, AMBM only uses components that occur in a given state space to generate modifications. As a result, a required modification will be missed if it includes a component that is not in the state space. For example, suppose that a faulty operation Inc_Send attempts to take a set of integers from a sender, increment each of its elements by 1, and transfer the result to a receiver. With two sets of sets \([Sender,~Receiver]\) representing states, the operation yields correct state transitions such as:
\([\lbrace \lbrace 0\rbrace ,~\lbrace 1\rbrace ,~\lbrace 0,~1\rbrace \rbrace ,~\lbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \lbrace 1\rbrace ,~\lbrace 0,~1\rbrace \rbrace ,~\lbrace \lbrace 1\rbrace \rbrace ],\)
\([\lbrace \lbrace 1\rbrace ,~\lbrace 0,~1\rbrace \rbrace ,~\lbrace \lbrace 1\rbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \lbrace 0,~1\rbrace \rbrace ,~\lbrace \lbrace 1\rbrace ,~\lbrace 2\rbrace \rbrace ],\)
\([\lbrace \lbrace 0,~1\rbrace \rbrace ,~\lbrace \lbrace 1\rbrace ,~\lbrace 2\rbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \rbrace ,~\lbrace \lbrace 1\rbrace ,~\lbrace 2\rbrace ,~\lbrace 1,~2\rbrace \rbrace ],\)
\([\lbrace \lbrace 1\rbrace ,~\lbrace 2\rbrace ,~\lbrace 1,~2\rbrace \rbrace ,~\lbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \lbrace 2\rbrace ,~\lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace \rbrace ],\)
and faulty state transitions such as:
\([\lbrace \lbrace 2\rbrace ,~\lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace ,~\lbrace \rbrace \rbrace ],\)
\([\lbrace \lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace ,~\lbrace \rbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \rbrace ,~\lbrace \lbrace 2\rbrace ,~\lbrace \rbrace \rbrace ].\)
The faulty state transitions should be repaired as follows:
\([\lbrace \lbrace 2\rbrace ,~\lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace ,~\lbrace 3\rbrace \rbrace ],\)
\([\lbrace \lbrace 1,~2\rbrace \rbrace ,~\lbrace \lbrace 2\rbrace ,~\lbrace 3\rbrace \rbrace ] \xrightarrow {Inc\_Send} [\lbrace \rbrace ,~\lbrace \lbrace 2\rbrace ,~\lbrace 3\rbrace ,~\lbrace 2,~3\rbrace \rbrace ].\)
Unfortunately, AMBM cannot suggest the required repairs, because {3} and {2, 3} are unseen in the given state space. To suggest repairs with such unseen components, inductive programming is probably feasible, because it can construct functions to infer unseen components, as discussed above.
Moreover, as the repair evaluators treat sets nested in sets as string elements, unseen nested sets cannot be encoded appropriately. To encode such unseen components, a feasible method is to introduce repair evaluators based on graphs. Figure 8 shows an example of the conversion from sets of sets (or higher-order sets) to a graph. Compared to the string representation, the advantage of the graph representation is that graphs can link two components at multiple levels, so unseen components can be represented by linking them with their sub-components. For example, both {{1}, {2}, {1, 2}} and {{2}, {3}, {2, 3}} have a direct link from the set {2} and two indirect links from the element 2. Even though {{2}, {3}, {2, 3}} is unseen in the state space, its encoding can still be partially inferred from the encoding of {{1}, {2}, {1, 2}} using graph learning models [36]. As a large number of graph learning models are available, their application to B model repair can be considered future work.
Fig. 8.
Fig. 8. Graph representations of member relationships in sets of sets.

6.3 Comparisons to Related Work

Automated B model repair originates from two studies [32, 33]. To eliminate faults in abstract machines, they have proposed four methods, including the strengthening of pre-conditions, the relaxation of pre-conditions, the relaxation of invariants, and the synthesis of new operations. When relaxing pre-conditions and invariants and synthesising new operations, users are required to manually produce positive and negative I/O examples for program synthesis. The difference between their work and our work is that we focus on repairing substitutions, while they focus on repairing pre-conditions. Moreover, our AMBM algorithm uses repair evaluators and constraint solving to automatically synthesise repairs, while their methods require users to manually produce I/O examples for repair synthesis.
AMBM is an improved implementation of our previous work, B-repair [10]. When revising faulty abstract machines, B-repair uses a constraint solver to generate candidate repairs and uses learned quality estimation functions to rank repairs. As B-repair eliminates one and only one fault during each loop of repair, it takes considerably longer to repair models with a large number of faults. In this work, as multiple faults are eliminated using fewer compound repairs during each loop of repair, AMBM is significantly more efficient than the previous B-repair, and the resulting corrections are simpler in terms of the predicate structure.
With regard to previous work on automatic imperative program repair, such as RSRepair [28], GenProg [21], CASC [37], and SearchRepair [17], automated B model repair is conceptually different, because B is a design modelling language for constructing formal specifications at the abstract design level, while imperative programs are at the concrete implementation level [2]. Functions in B design models are usually represented as operations with pre-conditions and substitutions (which play the role of post-conditions) that relate the before and after values (states) of the variables used in the system. Consequently, operation executions are decided by the current states of the variables rather than by the control flow of the program. Thus, repair evaluators can analyse each operation separately, and repairs can be applied to faulty operations asynchronously without considering the execution order of the operations. In automatic imperative program repair, however, execution orders decided by control flows are important factors to be considered.
There are a number of similarities between the repair of B models and imperative program repair. First, both have fault localisation functions that reduce the search space of the repair. B model repair can rely on a model checker to detect faulty operations, while imperative program repair can rely on Spectrum-based Fault Localisation (SFL) [1] functions that find suspicious code blocks by counting successful and failed paths of program executions. Second, the concept of inductive programming can be used to synthesise patches of imperative programs [18] and to refactor atomic modifications into compound modifications (this work). Third, conditional statements are often used to avoid the side effects of repair. For example, Staged Program Repair (SPR) [24] can use a number of instances to generate conditions that distinguish between correct and failed executions, so side effects on the correct executions can be minimised. Similarly, our AMBM uses conditional substitutions to avoid the side effects of modifications. Finally, as repairs are not definitive, evaluation functions seem indispensable for estimating the appropriateness of candidate repairs and selecting the best repairs. For example, GenProg [21] evaluates candidate repairs by observing successful and failed executions related to the repairs, while AMBM uses classification models and repair scores to select repairs. These similarities between imperative program repair and B model repair indicate that techniques from the two fields may be adopted by each other in the future.

7 Conclusion

We have extended B-repair by implementing abstract machine batch modification, an automated method for repairing erroneous B formal models during correct-by-construction development. We have demonstrated that the state spaces of abstract machines can be accurately learned using classic classifiers such as random forests. The learned classifiers can be used to select atomic modifications produced by solving invariant constraints. Moreover, atomic modifications can be merged into compound modifications using predicate refactoring. The explainable and verifiable classifier has yielded high modification accuracy and improved the generality of compound modifications. Consequently, we suggest that automated abstract machine modification has the potential to increase the efficiency and productivity of software development.
In the future, AMBM may be improved as described below:
It is possible to develop repair evaluators based on unsupervised machine learning algorithms, where the generation of negative training examples is no longer needed, because the characteristics of state spaces can be directly learned using the unsupervised methods. We may develop new unsupervised methods similar to random forests and appropriate decision functions to achieve relatively high modification accuracies.
It is possible to extend AMBM using refinement checking techniques. First, a model checker is used to find faulty state transitions that violate refinement conditions. Next, pre- and post-conditions of a refined abstract machine are rewritten as constraints, so a set of candidate modifications can be generated by solving the constraints. After that, a repair evaluator is used to select the best modification to repair each faulty state transition. Finally, the original model is updated using the modifications and checked against the refinement conditions.
It is possible to design an algorithm that can eliminate invariant violations by either weakening invariants or modifying existing state transitions. The problem is how to make a choice between the above two options. A possible solution is to use an evaluator to decide whether a faulty state transition should be kept or removed, i.e., if the state transition gains a relatively high score, then the state transition will be kept, and the corresponding invariant will be weakened. If the state transition does not gain a high score, then the state transition will be modified using atomic modifications, and the corresponding invariant will not be changed.
To use AMBM in industry, we may try to improve the scalability of AMBM by integrating it with more development tools and speeding up the refactoring process using probabilistic techniques.

Footnotes

1
The source code is available via https://github.com/cchrewrite/ambm.
2
Silas Edu is developed by us and can be downloaded from https://www.depintel.com/silas_download.html.
3
For BNB, LR, SVM, and RF, the default settings in scikit-learn were used. The default settings of Silas are intended for datasets containing over 1 million entries. We therefore used another set of settings for smaller datasets, where the number of decision trees in Silas is equal to its counterpart in RF.
4
Scripts and datasets of our experiments can be found at https://github.com/cchrewrite/ambm.

References

[1]
Rui Abreu, Peter Zoeteweij, Rob Golsteijn, and Arjan J. C. van Gemund. 2009. A practical evaluation of spectrum-based fault localization. J. Syst. Softw. 82, 11 (2009), 1780–1792.
[2]
Jean-Raymond Abrial. 2005. The B-book - Assigning Programs to Meanings. Cambridge University Press.
[3]
Thomas Ackling, Bradley Alexander, and Ian Grunert. 2011. Evolving patches for software repair. In Proceedings of the 13th Annual Genetic and Evolutionary Computation Conference (GECCO). 1427–1434.
[4]
Dalal Alrajeh, Jeff Kramer, Alessandra Russo, and Sebastián Uchitel. 2015. Automated support for diagnosis and repair. Commun. ACM 58, 2 (2015), 65–72.
[5]
Haniel Barbosa and David Déharbe. 2012. Formal verification of PLC programs using the B method. In Proceedings of the 3rd International Conference: on Abstract State Machines, Alloy, B, VDM, and Z. 353–356.
[6]
Nazim Benaïssa, David Bonvoisin, Abderrahmane Feliachi, and Julien Ordioni. 2016. The PERF approach for formal verification. In Proceedings of the 1st International Conference on Reliability, Safety, and Security of Railway Systems. Modelling, Analysis, Verification, and Certification. 203–214.
[7]
Christopher M. Bishop. 2007. Pattern Recognition and Machine Learning, 5th Edition. Springer.
[8]
Hadrien Bride, Cheng-Hao Cai, Jie Dong, Jin Song Dong, Zhé Hóu, Seyedali Mirjalili, and Jing Sun. 2021. Silas: A high-performance machine learning foundation for logical reasoning and verification. Exp. Syst. Applic. 176 (2021), 114806.
[9]
Hadrien Bride, Jie Dong, Jin Song Dong, and Zhé Hóu. 2018. Towards dependable and explainable machine learning using automated reasoning. In Proceedings of the 20th International Conference on Formal Engineering Methods (ICFEM’18). 412–416.
[10]
Cheng-Hao Cai, Jing Sun, and Gillian Dobbie. 2019. Automatic B-model repair using model checking and machine learning. Autom. Softw. Eng. 26, 3 (2019), 653–704.
[11]
Johan de Kleer and Brian C. Williams. 1989. Diagnosis with behavioral modes. In Proceedings of the 11th International Joint Conference on Artificial Intelligence (IJCAI’89). 1324–1330.
[12]
Tom Fawcett. 2006. An introduction to ROC analysis. Patt. Recog. Lett. 27, 8 (2006), 861–874.
[13]
Luca Gazzola, Daniela Micucci, and Leonardo Mariani. 2019. Automatic software repair: A survey. IEEE Trans. Softw. Eng. 45, 1 (2019), 34–67.
[14]
Tin Kam Ho. 1995. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition (ICDAR’95). 278–282.
[15]
Sarah Hoffmann, Germain Haugou, Sophie Gabriele, and Lilian Burdy. 2007. The B-method for the construction of microkernel-based systems. In Proceedings of the 7th International Conference of B Users. 257–259.
[16]
Tao Ji, Liqian Chen, Xiaoguang Mao, and Xin Yi. 2016. Automated program repair by using similar code containing fix ingredients. In Proceedings of the 40th IEEE Annual Computer Software and Applications Conference (COMPSAC’16). 197–202.
[17]
Yalin Ke, Kathryn T. Stolee, Claire Le Goues, and Yuriy Brun. 2015. Repairing programs with semantic code search (T). In Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE’15). 295–306.
[18]
Emanuel Kitzelmann. 2009. Inductive programming: A survey of program synthesis techniques. In Proceedings of the 3rd International Workshop on Approaches and Applications of Inductive Programming. 50–73.
[19]
Philipp Körner, Michael Leuschel, and Jannik Dunkelau. 2020. Towards a shared specification repository. In Proceedings of the 7th International Conference on Rigorous State-based Methods. 266–271.
[20]
Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, and Westley Weimer. 2012. A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each. In Proceedings of the 34th International Conference on Software Engineering (ICSE’12). 3–13.
[21]
Claire Le Goues, ThanhVu Nguyen, Stephanie Forrest, and Westley Weimer. 2012. GenProg: A generic method for automatic software repair. IEEE Trans. Softw. Eng. 38, 1 (2012), 54–72.
[22]
Michael Leuschel and Michael J. Butler. 2008. ProB: An automated analysis toolset for the B method. Int. J. Softw. Tools Technol. Transf. 10, 2 (2008), 185–203.
[23]
Michael Leuschel, Mareike Mutz, and Michelle Werth. 2020. Modelling and validating an automotive system in classical B and Event-B. In Proceedings of the 7th International Conference on Rigorous State-based Methods. 335–350.
[24]
Fan Long and Martin Rinard. 2015. Staged program repair with condition synthesis. In Proceedings of the 10th Joint Meeting on Foundations of Software Engineering. 166–178.
[25]
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake VanderPlas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Edouard Duchesnay. 2011. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12 (2011), 2825–2830.
[26]
Yu Pei, Carlo A. Furia, Martin Nordio, Yi Wei, Bertrand Meyer, and Andreas Zeller. 2014. Automated fixing of programs with contracts. IEEE Trans. Softw. Eng. 40, 5 (2014), 427–449.
[27]
Ingo Pill and Thomas Quaritsch. 2013. Behavioral diagnosis of LTL specifications at operator level. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI). 1053–1059.
[28]
Yuhua Qi, Xiaoguang Mao, Yan Lei, Ziying Dai, and Chengsong Wang. 2014. The strength of random search on automated program repair. In Proceedings of the 36th International Conference on Software Engineering (ICSE’14). 254–265.
[29]
Stuart J. Russell and Peter Norvig. 2010. Artificial Intelligence - A Modern Approach (3rd International ed.). Pearson Education.
[30]
Björn Scheuermann. 2007. Reading between the Packets—Implicit Feedback in Wireless Multihop Networks. Ph. D. Dissertation. University of Düsseldorf, Germany.
[31]
Björn Scheuermann, Christian Lochert, and Martin Mauve. 2008. Implicit hop-by-hop congestion control in wireless multihop networks. Ad Hoc Netw. 6, 2 (2008), 260–286.
[32]
Joshua Schmidt, Sebastian Krings, and Michael Leuschel. 2016. Interactive model repair by synthesis. In Proceedings of the 5th International Conference on Abstract State Machines, Alloy, B, TLA, VDM, and Z. 303–307.
[33]
Joshua Schmidt, Sebastian Krings, and Michael Leuschel. 2018. Repair and generation of formal models using synthesis. In Proceedings of the 14th International Conference on Integrated Formal Methods. 346–366.
[34]
Alexey Smirnov and Tzi-cker Chiueh. 2007. Automatic patch generation for buffer overflow attacks. In Proceedings of the 3rd International Symposium on Information Assurance and Security (IAS’07). 165–170.
[35]
Joseph P. Turian, Lev-Arie Ratinov, and Yoshua Bengio. 2010. Word representations: A simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL). 384–394.
[36]
Shoujin Wang, Liang Hu, Yan Wang, Xiangnan He, Quan Z. Sheng, Mehmet A. Orgun, Longbing Cao, Francesco Ricci, and Philip S. Yu. 2021. Graph learning based recommender systems: A review. In Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI’21). 4644–4652.
[37]
Josh L. Wilkerson and Daniel R. Tauritz. 2010. Coevolutionary automated software correction. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO’10). 1391–1392.
