Keywords

figure a

1 Introduction

When developing concurrency libraries or operating system kernels, performance and scalability of the concurrency primitives is of paramount importance. These primitives rely on the synchronization guarantees of the underlying hardware and the programming language runtime environment. The formal semantics of these guarantees are often defined in terms of weak memory models. There is considerable interest in verification tools that take memory models into account [5, 9, 13, 22].

A successful approach to formalizing weak memory models is CAT [11, 12, 16], a flexible specification language in which all memory models considered so far can be expressed succinctly. CAT, together with its accompanying tool Herd [4], has been used to formalize the semantics not only of assembly for x86/TSO, Power, ARMv7 and ARMv8, but also high-level programming languages, such as C/C++, transactional memory extensions, and recently the Linux kernel concurrency primitives [11, 15, 16, 18, 20, 24, 29]. This success indicates the need for universal verification tools that are not limited to a specific memory model.

We present Dartagnan [3], a bounded model checker that takes memory models as inputs. Dartagnan expects a concurrent program annotated with an assertion and a memory model for which the verification should be conducted. It verifies the assertion on those executions of the program that are valid under the given memory model and returns a counterexample execution if the verification fails. As is typical of BMC, the verification results hold relative to an unrolling bound [21]. The encoding phase, however, is new. Not only the program but also its semantics as defined by the CAT model are translated into an SMT formula.

Having to take into account the semantics quickly leads to large encodings. To overcome this problem, Dartagnan implements a novel relation analysis, which can be understood as a static analysis of the program semantics as defined by the memory model. More precisely, CAT defines the program semantics in terms of relations between the events that may occur in an execution. Depending on constraints over these relations, an execution is considered valid or invalid. Relation analysis determines the pairs of events that may influence a constraint of the memory model. Any remaining pair can be dropped from the encoding. The analysis is compatible with optimized fixpoint encodings presented in [27, 28].

The second novelty is the support for advanced programming constructs. We redesigned Dartagnan ’s heap model, which now has pointers and arrays. Furthermore, we enriched the set of synchronization primitives, including read-modify-write and read-copy-update (RCU) instructions [26]. One motivation for this richer set of programming constructs is the Linux kernel memory model [15] that has recently been added to the kernel documentation [2]. This model has already been used by kernel developers to find bugs in and clarify details of the concurrency primitives. Since the model is expected to be refined with further development of the kernel, verification tools will need to quickly accommodate updates in the specification. So far, only Herd [4] has satisfied this requirement. Unfortunately, it is limited to fairly small programs (litmus tests). The present version of Dartagnan offers an alternative with substantially better performance.

We present experiments on a series of benchmarks consisting of 4751 Linux litmus tests and 7 mutual exclusion algorithms executed on TSO, ARM, and Linux. Despite the flexibility of taking memory models as inputs, Dartagnan ’s performance is comparable to CBMC [13] and considerably better than that of Nidhugg [5, 9]. Both are model-specific tools. Compared to the previous version of Dartagnan [28] and compared to Herd [4], we gain a speed-up of more than two orders of magnitude, thanks to the relation analysis.

Related Work. In terms of the verification task to be solved, the following tools are the closest to ours. CBMC [13] is a scalable bounded model checker supporting TSO, but not ARM. An earlier version also supported Power. Nidhugg [5, 9] is a stateless model checker supporting TSO, Power, and a subset of ARMv7. It is excellent for programs with a small number of executions. RCMC [22] implements a stateless model checking algorithm targeting C11. We cannot directly benchmark against it because the source code of the tool is not yet publicly available, nor do we fully support C11. Herd [4] is the only tool aside from ours that takes a CAT memory model as input. Herd does not scale well to programs with a large number of executions, including some of the Linux kernel tests. Other verification tasks (e.g., fence insertion to restore sequential consistency) are tackled by Memorax [6,7,8], offence  [14], Fender  [23], DFence [25], and trencher  [19].

Relation Analysis on an Example. Consider the program (in the .litmus format) given to the left in the figure below. The assertion asks whether there is a reachable state with final values \(\texttt {EBX} = 1, \texttt {ECX} = 0\). We analyze the program under the x86-TSO memory model shown below the program. The semantics of the program under TSO is a set of executions. An execution is a graph, similar to the one given below, where the nodes are events and the edges correspond to the relations defined by the memory model. Events are instances of instructions that access the shared memory: \(\textsf {R}\) (loads), \(\textsf {W}\) (stores, including initial stores), and \(\textsf {M}\) (the union of both). The atomic exchange instruction \(\texttt {xchg}\,[\texttt {x}], \texttt {EAX}\) gives rise to a pair of read and write events related by a (dashed) \(\textsf {rmw}\) edge. Such reads and writes belong to the set \(\textsf {A}\) of atomic read-modify-write events.

figure b

The relations \(\textsf {rf}\), \(\textsf {co}\), and \(\textsf {fr}\) model the communication of instructions via the shared memory (reading from a write, coherence, overwriting a read). Their restrictions \(\textsf {rfe}\), \(\textsf {coe}\), and \(\textsf {fre}\) denote (external) communication between instructions from different threads. Relation \(\textsf {po}\) is the program order within the same thread and \(\textsf {po-loc}\) is its restriction to events addressing the same memory location. Edges of \(\textsf {mfence}\) relate events separated by a fence. Further relations are derived from these base relations. To belong to the TSO semantics of the program, an execution has to satisfy the constraints of the memory model: \(\textsf {empty}\,\, \textsf {rmw} \cap (\textsf {fre}\,; \textsf {coe})\), which enforces atomicity of read-modify-write events, and the two acyclicity constraints.

Dartagnan encodes the semantics of the given program under the given memory model into an SMT formula. The problem is that each edge (ab) that may be present in a relation \(\textsf {r}\) gives rise to a variable \(\textsf {r}(a, b)\). The goal of our relation analysis is to reduce the number of edges that need to be encoded. We illustrate this on the constraint \(\textsf {acyclic}\,\, \textsf {ghb-tso}\). The graph next to the program shows the 14 (dotted and solid) edges which may contribute to the relation ghb-tso. Of those, only the 6 solid edges can occur in a cycle. The dotted edges can be dropped from the SMT encoding. Our relation analysis determines the solid edges—edges that may have an influence on a constraint of the memory model. Additionally, \(\textsf {ghb-tso}\) is a composition of various subrelations (e.g., \(\textsf {po-tso}\) or \(\textsf {co} \cup \textsf {fr}\)) that also require encoding into SMT. Relation analysis applies to subrelations as well. Applied to all constraints, it reduces the number of encoded edges for all (sub)relations from 221 to 58.

2 Input, Functionality, and Implementation

Dartagnan has the ambition of being widely applicable, from assembly over operating system code written in C/C++ to lock-free data structures. The tool accepts programs in PPC, x86, AArch64 assembly, and a subset of C11, all limited to the subsets supported by Herd’s .litmus format. It also reads our own .pts format with C11-like syntax [28]. We refer to global variables as memory locations and to local variables as registers. We support pointers, i.e., a register may hold the address of a location. Addresses and values are integers, and we allow the same arithmetic operations for addresses as for regular integer values. Different synchronization mechanisms are available, including variants of read-modify-write, various fences, and RCU instructions [26].

We support the assertion language of Herd. Assertions define inequalities over the values of registers and locations. They come with quantifiers over the reachable states that should satisfy the inequalities.

We use the CAT language [11, 12, 16] to define memory models. A memory model consists of named relations between events that may occur in an execution. Whether or not an execution is valid is defined by constraints over these relations:

CAT has a rich relational language, and we only show an excerpt above. So-called base relations \(\langle b \rangle \) model the control flow, data flow, and synchronization constraints. The language provides intuitive operators to derive further relations. One may define relations recursively by referencing named relations. Their semantics is the least fixpoint.

Dartagnan is invoked with two inputs: the program, annotated with an assertion over the final states, and the memory model. There are two optional parameters related to the verification. The SMT encoding technique for recursive relations is defined by mode chosen between \(\textsf {knastertarski}\) (default) and \(\textsf {idl}\) (see below). The parameter alias, chosen between \(\textsf {none}\) and \(\textsf {andersen}\) (default), defines whether to use an alias analysis for our relation analysis (cf. Sect. 3).

Being a bounded model checker, Dartagnan computes an unrolled program with conditionals but no loops. It encodes this acyclic program together with the memory model into an SMT formula and passes it to the Z3 solver. The formula has the form \(\psi _{ prog }\wedge \psi _{ assert }\wedge \psi _{ mm }\), where \(\psi _{ prog }\) encodes the program, \(\psi _{ assert }\) the assertion, and \(\psi _{ mm }\) the memory model. We elaborate on the encoding of the program and the memory model. The assertion is already given as a formula.

We model the heap by encoding a new memory location for each variable and a set of locations for each memory allocation of an array. Every location has an address encoded as an integer variable whose value is chosen by the solver. In an array, the locations are required to have consecutive addresses. Instances of instructions are modeled as events, most notably stores (to the shared memory) and loads (from the shared memory).

We encode relations by associating pairs of events with Boolean variables. Whether the pair \((e_1, e_2)\) is contained in relation \(\textsf {r}\) is indicated by the variable \(\textsf {r}(e_1, e_2)\). Encoding the relations \(\textsf {r}_{1}\cap \textsf {r}_{2}\), \(\textsf {r}_{1}\cup \textsf {r}_{2}\), \(\textsf {r}_{1}\ ;\ \textsf {r}_{2}\), \(\textsf {r}_{1}\setminus \textsf {r}_{2}\) and \(\textsf {r}^{-1}\) is straightforward [27]. For recursively defined and (reflexive and) transitive relations, Dartagnan lets the user choose between two methods for computing fixed points by setting the appropriate parameter. The integer-difference logic (IDL) method encodes a Kleene iteration by means of integer variables (one for each pair of events) representing the step in which the pair was added to the relation [27]. The Knaster-Tarski encoding simply looks for a post fixpoint. We have shown in [28] that this is sufficient for reachability analysis.

3 Relation Analysis

To optimize the size of the encoding (and the solving times), we found it essential to reduce the domains of the relations. We determine for each relation a static over-approximation of the pairs of events that may be in this relation. Even more, we restrict the relation to the set of pairs that may influence a constraint of the given memory model. These restricted sets are the relation analysis information (of the program relative to the memory model). Technically, we compute, for each relation \(\textsf {r}\), two sets of event pairs, \( M (\textsf {r})\) and \( A (\textsf {r})\). The former contains so-called may pairs, pairs of events that may be in relation \(\textsf {r}\). This does not yet take into account whether the may pairs occur in some constraint of the memory model. The active pairs \( A (\textsf {r})\) incorporate this information, and hence restrict the set of may pairs. As a consequence of the relation analysis, we only introduce Boolean variables \(\textsf {r}(e_1, e_2)\) for the pairs \((e_1, e_2)\in A (\textsf {r})\) to the SMT encoding.

The algorithm for constructing the may set and the active set is a fixpoint computation. What is unconventional is that the two sets propagate their information in different directions. For \( A (\textsf {r})\), the computation proceeds from the constraints and propagates information down the syntax tree of the CAT memory model. The sets \( M (\textsf {r})\) are computed bottom-up the syntax tree. Interestingly, in our implementation, we do not compute the full fixpoint but let the top-down process trigger the required bottom-up computation.

Both sets are computed as least solutions to a common system of inequalities. As we work over powerset lattices (relations are sets after all), the order of the system will be inclusion. We understand each set \( M (\textsf {r})\) and \( A (\textsf {r})\) as a variable, thereby identifying it with its least solution. To begin with, we give the definition for \( A (\textsf {r})\). In the base case, we have a relation \(\textsf {r}\) that occurs in a constraint of the memory model. The inequality is defined based on the shape of the constraint:

For the emptiness constraint, all pairs of events that may be contained in the relation are relevant. If the constraint requires irreflexivity, what matters are the pairs (ee). If the constraint requires acyclicity, we concentrate on the pairs \((e_1, e_2)\), where \((e_1, e_2)\) may be in relation \(\textsf {r}\) and \((e_2, e_1)\) may be in relation \(\textsf {r}^+\). Note how the definition of active pairs triggers the computation of may pairs.

If the relation in the constraint is a composed one, the following inequalities propagate the information about the active pairs down the syntax tree of the CAT memory model:

The definition maintains the invariant \( A (\textsf {r})\subseteq M (\textsf {r})\). If a pair \((e_1, e_2)\) is relevant to relation \(\textsf {r}=\textsf {r}_1^{-1}\), then \((e_2, e_1)\) will be relevant to \(\textsf {r}_1\). We do not have to intersect \( A (\textsf {r})^{-1}\) with \( M (\textsf {r})^{-1}\) because \( A (\textsf {r})\subseteq M (\textsf {r})\) ensures \( A (\textsf {r})^{-1}\subseteq M (\textsf {r})^{-1}\). We can avoid the intersection with the may pairs for the next case as well. There, \( A (\textsf {r})\subseteq M (\textsf {r})\) holds by the invariant and \( M (\textsf {r})= M (\textsf {r}_1)\cap M (\textsf {r}_2)\) by definition (see below). For union and the other case of subtraction, the intersection with \( M (\textsf {r}_1)\) is necessary. There are symmetric definitions for union and intersection for \(\textsf {r}_2\). For a relation \(\textsf {r}_1\) that occurs in a relational composition \(\textsf {r}=\textsf {r}_1;\textsf {r}_2\), the pairs \((e_1, e_3)\) become relevant if they may be composed with a pair \((e_3, e_2)\) in \(\textsf {r}_2\) to obtain a pair \((e_1, e_2)\) relevant to \(\textsf {r}\). Note that for \(\textsf {r}_2\) we again need the may pairs. The definition for \(\textsf {r}_2\) is similar. The definition for the (reflexive and) transitive closure follows the ideas for relational composition.

The definition of the may sets follows the syntax of the CAT memory model bottom-up. With \(\oplus \in \{\cup , \cap ,\ ;\}\) and \(\otimes \in \{+, *, -1\}\), we have:

$$\begin{aligned} M (\textsf {r}_1\oplus \textsf {r}_2)\;\supseteq \; M (\textsf {r}_1)\oplus M (\textsf {r}_2)\qquad M (\textsf {r}^{\otimes })\;\supseteq \; M (r)^{\otimes } \qquad M (\textsf {r}_1\setminus \textsf {r}_2)\;\supseteq \; M (\textsf {r}_1). \end{aligned}$$

This simply executes the operator of the relation on the corresponding may sets. Subtraction (\(\textsf {r}_1\setminus \textsf {r}_2 \)) is the exception, it is not sound to over-approximate \(\textsf {r}_2\).

At the bottom level, the may sets are determined by the base relations. They depend on the shape of the relations and the positions of the events in the control flow. The relations \(\textsf {loc}\), \(\textsf {co}\) and \(\textsf {rf}\) are concerned with memory accesses. What makes it difficult to approximate these relations is our support for pointers and pointer arithmetic. Without further information, we have to conservatively assume that a memory event may access any address. To improve the precision of the may sets for \(\textsf {loc}\), \(\textsf {co}\), and \(\textsf {rf}\), our fixpoint computation incorporates a may-alias analysis. We use a control-flow insensitive Andersen-style analysis [17]. It incurs only a small overhead and produces a close over-approximation of the may sets. The analysis returnsFootnote 1 a set of pairs of memory events \(\textit{PTS}\;\subseteq \; (\mathbb {W}\cup \mathbb {R}) \times (\mathbb {W}\cup \mathbb {R})\) such that every pair of events outside \(\textit{PTS}\) definitely accesses different addresses. Here, \(\mathbb {W}\) are the store events in the program and \(\mathbb {R}\) are the loads. Note that the analysis has to be control-flow insensitive as the given memory model may be very weak [10]. We have \( M (\textsf {loc}) \supseteq \textit{PTS}\). Similarly, \( M (\textsf {co})\) and \( M (\textsf {rf}) \) are defined by \(\textit{PTS}\) restricted to \((\mathbb {W}\times \mathbb {W})\) and \( (\mathbb {W}\times \mathbb {R})\), respectively.

We stress the importance of the alias analysis for our relation analysis: loc, co, and rf are frequently used as building blocks of composite relations. Excessive may sets will therefore negatively affect the over-approximations of virtually all relations in a memory model, and keep the overall encoding unnecessarily large.

Fig. 1.
figure 1

Impact of the unrolling bound (x-axis) on the verification time (y-axis).

Illustration. We illustrate the relation analysis on the example from the introduction. Consider constraint \(\textsf {acyclic}\,\, \textsf {ghb-tso}\). The computation of the active set for the relation \(\textsf {ghb-tso}\) triggers the calculation of the may set, following the inequality \( A (\textsf {ghb-tso})\supseteq M (\textsf {ghb-tso})\cap M (\textsf {ghb-tso}^+)^{-1}\). The may set is the union of the may sets for the subrelations, shown by colored (dotted and solid) edges. The intersection yields the edges that may lie on cycles of \(\textsf {ghb-tso}\). They are drawn in solid. These solid edges in \( A (\textsf {ghb-tso})\) are propagated down to the subrelations. For example, \( A (\textsf {po-tso}) \supseteq A (\textsf {ghb-tso})\cap M (\textsf {po-tso})\) yields the solid black edges.

4 Experiments

We compare Dartagnan to CBMC [13] and Nidhugg [5, 9], both model-specific tools, and to Herd [4, 16] and the Dartagnan FMCAD-18 version [3, 28] (without relation analysis), both taking CAT models as inputs. We also evaluate the impact of the alias analysis on the execution time.

Benchmarks. For CBMC, Nidhugg, and the FMCAD-18 Dartagnan, we evaluate the performance on 7 mutual exclusion benchmarks executed on TSO (all tools) and a subset of ARMv7 (only Nidhugg and Dartagnan). The results on Power are similar to those on ARM and thus omitted. We excluded Herd from this experiment since it did not scale even for small unrolling bounds [28]. We set a 5 min timeout for Parker, Dekker, and Peterson as this is sufficient to show the trends in the runtimes, and a 30 min timeout for the remaining benchmarks. To compare against Herd, and to evaluate the impact of the alias analysis, we run 4751 Linux kernel litmus tests (all tests from [1] without Linux spinlocks). The tests contain kernel primitives, such as RCU, on the Linux kernel model. We set a 30 min timeout.

Fig. 2.
figure 2

Execution times (logarithmic scale) on Linux kernel litmus tests: impact of alias analysis (left) and comparison against Herd (right).

Evaluation. The times for CBMC, Nidhugg -ARM, and the FMCAD-2018 version of Dartagnan grow exponentially for Parker (see Fig. 1). The growth in CBMC and FMCAD-2018 is due to the explosion of the encoding. For the latter, the solver runs out of memory with unrolling bounds 20 (TSO) and 10 (ARM). For Nidhugg -ARM, the tool explores many unnecessary executions. The verification times for Nidhugg -TSO and the current version of Dartagnan grow linearly. The latter is due to the relation analysis. For Peterson, the results are similar except for CBMC, which matches Dartagnan ’s performance.

For Dekker, Nidhugg outperforms both CBMC and Dartagnan. This is because the number of executions grows slowly compared to the explosion of the number of instructions. The executions in both memory models coincide, making the performance on ARM comparable to that on TSO for Nidhugg. The difference is due to the optimal exploration in TSO, but not in ARM. Relation analysis has some impact on the performance (see FMCAD-2018 vs. Dartagnan), but the encoding size still grows faster than the number of executions.

The benchmarks Burns, Bakery, and Lamport demonstrate the opposite trend: the number of executions grows much faster than the size of the encoding. Here, CBMC and Dartagnan outperform Nidhugg. Notice that for Burns, Nidhugg performs better on ARM than on TSO with unrolling bound 5. This is counter-intuitive since one expects more executions on ARM. Although the number of executions coincide, the exploration time is higher on TSO due to a different search algorithm. For Szymanski, similar results hold except for Dartagnan -ARM where the encoding grows exponentially.

Figure 2 (left) shows the verification times for the current version of Dartagnan with and without alias analysis. The alias analysis results in a speed-up of more than two orders of magnitude in benchmarks with several threads accessing up to 18 locations. Figure 2 (right) compares the performance of Dartagnan against Herd. We used the Knaster-Tarski encoding and alias analysis since they yield the best performance. Herd outperforms Dartagnan on small test instances (less than 1 s execution time). This is due to the JVM startup time and the preprocessing costs of Dartagnan. However, on large benchmarks, Herd times out while Dartagnan takes less than 10 s.