1 Introduction
Ever since processor speeds plateaued in the early 2000s, building high-performance systems has increasingly relied on concurrency. Writing concurrent programs, however, is notoriously error prone, as programmers must consider all possible thread interleavings. If a bug manifests on only one such interleaving, it is extremely hard to detect using traditional testing techniques, let alone to reproduce and repair. Formal verification provides an alternative: a way to guarantee that the program is completely free of such bugs.
This article presents Armada, a methodology, language, and tool that enables low-effort verification of high-performance, concurrent code. Armada’s contribution rests on three pillars: flexibility for high performance; automation to reduce manual effort; and an expressive, low-level framework that allows for sound semantic extensibility. These three pillars let us achieve automated verification, with semantic extensibility, of concurrent C-like imperative code executed in a weak memory model (x86-TSO [38]).
Prior work (see Section 8) has achieved some of these properties, but not simultaneously. For example, Iris [27] supports powerful and sound semantic extensibility but offers less automation and little support for C-like imperative code. Conversely, CIVL [21] supports automation and imperative code but without sound extensibility: it relies on paper proofs when using techniques such as reduction, and the CIVL team continually introduces new trusted tactics as they encounter new use cases [40]. Recent work building larger verified concurrent systems [6, 7, 18] supports sound extensibility but sacrifices flexibility, and thus some potential for performance optimization, to reduce the burden of proof writing.
In contrast, Armada achieves all three properties, which we now discuss in greater detail:
Flexibility. To support high-performance code, Armada lets developers choose any memory layout and any synchronization primitives they need for high performance. Fixing on any one strategy for concurrency or memory management will inevitably rule out clever optimizations that developers come up with in practice. Hence, Armada uses a common low-level semantic framework that allows arbitrary flexibility akin to the flexibility provided by a C-like language. For example, it supports pointers to fields of objects and to elements of arrays, lock-free data structures, optimistic racy reads, and cache-friendly memory layouts. We enable such flexibility by using a small-step state-machine semantics rather than one that preserves structured program syntax but limits a priori the set of programs that can be verified.
Automation. However, actually writing programs as state machines is unpleasantly tedious. Hence, Armada introduces a higher-level syntax that lets developers write imperative programs that are automatically translated into state-machine semantics. To prove these programs correct, the developer then writes a series of increasingly simplified programs and proves that each is a sound abstraction of the previous program, eventually arriving at a simple, high-level specification for the system. To create these proofs, the Armada developer simply annotates each level with the proof strategy necessary to support the refinement proof connecting it to the previous level. Armada then analyzes both levels and automatically generates a lemma demonstrating that refinement holds. Typically, this lemma uses one of the libraries we have developed to support eight common concurrent-systems reasoning patterns (e.g., logical reasoning about memory regions, rely-guarantee, TSO elimination, and reduction). These lemmas are then verified by an SMT-powered theorem prover. Explicitly manifesting Armada’s lemmas lets developers perform lemma customization, that is, augmentations to lemmas in the rare cases in which the automatically generated lemmas are insufficient.
Sound semantic extensibility. Each of Armada’s proof-strategy libraries, and each proof generated by our tool, is mechanically proven to be correct. Insisting on verifying these proofs gives us the confidence to extend Armada with arbitrary reasoning principles, including newly proposed approaches, without worrying that in the process we may undermine the soundness of our system. Note that inventing new reasoning principles is an explicit non-goal for Armada. Instead, we expect Armada’s flexible design to support new reasoning principles as they arise.
Our current implementation of Armada uses Dafny [29] as a general-purpose theorem prover. Dafny’s SMT-based [12] automated reasoning simplifies development of our proof libraries and developers’ lemma customizations. However, Armada’s broad structure and approach are compatible with any general-purpose theorem prover. We extend Dafny with a backend that produces C code compatible with ClightTSO [46], which can then be compiled to an executable by CompCertTSO in a way that preserves Armada’s guarantees.
We evaluate Armada on five case studies and show that it handles complex heap and concurrency reasoning with relatively little developer-supplied proof annotation. We also show that Armada programs can achieve performance comparable to that of unverified code.
In summary, this article makes the following contributions.
• A flexible language for developing high-performance, verified, concurrent systems code.
• A mechanically verified, extensible semantic framework that already supports a collection of eight verified libraries for performing refinement-based proofs, including region-based pointer reasoning, rely-guarantee, TSO elimination, and reduction.
• A practical tool that uses these techniques to enable reasoning about complex concurrent programs with modest developer effort.
This article is an expanded version of an earlier paper [33]. It contains more details about Armada, discusses a later version of the implementation (see Section 5), and describes an additional case study (see Section 6.3).
2 Overview
As shown in Figure 1, to use Armada, a developer writes an implementation program in the Armada language. The developer also writes an imperative specification, which need not be performant or executable, in that language. This specification should be easy to read and understand so that others can determine (e.g., through inspection) whether it meets their expectations. Given these two programs, Armada’s goal is to prove that all finite behaviors of the implementation are permitted by the specification, that is, that the implementation refines the specification. The developer defines what this means via a refinement relation (\(\mathcal {R}\)). For instance, if the state contains a console log, the refinement relation might be that the log in the implementation is a prefix of that in the spec.
Because of the large semantic gap between the implementation and specification, we do not attempt to directly prove refinement. Instead, the developer writes a series of N Armada programs to bridge the gap between the implementation (level 0) and the specification (level \(N+1\)). Each pair of adjacent levels \(i, i+1\) in this series should be similar enough to facilitate automatic generation of a refinement proof that respects \(\mathcal {R}\); the developer supplies a short proof recipe that gives Armada enough information to automatically generate such a proof. Given the pairwise proofs, Armada leverages refinement transitivity to prove that the implementation indeed refines the specification.
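The pairwise structure above can be pictured as composing one abstraction mapping per level. The following sketch is purely illustrative (the level states and mapping functions are invented, not Armada’s actual representation): each level’s proof supplies a function from its states to the next, more abstract level’s states, and chaining them maps implementation states all the way to specification states.

```python
# Hypothetical sketch of refinement transitivity via composed level mappings.
def compose_refinements(mappings):
    """Given [f_0, ..., f_N], where f_i maps level-i states to
    level-(i+1) states, return the end-to-end mapping."""
    def end_to_end(state):
        for f in mappings:
            state = f(state)
        return state
    return end_to_end

# Toy levels: each level forgets a little implementation detail,
# until only the console log (the spec-visible state) remains.
level0_to_1 = lambda s: {"log": s["log"], "cache": s["cache"]}
level1_to_2 = lambda s: {"log": s["log"]}

refine = compose_refinements([level0_to_1, level1_to_2])

impl_state = {"log": ["a", "b"], "cache": {"x": 1}, "tmp": 99}
assert refine(impl_state) == {"log": ["a", "b"]}
```

Each per-level proof only has to relate two adjacent, similar programs; the composed mapping (and transitivity of refinement) does the rest.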
We formally express refinement properties and their proofs in the Dafny language [29]. To formally describe what refinement means, Armada translates each program into its small-step state-machine semantics, expressed in Dafny. For instance, we represent the state of a program as a Dafny datatype and the set of its legal transitions as a Dafny predicate over pairs of states. To formally prove refinement between a pair of levels, we generate a Dafny lemma whose conclusion indicates a refinement relation between their state machines. We use Dafny to verify all proof material we generate so that ultimately the only aspect of Armada we must trust is its translation of the implementation and specification into state machines.
2.1 Example Specification and Implementation
To introduce Armada, we describe its use on an example program that searches for a good, but not necessarily optimal, solution to an instance of the traveling salesman problem.
The specification, shown in Figure 2, demands that the implementation output a valid solution (i.e., one that visits every city exactly once), and it implicitly requires the program not to crash. Armada specifications can use powerful declarations as statements. Here, the somehow statement expresses that somehow the program updates s so that valid_soln(s) holds.
The example implementation, also shown in Figure 2, creates 100 threads. Each thread searches through 10,000 random solutions. If a thread finds a solution shorter than the best length found so far, it updates the global variables storing the best length and solution. The main routine joins the threads and prints the best solution found.
Note that this example has a benign race: the read of the shared variable best_len in the first if (len < best_len) check. It is benign because the worst consequence of reading a stale value is that the thread unnecessarily acquires the mutex.
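The shape of that benign race is the familiar check-then-lock-then-recheck pattern. The following is an illustrative sketch in Python (not Armada code; the thread count, lengths, and names are invented): a racy read guards the lock acquisition, and the guard is re-checked under the mutex, so a stale read costs at most an unnecessary acquisition, never a wrong result.

```python
# Illustrative sketch of the benign-race pattern from the example.
import threading

best_len = float("inf")
best_soln = None
lock = threading.Lock()

def consider(soln, length):
    global best_len, best_soln
    if length < best_len:          # racy read: may observe a stale value
        with lock:
            if length < best_len:  # re-check under the mutex
                best_len = length
                best_soln = soln

# Eight "worker threads", each proposing one candidate of length 100 - i.
threads = [threading.Thread(target=consider, args=(f"soln{i}", 100 - i))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# best_len only ever decreases under the lock, so the unique minimum wins
# regardless of interleaving.
assert best_len == 93
assert best_soln == "soln7"
```

Because the update is guarded by the re-check under the mutex, the final result is the minimum candidate length under every interleaving; the racy read only affects how often the lock is taken.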
2.2 Example Proof Strategy
We bridge the gap between the implementation and specification by introducing intermediate levels that gradually simplify the implementation. Figure 3 depicts the program, called ArbitraryGuard, at level 1 in our example proof. This program is like the implementation except that it arbitrarily chooses whether to acquire the lock by using * in place of the guard condition len < best_len.
Our transformation of the Implementation program to the ArbitraryGuard program is an example of weakening, in which a statement is replaced by one whose behaviors are a superset of the original’s; more precisely, a state-transition relation is replaced by a superset of that relation. The two levels’ programs thus exhibit weakening correspondence, that is, it is possible to map each low-level program step to an equivalent or weaker one in the high-level program. The proof that Implementation refines ArbitraryGuard is straightforward but tedious to write. Instead, the developer simply writes a recipe for this proof, shown in Figure 4. This recipe instructs Armada to generate a refinement proof using the weakening correspondence between the program pair.
Having removed the racy read of best_len, we can now demonstrate an ownership invariant: threads access that variable only while they hold the mutex, and no two threads ever hold the mutex. This allows a further transformation of the program to the one shown in Figure 5, which replaces the assignment best_len := len with best_len ::= len, signifying the use of sequentially consistent memory semantics for the update rather than x86-TSO semantics [38]. Since strong consistency is easier to reason about than weak-memory semantics, proofs for further levels will be easier.

Just as for weakening, Armada generates a proof of refinement between programs whose only transformation is a replacement of assignments to a variable with sequentially consistent assignments. For such a proof, the developer’s recipe supplies the variable name and the ownership predicate, as shown in Figure 6.
If the developer mistakenly requests a TSO-elimination proof for a pair of levels that do not permit it (e.g., if the first level still has the racy read and, thus, does not own the location when it accesses it), then Armada will either generate an error message indicating the problem or generate an invalid proof. In the latter case, running the proof through the theorem prover (i.e., the Dafny verifier) will produce an error message. For instance, it might indicate which statement may access the variable without satisfying the ownership predicate or which statement might cause two threads to simultaneously satisfy the ownership predicate.
4 Refinement Framework
Armada’s goals rely on our extensible framework for automatic generation of refinement proofs. The framework consists of:
Strategies. A strategy is a proof generator designed for a particular type of correspondence between a low-level and a high-level program. An example correspondence is weakening; two programs exhibit it if they match except for statements in which the high-level version admits a superset of behaviors of the low-level version.
Library. Our library contains generic lemmas useful in proving refinements between programs. Often, these lemmas are specific to a certain correspondence.
Recipes. The developer generates a refinement proof between two program levels by specifying a recipe. A recipe specifies which strategy should generate the proof and the names of the two program levels. A recipe can also include strategy-specific annotations; for instance, the variable-hiding strategy described in Section 4.2.8 takes as additional arguments the list of hidden variables. Figure 4 shows an example.
Verification experts can extend the framework with new strategies and library lemmas. Developers can leverage these new strategies via recipes. Armada ensures sound extensibility because, for a proof to be considered valid, all of its lemmas and all of the lemmas in the library must be verified by Dafny. Hence, arbitrarily complex extensions can be accommodated. For instance, we need not worry about unsoundness or incorrect implementation of the Cohen-Lamport reduction logic we use in Section 4.2.1 or the rely-guarantee logic we use in Section 4.2.2.
Many of the proofs we must generate are so elaborate that automated verification would take too long or, worse, succeed only intermittently, leading to proof instability. Thus, we break each proof into modest-sized lemmas, each small enough to be amenable to automated verification. Generally, our approach is to use one lemma for each step or, in the case of two-step properties, each pair of steps. Furthermore, we enforce local reasoning by selectively revealing definitions of state machines so that only the relevant elements are visible to the prover (Section 4.1.2). Generating such a large number of lemmas would be unreasonably tedious for a human proof writer; thus, it is fortunate that we can use an automatic proof generator. This lets us obtain stable proofs with little human cost.
4.1 Aspects Common to All Strategies
Each strategy can leverage a set of Armada tools. For instance, we provide machinery to prove that developer-supplied inductive invariants are inductive and to produce a refinement function that maps low-level states to high-level states.
The most important generic proof technique we provide is non-determinism encapsulation. State-transition relations are non-deterministic because some program statements are non-deterministic; for example, a method call will set uninitialized stack variables to arbitrary values. Reasoning about such general relations is challenging. Thus, the state-machine translator encapsulates all non-deterministic parameters of each step in a step object and expresses the transition relation as a deterministic function \(\mathsf {NextState}\) that computes state \(i+1\) from state i and step object i. For instance, if a method M has an uninitialized stack variable x, then each step object corresponding to a call to M has a field newframe_x that stores x’s initial value. This way, the proof can reason about the low-level program using an annotated behavior, which consists of a sequence of states and a sequence of step objects. The relationship between pairs of adjacent states in such an annotated behavior is a deterministic function, making reasoning easier.
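The idea of non-determinism encapsulation can be sketched as follows. This is an illustrative Python model, not Armada’s actual representation (the dataclass, field names, and state shape are invented): every non-deterministic choice of a transition is stored in a step object, so the transition itself becomes a deterministic function of the current state and that object.

```python
# Sketch of non-determinism encapsulation via step objects.
from dataclasses import dataclass

@dataclass
class CallStep:
    target: str       # method being called
    newframe_x: int   # arbitrary initial value of the uninitialized local x

def next_state(state, step):
    """Deterministically compute the successor state from a state and a
    fully resolved step object (all non-determinism lives in `step`)."""
    frames = state["frames"] + [{"method": step.target, "x": step.newframe_x}]
    return {"frames": frames}

s0 = {"frames": []}
# Two step objects capture two different non-deterministic choices for x;
# given the step object, the successor state is uniquely determined.
s1a = next_state(s0, CallStep("M", 7))
s1b = next_state(s0, CallStep("M", 42))
assert s1a["frames"][0]["x"] == 7
assert s1b["frames"][0]["x"] == 42
assert next_state(s0, CallStep("M", 7)) == s1a   # determinism
```

An annotated behavior in this model is just alternating states and step objects, and adjacent states are related by a function rather than a relation, which is what makes the generated lemmas easy to state.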
4.1.1 Regions.
To simplify proofs about pointers, we use region-based reasoning, in which memory locations (i.e., addresses) are assigned abstract region ids. Proving that two pointers are in different regions shows they are not aliased.
We carefully design our region reasoning to be automation friendly and compatible with any Armada strategy. To assign regions to memory locations rather than rely on developer-supplied annotations, we use Steensgaard’s pointer analysis algorithm [45]. Our implementation of Steensgaard’s algorithm begins by assigning distinct regions to all memory locations, then merges the regions of any two variables assigned to each other.
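The merge step described above is naturally implemented with a union-find structure. The following is a minimal sketch of that Steensgaard-style flow (variable names and the path-halving choice are illustrative, not taken from Armada’s implementation): every variable starts in its own region, and each assignment between pointer variables unifies their regions.

```python
# Minimal union-find sketch of Steensgaard-style region merging.
parent = {}

def region(v):
    """Find the representative region of variable v (path halving)."""
    parent.setdefault(v, v)
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v

def merge(a, b):
    """Process an assignment 'a := b': unify the two variables' regions."""
    ra, rb = region(a), region(b)
    if ra != rb:
        parent[ra] = rb

# Program sketch:  p := q;  r := s;
merge("p", "q")
merge("r", "s")
assert region("p") == region("q")
assert region("r") == region("s")
assert region("p") != region("r")  # distinct regions: p and r never alias
```

Flow-insensitivity is what makes this near-linear-time and fully automatic; the price, as usual with Steensgaard’s analysis, is coarser regions than a flow-sensitive analysis would give.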
We perform region reasoning purely in Armada-generated proofs without requiring changes to the program or the state machine semantics. Hence, in the future, we can add more complex pointer analysis as needed.
We assign regions to memory locations through a proof construct called the region map. This is a map from addresses to a program-dependent RegionId datatype. For each pointer variable (i.e., variable with type ptr<T>), we generate a region invariant for that variable stating that any non-null address it holds is in the designated region and prove this invariant inductively.
To employ region-based reasoning, the developer simply adds use_regions to a recipe. Armada then performs the static analysis described earlier, generates the pointer invariants, and generates lemmas to inductively prove the invariants. If regions are overkill and the proof only requires an invariant that all addresses of in-scope variables are valid and distinct, the developer instead adds use_address_invariant.
4.1.2 Selective State-Machine Revelation.
Armada generates a state machine tailored to the program to facilitate automated verification (Section 3.2.2). Sometimes, a specialized state machine still contains too much information to verify in a timely fashion. However, lemmas generated by Armada usually reason about only a single step, and the rest of the state machine is irrelevant to the verification. For example, when proving that a particular step obeys the ownership predicate in TSO elimination (Section 4.2.3), one does not care about any other steps.
To aid in automated verification, Armada localizes proof contexts to particular step(s). It hides state-machine definitions using the opaque Dafny attribute and generates revelation lemmas specialized to individual steps. Because Armada generates individual lemmas to reason about specific step(s), it can deterministically decide which revelation lemma(s) to invoke.
Compared with alternative approaches such as using triggers and reliance on SMT solvers’ heuristics, selective revelation enforces better local reasoning and ensures that verification scales with program size.
4.1.3 Lemma Customization.
Occasionally, verification fails for programs that correspond properly because an automatically generated lemma has insufficient annotation to guide Dafny. For instance, the developer may weaken y := x & 1 to y := x % 2, which requires bit-vector reasoning to justify. Thus, Armada lets the developer arbitrarily supplement an automatically generated lemma with additional developer-supplied lemmas (or lemma invocations).
When the developer specifies that a certain customization should be inserted into a specific lemma, Armada chooses an appropriate location in the lemma to insert it. That is, it inserts it after any generic lemma invocations that might be useful to the customization and after any revelations of relevant opaque definitions (Section 4.1.2).
It is useful for strategies to use predictable lemma names so that lemma-customization annotations can refer to them in a way that will not change with minor changes to the programs. The developer can label statements involved in tricky reasoning; our strategies will generate lemmas about those statements with predictable names using those labels. For instance, suppose the developer writes the statement label BitAnd: y := x & 1 in the method Foo and wants to weaken it to y := x % 2. The reasoning belongs in the lemma lemma_LiftNext_Update_Foo_BitAnd generated by the weakening proof strategy. The developer may add an extra lemma invocation about bit-vectors and modulo on the variable x, which will be automatically inserted in lemma_LiftNext_Update_Foo_BitAnd. It will be inserted after any revelations of relevant opaque definitions and after any generic lemmas invoked by the weakening proof strategy.
Figure 11 shows, as an example, one of two lemma customizations used in a barrier implementation (see Section 6.1). In this case, the developer discovers that a specific automatically generated lemma does not verify. Thus, the developer writes a helper lemma demonstrating that if the barrier variable is non-zero in a thread’s local view, then its store buffer must contain one or the barrier variable must contain one. They then use the keyword extra to tell Armada to invoke certain extra proof text (an invocation of the helper lemma) within the automatically generated lemma. Armada inserts that text verbatim after various helpful lemma calls the strategy always includes, such as lemma_GetThreadLocalViewAlwaysCommutesWithConvert. That way, the lemma customization can use the postconditions of those helpful lemmas.
Armada’s lemma customization contrasts with static checkers such as CIVL [21]. The constraints on program correspondence imposed by a static checker must be restrictive enough to ensure soundness. If they are more restrictive than necessary, a developer cannot appeal to more complex reasoning to convince the checker to accept the correspondence. A static checker could, in theory, make up for this by providing a way for developers to supply proofs of specific properties. Our technique is even more general because it lets developers augment any lemma generated by the proof-generation strategy, not just ones that the strategy’s authors considered as potentially in need of augmentation.
4.1.4 Intermediate Proof Levels.
To simplify the task of proving refinement between a low-level program L and a high-level program H, we use a multi-layered proof approach. That is, we generate intermediate state machines \(L^+\), \(L^A\), and \(H^A\) and prove that L refines \(L^+\), \(L^+\) refines \(L^A\), \(L^A\) refines \(H^A\), and \(H^A\) refines H. We then invoke a lemma showing transitivity of refinement to demonstrate that L refines H. We now describe each of these intermediate state machines in turn.
The first intermediate state machine, \(L^+\), is L augmented with auxiliary state. For instance, one piece of state in \(L^+\) but not L is the ID of the initial thread running main. It is often convenient for a proof to refer to this ID, and this obviates the developer’s explicitly storing it in a ghost variable. Another type of auxiliary state is the region map discussed in Section 4.1.1.
The second intermediate state machine, \(L^A\), introduces atomic substeps to \(L^+\). Recall that the trusted state-machine semantics models block atomicity by forbidding a thread \(T^{\prime }\) from executing while a different thread T is inside an atomic block. It does this by modeling a state machine step as a sequence of substeps that cannot end in the middle of an atomic block (unless the program terminates there, e.g., due to undefined behavior). However, it still models T’s execution of the n instructions of an atomic block as n separate state-machine substeps. In \(L^A\), however, we model those n instructions as a single state-machine substep. This reduces the number of states and transitions that need to be reasoned about, simplifying proofs using this state machine.
A key advantage of using an atomic-substep state machine is that one can establish invariants by showing that each atomic block preserves them, rather than having to show that each individual instruction preserves them. Indeed, it allows one to use simple invariant predicates since those predicates do not have to account for states in the middle of atomic blocks. For instance, suppose that one introduces a ghost variable g with the intent of having some invariant relation between it and a real variable x. One can do this by adding an assignment to g immediately following each update to x and having this assignment be within the same atomic block. In this case, one may want to prove that the intended relation between g and x is an invariant without having to specify ungainly exceptions such as “except if the PC is at one of the points just before the assignment of g.” Using an atomic-substep state machine allows this.
One caveat in the use of atomic substeps is that, unlike the set of instruction steps, the set of atomic substeps is not necessarily finite. For instance, an atomic block containing the loop while (insufficient(x)) { x := x + 1; } leads to infinitely many atomic substeps: one that increments x zero times, one that increments x once, one that increments x twice, and so on. Having an infinite set of atomic substeps is problematic since for many proofs we automatically generate one lemma per step, which would result in an infinite number of lemmas.
For this reason, our atomic-substep state-machine generator sometimes breaks an atomic substep into pieces and models each piece as a separate state-machine step. Breaking atomic substeps into pieces means that if we want to prove an invariant, we have to show that it holds not only at all yield points but also at all non-yielding break points. We use as break points every loop head and method entry point that may be visited twice without an intervening yield point; this ensures that the set of steps is finite. For example, the head of the while (insufficient(x)) loop in the previous paragraph would be a break point since it can be visited twice without yielding.
The final intermediate state machine is \(H^A\), the atomic-substep state machine version of the high-level state machine H. We construct it from H analogously to how we construct \(L^A\) from \(L^+\).
For any state machine S, S and its atomic analog \(S^A\) are equivalent. That is, they describe exactly the same set of steps and, thus, the same set of behaviors. After all, every step of S completes any atomic block it starts; thus, it can be partitioned into sequences corresponding to atomic substeps from \(S^A\). Also, every atomic substep in \(S^A\) is composed of a sequence of substeps from S; thus, any step in \(S^A\) is a sequence of atomic substeps that can be broken down into a sequence of substeps in S. Because S and \(S^A\) are equivalent, they refine each other trivially: for every behavior of one, there exists a behavior of the other (the same behavior) such that corresponding states refine each other (because they are identical and the refinement relation is reflexive).
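The partitioning argument can be made concrete with a toy model. In the sketch below (an illustrative encoding, not Armada’s: substeps are invented (name, depth_after) pairs, where depth_after is the atomic-block nesting depth after the substep runs), a step of S never stops inside an atomic block, so cutting its substep sequence wherever the nesting depth returns to zero recovers exactly the atomic substeps of \(S^A\), and flattening them recovers the original sequence.

```python
# Toy model of partitioning a step's substeps into atomic substeps.
def to_atomic_substeps(substeps):
    """Cut a substep sequence at every point outside all atomic blocks."""
    atomic, current = [], []
    for name, depth_after in substeps:
        current.append(name)
        if depth_after == 0:        # outside every atomic block: cut here
            atomic.append(tuple(current))
            current = []
    assert not current              # a step never ends mid-atomic-block
    return atomic

def flatten(atomic):
    """The inverse direction: recover the S substep sequence from S^A."""
    return [name for group in atomic for name in group]

trace = [("x:=1", 0), ("begin_atomic", 1), ("y:=2", 1), ("end_atomic", 0)]
grouped = to_atomic_substeps(trace)
assert grouped == [("x:=1",), ("begin_atomic", "y:=2", "end_atomic")]
assert flatten(grouped) == ["x:=1", "begin_atomic", "y:=2", "end_atomic"]
```

The round trip in both directions is what makes refinement between S and \(S^A\) hold trivially in each direction.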
For this reason, proving refinement in either direction is fairly mechanical. For instance, to prove that \(L^+\) refines \(L^A\), we generate a proof that every step in \(L^+\) can be partitioned into atomic substeps in \(L^A\). (See Section 7 for an example.) Proving that \(H^A\) refines H is even simpler: we generate a proof that every atomic substep in \(H^A\) can be divided into substeps in H.
The proof we generate that L refines \(L^+\) is also fairly mechanical. Thus, the only strategy-specific part of the proof is the refinement proof between \(L^A\) and \(H^A\). This means that the strategies, which we discuss next, benefit from having to reason only about atomic substeps and from having the low-level state augmented with the auxiliary state.
4.2 Specific Strategies
Our current implementation has eight strategies for eight different correspondence types. We now describe them.
4.2.1 Reduction.
Because of the complexity of reasoning about all possible interleavings of statements in a concurrent program, a powerful simplification is to replace a sequence of statements with an atomic block. A classic technique for achieving this is reduction [32], which shows that one program refines another if the low-level program has a sequence of statements \(R_1, R_2, \ldots , R_n, N, L_1, L_2, \ldots , L_m\) while the high-level program replaces those statements with a single atomic action having the same effect. Each \(R_i\) (\(L_i\)) must be a right (left) mover, that is, a statement that commutes to the right (left) with any step of another thread.

An overly simplistic approach is to consider two programs to exhibit the reduction correspondence if they are equivalent except for a sequence of statements in the low-level program that corresponds to an atomic block with those statements as its body in the high-level program. This formulation would prevent us from considering cases in which the atomic blocks span loop iterations (e.g., Figure 12).
Instead, Armada’s approach to sound extensibility gives us the confidence to use a generalization of reduction, due to Cohen and Lamport [10], that allows steps that do not necessarily correspond to consecutive statements in the program. It divides the states of the low-level program into a first phase (states following a right mover), a second phase (states preceding a left mover), and no phase (all other states). Programs may never pass directly from the second phase to the first phase, and for every sequence of steps starting and ending in no phase, there must be a step in the high-level program with the same aggregate effect.
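The phase discipline above lends itself to a simple mechanical check. The sketch below is an illustrative model (the phase labels and behaviors are invented, not Armada output): label each state along a behavior with its phase and verify that no behavior ever passes directly from the second phase to the first.

```python
# Toy check of the Cohen-Lamport phase restriction.
FIRST, SECOND, NONE = "first", "second", "none"

def phases_ok(phases):
    """phases: list of phase labels along one behavior, in order.
    The discipline forbids a direct second-phase -> first-phase transition."""
    return all(not (a == SECOND and b == FIRST)
               for a, b in zip(phases, phases[1:]))

# A lock-acquire / update / lock-release shape: right movers push us into
# the first phase, left movers are preceded by the second phase, and the
# behavior starts and ends in no phase.
assert phases_ok([NONE, FIRST, SECOND, NONE])

# Jumping straight from the second phase back to the first is forbidden.
assert not phases_ok([NONE, SECOND, FIRST, NONE])
```

Armada’s generated lemmas discharge the remaining obligations per step pair, e.g., that each step ending in the first phase commutes to the right with every other step.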
Hence, our strategy considers two programs to exhibit the reduction correspondence if they are identical except that some yield points in the low-level program are not yield points in the high-level program. The strategy produces lemmas demonstrating that each Cohen-Lamport restriction is satisfied; for example, one lemma establishes that each step ending in the first phase commutes to the right with each other step. This requires generating many lemmas, one for each pair of steps of the low-level program in which the first step in that pair is a right mover.
Our use of encapsulated nondeterminism (Section 4.1) greatly aids the automatic generation of certain reduction lemmas. Specifically, we use it in each lemma showing that a mover commutes across another step, as follows. Suppose that we want to prove commutativity between a step \(\sigma _i\) by thread i that goes from \(s_1\) to \(s_2\) and a step \(\sigma _j\) by thread j that goes from \(s_2\) to \(s_3\). We must show that there exists an alternate-universe state \(s^{\prime }_2\) such that a step from thread j can take us from \(s_1\) to \(s^{\prime }_2\) and a step from thread i can take us from \(s^{\prime }_2\) to \(s_3\). To demonstrate the existence of such an \(s^{\prime }_2\), we must be able to automatically generate a proof that constructs it. Fortunately, our representation of a step encapsulates all non-determinism. Thus, it is straightforward to describe such an \(s^{\prime }_2\) as \(\mathsf {NextState}(s_1, \sigma _j)\). This simplifies proof generation significantly, as we do not need code that can construct alternate-universe intermediate states for arbitrary commutations. All we must do is emit lemmas hypothesizing that \(\mathsf {NextState}(\mathsf {NextState}(s_1, \sigma _j), \sigma _i) = s_3\), with one lemma for each pair of step types. The automated theorem prover can typically dispatch these lemmas automatically.
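The commutation obligation can be illustrated with a toy deterministic transition function. In this sketch (an invented model, not Armada’s state machine), each step object names a thread and a value to write to that thread’s own slot; since the two steps touch disjoint slots, running them in either order from \(s_1\) lands in the same \(s_3\), and the alternate-universe state is simply next_state(s1, step_j).

```python
# Toy check of the per-step-pair commutation lemma, with nondeterminism
# encapsulated in step objects.
def next_state(state, step):
    """Deterministic transition: a step writes `value` to its thread's slot."""
    thread, value = step
    new = dict(state)
    new[thread] = value
    return new

s1 = {"i": 0, "j": 0}
step_i, step_j = ("i", 5), ("j", 7)   # steps by threads i and j

s2 = next_state(s1, step_i)           # s1 --step_i--> s2
s3 = next_state(s2, step_j)           # s2 --step_j--> s3

# The generated lemma's hypothesis: taking the steps in the other order
# from s1 reaches the same final state s3; the alternate-universe
# intermediate state is next_state(s1, step_j), no search required.
assert next_state(next_state(s1, step_j), step_i) == s3
```

The point of the encapsulation is visible here: because next_state is a function, the proof can name the intermediate state directly instead of having to synthesize a witness for an existential.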
4.2.2 Rely-Guarantee Reasoning.
Rely-guarantee reasoning [22, 30] is a powerful technique for reasoning about concurrent programs using Hoare logic. Our framework’s generality lets us leverage this style of reasoning without relying on it as our only means of reasoning. Furthermore, our level-based approach lets developers use such reasoning piecemeal. That is, they do not have to use rely-guarantee reasoning to establish all invariants all at once. Rather, they can establish some invariants and cement them into their program, that is, add them as enabling conditions in one level so that higher levels can simply assume them.
Two programs exhibit the assume-introduction correspondence if they are identical except that the high-level program has additional enabling constraints on one or more statements. The correspondence requires that each added enabling constraint always holds in the low-level program at its corresponding program position.
Figure 13 gives an example using a variant of our running traveling-salesman example. In this variant, the correctness condition requires that we find the optimal solution. Thus, it is not reasonable to simply replace the guard with * as we did in Figure 3. Instead, we want to justify the racy read of best_len by arguing that the result it reads is conservative, that is, that at worst it is an overestimate of the best length so far. We represent this best length with the ghost variable ghost_best and somehow establish that best_len >= ghost_best is an invariant. We also establish that between steps of a single thread, the variable ghost_best cannot increase; this is an example of a rely-guarantee predicate [22]. Together, these establish that t >= ghost_best always holds before the evaluation of the guard.
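The argument can be mimicked in a small simulation (a sketch under our own simplifying assumptions, not Armada code): environment steps respect the rely by only ever lowering ghost_best, so a racy read taken earlier remains an overestimate when the guard is evaluated.

```python
# Toy rely-guarantee simulation: ghost_best never increases (the rely),
# best_len >= ghost_best is invariant, so a stale racy read t of best_len
# still satisfies t >= ghost_best after interference.

import random

best_len = 100          # concrete shared variable, read racily
ghost_best = 100        # ghost variable: the true best length so far

def other_threads_interfere():
    """Environment steps respecting the rely: ghost_best never increases,
    and the invariant best_len >= ghost_best is maintained."""
    global best_len, ghost_best
    for _ in range(random.randint(0, 5)):
        ghost_best -= random.randint(0, 10)  # a shorter tour was found
        best_len = ghost_best                # eventually reflected in best_len

for _ in range(1000):
    t = best_len                    # racy read; invariant gives t >= ghost_best
    other_threads_interfere()       # rely: ghost_best can only decrease
    assert t >= ghost_best          # the cemented enabling condition
```

Here the interference immediately propagates ghost_best into best_len; under TSO, best_len could additionally lag behind, which only makes the read more conservative.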
Benefits. The main benefit to using assume-introduction correspondence is that it adds enabling constraints to the program being reasoned about. More enabling constraints means fewer behaviors to be considered while locally reasoning about a step.
Another benefit is that it cements an invariant into the program. That is, it ensures that what is an invariant now will remain so even as further changes are made to the program as the developer abstracts it. For instance, after proving refinement of the example in Figure 13, the developer may produce a next-higher-level program by weakening the assignment t := best_len to t := *. This usefully eliminates the racy read of the variable best_len but has the downside of eliminating the relationship between t and best_len. However, now that we have cemented the invariant that t >= ghost_best, we do not need this relationship any more. Now, instead of reasoning about a program that performs a racy read and then branches based on it, we reason only about a program that chooses an arbitrary value and then blocks forever if that value does not have the appropriate relationship to the rest of the state. Notice, however, that assume-introduction can be used only if this condition is already known to always hold in the low-level program at this position. Therefore, assume-introduction never introduces any additional blocking in the low-level program.
Proof generation. The proof generator for this strategy uses rely-guarantee logic, letting the developer supply standard Hoare-style annotations. The developer may annotate each method with preconditions and postconditions, may annotate each loop with loop invariants, and may supply invariants and rely-guarantee predicates.
One can add some such annotations directly to the Armada code in the high-level program using notation borrowed from Dafny. For instance, one can write a precondition on a method with a requires clause, a postcondition on a method with an ensures clause, a loop invariant on a loop with an invariant clause, or a rely-guarantee predicate with a yield_predicate clause.
However, some annotations cannot be expressed easily in the Armada language and are best expressed using a Dafny expression referencing the state-machine representation of the state. Thus, one can add an annotation in the proof recipe by writing out that expression, referring to the current state as s, the next state (if applicable) as s', and the current thread as tid. This lets one describe a condition that always holds (chl_invariant), a condition that always holds before a specific instruction executes (chl_local_invariant), a rely-guarantee predicate (chl_yield_pred), a method pre- or postcondition (chl_precondition or chl_postcondition), or a loop invariant (chl_loop_modifies). For instance, if one wants to write that a precondition for calling method M is that every other thread's store buffer is empty, one could write:
We use the word "modifies" in chl_loop_modifies because it can express not only an invariant that holds at the beginning of each loop body but also a restriction on how the state can change in the course of zero or more loop iterations. One can do this by referencing s', which refers to the state at the end of zero or more loop iterations.
Our strategy generates one lemma for each program path that starts at a method’s entry and makes no backward jumps. This is always a finite path set; thus, it only has to generate finitely many lemmas. Each such lemma establishes properties of a state machine that resembles the low-level program’s state machine but differs in the following ways. Only one thread ever executes and it starts at the beginning of a method. Calling another method simply causes the state to be havocked subject to its postconditions. Before evaluating the guard of a loop, the state changes arbitrarily subject to the loop invariants. Between program steps, the state changes arbitrarily subject to the rely-guarantee predicates and invariants.
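The finiteness of this path set can be illustrated with a small sketch (our own, not Armada's generator): following only forward edges of the control-flow graph yields an acyclic graph, so enumerating all paths from the method entry terminates, and one lemma can be emitted per path.

```python
# Enumerate all paths from a method's entry that make no backward jumps.
# Forward edges (u -> v with v > u in program order) form a DAG, so the
# path set is finite.

def forward_paths(entry, edges, exits):
    """Enumerate maximal forward-edge paths from `entry`."""
    paths = []
    def dfs(node, path):
        succs = [v for (u, v) in edges if u == node and v > u]  # skip back jumps
        if node in exits or not succs:
            paths.append(path)
            return
        for v in succs:
            dfs(v, path + [v])
    dfs(entry, [entry])
    return paths

# Tiny CFG: 0 -> 1 (loop head), 1 -> 2 (body) and 1 -> 4 (exit),
# 2 -> 3, 3 -> 1 (back edge, excluded from path enumeration).
edges = [(0, 1), (1, 2), (1, 4), (2, 3), (3, 1)]
paths = forward_paths(0, edges, exits={4})
assert paths == [[0, 1, 2, 3], [0, 1, 4]]
```

The path ending at node 3 (just before the back edge) is exactly the kind of path whose lemma must reestablish the loop invariant.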
The generated lemmas must establish that each step maintains invariants and rely-guarantee predicates, that method preconditions are satisfied before calls, that method postconditions are satisfied before method exits, and that loop invariants are reestablished before jumping back to loop heads. This requires several lemmas per path: one for each invariant, one to establish preconditions if the path ends in a method call, one to establish maintenance of the loop invariant if the path ends just before a jump back to a loop head, and so on. The strategy uses these lemmas to establish the conditions necessary to invoke a library lemma that proves properties of rely-guarantee logic.
The logic we use is complex; thus, the preceding discussion alone should not convince the reader that it is sound or that we have implemented it correctly. Fortunately, as discussed earlier, all Dafny code is verified; thus, there is no need to trust the proof generator or our libraries.
4.2.3 TSO Elimination.
We observe that even in programs using sophisticated lock-free mechanisms, most variables are accessed via a simple ownership discipline (e.g., "always by the same thread" or "only while holding a certain lock") that straightforwardly provides data race freedom (DRF) [2]. It is well understood that x86-TSO behaves indistinguishably from sequential consistency under DRF [5, 24]. Our level-based approach means that developers need not prove that they follow an ownership discipline for all variables to get the benefit of reasoning about sequential consistency. In particular, Armada allows a level at which the sophisticated variables use regular assignments and the simple variables use TSO-bypassing assignments. Indeed, developers need not even prove an ownership discipline for all such variables at once; they may find it simpler to reason about those variables one at a time or in batches. At each point, they can focus on proving an ownership discipline just for the specific variable(s) to which they are applying TSO elimination. As with any proof, if the developer makes a mistake (e.g., by not following the ownership discipline), Armada reports a proof failure.
A pair of programs exhibits the TSO-elimination correspondence if all assignments to a set of locations \(\mathcal{L}\) in the low-level program are replaced by TSO-bypassing assignments. Furthermore, the developer supplies an ownership predicate (as in Figure 14) that specifies which thread (if any) owns each location in \(\mathcal{L}\). It must be an invariant that no two threads own the same location at once and that no thread can read or write a location in \(\mathcal{L}\) unless it owns that location. Any step releasing ownership of a location must ensure the thread's store buffer is empty, for example, by being a fence.
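The intuition behind TSO elimination can be seen in a minimal store-buffer model (our illustration, not Armada's formal semantics): each thread buffers its writes in FIFO order, reads consult the thread's own buffer first, and a fence drains the buffer. For a location accessed only by its owning thread, buffered reads agree with sequentially consistent reads.

```python
# Minimal x86-TSO-style machine: per-thread FIFO store buffers, reads that
# check the own buffer before memory, and a fence that drains the buffer.

from collections import deque

class TSOMachine:
    def __init__(self):
        self.mem = {}                        # shared memory
        self.buf = {}                        # per-thread store buffers

    def write(self, tid, var, val):
        self.buf.setdefault(tid, deque()).append((var, val))

    def read(self, tid, var):
        # x86-TSO: a read sees the newest buffered write by this thread, if any
        for v, val in reversed(self.buf.get(tid, deque())):
            if v == var:
                return val
        return self.mem.get(var, 0)

    def flush_one(self, tid):
        if self.buf.get(tid):
            var, val = self.buf[tid].popleft()
            self.mem[var] = val

    def fence(self, tid):
        while self.buf.get(tid):
            self.flush_one(tid)

m = TSOMachine()
m.write(1, "x", 42)            # thread 1 owns x; the write sits in its buffer
assert m.read(1, "x") == 42    # the owner sees its own write (SC behavior)
assert m.mem.get("x", 0) == 0  # ...even though memory is not yet updated
m.fence(1)
assert m.mem["x"] == 42        # the fence drains the store buffer
```

Another thread reading "x" before the fence would see 0, which is why the discipline forbids access by non-owners and requires an empty buffer when ownership is released.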
4.2.4 Weakening.
As discussed earlier, two programs exhibit the weakening correspondence if they match except for certain statements in which the high-level version admits a superset of behaviors of the low-level version. The strategy generates a lemma for each statement in the low-level program proving that, considered in isolation, it exhibits a subset of behaviors of the corresponding statement of the high-level program.
4.2.5 Non-deterministic Weakening.
A special case of weakening is when the high-level version of the state transition is non-deterministic, with that non-determinism expressed as an existentially quantified variable. For example, in Figure 4, the guard on an if statement is replaced by the * expression indicating non-deterministic choice. For simplicity of presentation, that figure shows the recipe invoking the weakening strategy. However, in practice, it would use non-deterministic weakening.
Proving non-deterministic weakening requires demonstrating a witness for the existentially quantified variable. Our strategy uses various heuristics to identify this witness and generate the proof accordingly.
4.2.6 Combining.
Two programs exhibit the combining correspondence if they are identical except that an atomic block in the low-level program is replaced by a single statement in the high-level program that has a superset of its behaviors. This is analogous to weakening in that it replaces what appears to be a single statement (an atomic block) with a statement with a superset of behaviors. However, it differs subtly because our model for an atomic block is not a single step but rather a sequence of steps that cannot be interrupted by other threads.
The key lemma generated by the combining proof generator establishes that all paths from the beginning of the atomic block to the end of the atomic block exhibit behaviors permitted by the high-level statement. This involves breaking the proof into pieces, one for each path prefix that starts at the beginning of the atomic block and does not pass beyond the end of it.
4.2.7 Variable Introduction.
A pair of programs exhibits the variable-introduction correspondence if they differ only in that the high-level program has variables (and assignments to those variables) that do not appear in the low-level program. The high-level program may read the new variables only in the right-hand side of one of the new assignments.
Our strategy for variable introduction creates refinement proofs for program pairs exhibiting this correspondence. The main use of this is to introduce ghost variables that abstract the concrete state of the program. Ghost variables are easier to reason about because they can be arbitrary types and because they use sequentially consistent semantics.
Another benefit of ghost variables is that they can obviate concrete variables. Once the developer introduces enough ghost variables and establishes invariants linking the ghost variables to the concrete state, they can weaken the program logic that depends on concrete variables to depend on ghost variables instead. Once program logic no longer depends on a concrete variable, the developer can hide it, using the strategy described next.
4.2.8 Variable Hiding.
A pair of programs \(\langle L, H\rangle\) exhibits the variable-hiding correspondence if \(\langle H, L\rangle\) exhibits the variable-introduction correspondence. In other words, the high-level program H has fewer variables than the low-level program L, and L only uses those variables in assignments to them. Our variable-hiding strategy creates refinement proofs for program pairs exhibiting this correspondence. As alluded to in Section 4.2.7, this is useful to remove the concrete state (and, thus, program complexity) once ghost variables have taken their place in program logic.
6 Evaluation
To show Armada's versatility, we evaluate it on the programs in Table 1. Our evaluations show that we can prove the correctness of programs not amenable to verification via ownership-based methodologies [43], programs with pointer aliasing, a counter shared between threads, lock implementations from previous frameworks [17], and libraries of real-world high-performance data structures.
6.1 Barrier
The Barrier program includes a barrier implementation described by Schirmer and Cohen [43]: "each processor has a flag that it exclusively writes (with volatile writes without any flushing) and other processors read, and each processor waits for all processors to set their flags before continuing past the barrier." They give this as an example of what their ownership-based methodology for reasoning about TSO programs cannot support. Like other uses of Owens's publication idiom [37], this barrier relies on permitting races between writes and reads to the same location.
The key safety property is that each thread does its post-barrier write after all threads do their pre-barrier writes. We cannot use the TSO-elimination strategy since the program has data races; thus, we prove as follows. A first level uses variable introduction to add ghost variables representing initialization progress and which threads have performed their pre-barrier writes. A second level uses rely-guarantee to add an enabling condition on the post-barrier write that all pre-barrier writes are complete. This condition implies the safety property.
One author took \(\sim\)3 days to write the proof levels, mostly to write invariants and rely-guarantee predicates involving x86-TSO reasoning. Due to the complexity of this reasoning, the original recipe had many mistakes; output from verification failures aided discovery and repair.
The implementation is 53 SLOC. The first proof level uses 10 additional SLOC for new variables and assignments, and 5 SLOC for the recipe; Armada generates 22,715 SLOC of proof. The next level uses 38 additional SLOC for enabling conditions, loop invariants, preconditions, and postconditions; 109 SLOC of Dafny for lemma customization; and 101 further SLOC for the recipe, mostly for invariants and rely-guarantee predicates. Armada generates 98,003 SLOC of proof.
We use this program for an elaborated example of Armada's use in Section 7.
6.2 Pointers
The Pointers program writes via distinct pointers of the same type. The correctness of our refinement depends on the static pointer analysis in our region-based reasoning (Section 4.1.1), proving that these different pointers do not alias. Specifically, we prove that a program assigning values via two pointers refines a program assigning those values in the opposite order. The automatic pointer analysis reveals that the pointers cannot alias and, thus, that the reversed assignments result in the same state. The program is 29 SLOC, the recipe is 13 SLOC, and Armada generates 6,997 SLOC of proof.
6.3 Counter
The Counter program is a variant of a program used by Owicki and Gries [39] to illustrate the use of auxiliary variables to prove correctness of concurrent programs. It creates a global counter and increments it in two concurrent threads. Those threads acquire a lock before updating the counter.
Our proof establishes that, regardless of the execution order, the final counter always equals two. Following Owicki and Gries, we introduce an auxiliary ghost variable representing each thread's contribution to the global counter value; we use the variable-introduction strategy for this key step. The complete proof uses most of our proof strategies in nine levels, including reduction, TSO elimination, and rely-guarantee reasoning. The implementation takes 50 SLOC, the recipes use 130 SLOC, and Armada generates 169,270 SLOC of proof.
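The Owicki-Gries auxiliary-variable idea behind this proof can be sketched as follows (our Python rendering, not the Armada program): ghost variables record each thread's contribution, the lock protects the invariant that the counter equals the sum of contributions, and at the end each contribution is one, so the counter is two.

```python
# Owicki-Gries-style auxiliary variables: contrib[tid] is ghost state
# updated atomically (under the lock) with the real counter, maintaining
# the invariant counter == sum(contrib).

import threading

counter = 0
contrib = [0, 0]            # ghost: each thread's contribution
lock = threading.Lock()

def worker(tid):
    global counter
    with lock:
        counter += 1
        contrib[tid] += 1                    # ghost update, atomic with the real one
        assert counter == sum(contrib)       # lock-protected invariant

threads = [threading.Thread(target=worker, args=(t,)) for t in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 2 and contrib == [1, 1]    # final value follows from the ghosts
```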
6.4 MCSLock
The MCSLock program includes a lock implementation developed by Mellor-Crummey and Scott [34]. It uses compare-and-swap instructions and fences for thread synchronization. It excels at fairness and cache-awareness by having threads spin on their own locations. We use it to demonstrate that our methodology allows modeling locks hand-built out of hardware primitives, as done for CertiKOS [25].
Our proof establishes the safety property that statements between acquire and release can be reduced to an atomic block. For simplicity, we use a large array to simulate a dynamic linked list at the implementation level. This way, we can focus on proving the ownership of the lock. Our refinement proof uses six transformations, including the following two notable ones. The fifth transformation proves that both acquire and release properly maintain the ownership represented by ghost variables. For example, acquire secures ownership and release returns it. We prove this by introducing enabling conditions and annotating the program. The last transformation reduces statements between acquire and release into a single atomic block through reduction.
The implementation is 42 SLOC. Level 1 keeps the 42 SLOC and uses 12 SLOC for its recipe. Level 2 adds 16 SLOC to the program and uses 4 SLOC for its recipe. Level 3 keeps the 58 SLOC of level 2 and uses 8 SLOC for its recipe. Level 4 removes 3 SLOC from the program and uses 4 SLOC for its recipe. Level 5 adds 19 SLOC to the program and uses 145 SLOC for its recipe. Level 6 adds 2 SLOC to the program and uses 21 SLOC for its recipe. Levels 5 and 6 collectively use a further 457 SLOC for proof customization. In comparison, the authors of CertiKOS verify an MCS lock via concurrent certified abstraction layers [25] using 3.2K LOC to prove the safety property.
6.5 Queue
The Queue program includes a lock-free queue from the liblfds library [31, 35], used at AT&T, Red Hat, and Xen. We use it to show that Armada can handle a practical, high-performance lock-free data structure.
Proof. Our goal is to prove that the enqueue and dequeue methods behave like abstract versions in which enqueue adds to the back of a sequence and dequeue removes the first entry of that sequence, as long as at most one thread of each type is active. Our proof introduces an abstract queue, uses an inductive invariant and weakening to show that logging using the implementation queue is equivalent to logging using the abstract queue, then hides the implementation. This leaves a simpler enqueue method that appends to a sequence and a dequeue method that removes and returns its first element.
It took \(\sim\)6 person-days to write the proof levels. Most of this work involved identifying the inductive invariant to support weakening of the logging using implementation variables to logging using the abstract queue.
The implementation is 70 SLOC. We use eight proof transformations, the fourth of which does the key weakening described in the previous paragraph. The first three proof transformations introduce the abstract queue using recipes with a total of 12 SLOC. The fourth transformation uses a recipe with 67 SLOC, including proof customization, and an external file with 522 SLOC to define an inductive invariant and helpful lemmas. The final four levels hide the implementation variables using recipes with a total of 16 SLOC, leading to a final layer with 46 SLOC. From all of our recipes, Armada generates 197,968 SLOC of proof.
Performance. We measure performance in Docker on a machine with an Intel Xeon E5-2687W CPU running at 3.10 GHz with 8 cores and 32 GB of memory. We use GCC 6.3.0 with -O2 and CompCertTSO 1.13.8255. We use liblfds version 7.1.1 [31]. We run its built-in benchmark for evaluating queue performance 1,000 times, using queue size 512. The benchmark runs one thread that repeatedly enqueues and another thread that repeatedly dequeues from a shared queue, with 5 million operations of warmup followed by 50 million operations from which throughput is computed.
Our Armada port of liblfds's lock-free queue uses modulo operators, instead of bitmask operators, to avoid invoking bit-vector reasoning. To account for this, we also measure liblfds-modulo, a variant we write with the same modifications.
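The substitution is behavior-preserving because the queue size is a power of two; a quick check (ours) confirms the identity the port relies on:

```python
# For non-negative indices and a power-of-two size, modulo and bitmasking
# compute the same slot: i % 512 == i & 511.

SIZE = 512
for i in range(10_000):
    assert i % SIZE == i & (SIZE - 1)
```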
To account for the maturity difference between CompCertTSO and modern compilers, we also report results for the Armada code compiled with GCC. Such compilation is not sound, since GCC does not necessarily conform to x86-TSO. We include these results only to give an idea of how much performance loss is due to using CompCertTSO. To constrain GCC's optimizations and thereby make the comparison somewhat reasonable, we insert the same barriers liblfds uses before giving GCC our generated ClightTSO code.
Figure 15 shows our results. The Armada version compiled with CompCertTSO achieves 70% of the throughput of the liblfds version compiled with GCC. Most of this performance loss is due to the use of modulo operations rather than bitmasks and the use of a 2013-era compiler rather than a modern one. Indeed, when we remove these factors, we achieve virtually identical performance (99% of throughput). This is not surprising since the code is virtually identical.
7 Sample Proof
To more concretely illustrate the use of Armada to prove program properties, we now present a detailed discussion of the barrier program in Section 6.1 and the proof that it refines a version of the program annotated with extra ghost variables. Those ghost variables keep track of which threads have had their barrier entries initialized, whether all threads have had their barrier entries initialized, and which threads have passed the barrier.
As described in Section 3.1.3, the first thing the developer writes is the refinement relation \(\mathcal{R}\). This constrains what it means for an implementation to refine a specification. Here, we use a log prefix relationship: an implementation state refines a specification state if the log of the implementation's externally visible events so far is a prefix of that of the specification. In other words, the implementation's externally visible behavior must always be a prefix of a behavior permitted by the specification.
Figure 16 gives the Armada code describing this relation. The ghost variable log describes a sequence of all of the values printed by the program. Because we consider the program stopping to be an externally visible event, our refinement relation requires more than just a prefix relationship for log: if the implementation has stopped, then its log must match that of the specification and it must have stopped for the same reason.
The next step for the developer is to write the implementation, as shown in Figure 17. The implementation level is marked as :concrete to indicate that this level is intended to be compiled and run. The Armada tool ensures that the level uses only compilable features of the language. As an exception, the print_uint32 method is an external method, meant to be provided by a trusted library. We model it as appending the printed value to the log, and implement it in C with:
The implementation of our barrier program works as follows. The main thread initializes the barrier array with zeros, then uses a memory-fence instruction fence to flush these initial zeros to all cores. It then creates worker threads, each of which writes to the console, sets its array element to 1, loops waiting for all elements of the array to be non-zero, then writes more to the console. Since the workers rely solely on x86-TSO semantics, they perform only normal assignments and no fence instructions.
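The algorithm just described can be sketched in Python (our rendering of the C-like original; Python's runtime is far stronger than x86-TSO, so this illustrates the logic, not the memory-model subtleties):

```python
# Barrier sketch: the main thread zeroes the flags; each worker logs a
# pre-barrier event, publishes its flag, spins until all flags are set,
# then logs a post-barrier event.

import threading

N = 4
barrier = [0] * N           # one flag per worker, zeroed by the main thread
log = []
log_lock = threading.Lock()

def worker(tid):
    with log_lock:
        log.append(("pre", tid))     # pre-barrier externally visible write
    barrier[tid] = 1                 # publish own flag (racy but monotonic)
    while not all(barrier):          # wait for every flag to be set
        pass
    with log_lock:
        log.append(("post", tid))    # post-barrier write

threads = [threading.Thread(target=worker, args=(t,)) for t in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The safety property: every "post" event comes after all "pre" events.
pres = [i for i, (kind, _) in enumerate(log) if kind == "pre"]
posts = [i for i, (kind, _) in enumerate(log) if kind == "post"]
assert max(pres) < min(posts)
```

The final assertion is exactly the safety property the proof levels establish; in the real program the publication races make it depend on x86-TSO reasoning.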
Figure 18 shows the first level of the proof, level 1. Compared with the implementation level 0, it introduces several new ghost variables useful in proving the safety property:
• The barrier_initialized array indicates, for each thread, whether its barrier entry has been initialized to 0.
• The all_initialized Boolean indicates whether all threads have had their barrier entries initialized to 0.
• The threads_past_barrier array indicates, for each thread, whether it has passed the barrier.
To prove this, the developer simply writes:
The Armada tool takes care of generating the proof.
The generated proof works as follows. As discussed in Section 4.1.4, it contains intermediate state machines \(L^+\), \(L^A\), and \(H^A\) and a proof that \(L\) refines \(L^+\), \(L^+\) refines \(L^A\), \(L^A\) refines \(H^A\), and \(H^A\) refines \(H\).
To prove that \(L\) refines \(L^+\), the proof makes use of a library GenericArmadaPlus.i.dfy that defines the predicate RequirementsForSpecRefinesPlusSpec and the lemma lemma_SpecRefinesPlusSpec. The former describes the conditions under which the latter can establish that one specification refines the other because they're related by "plus." Figure 19 gives more detail about these elements of the generic proof library. Using this library in this case is quite straightforward, since the requirements can be proven easily by automated verification with little annotation. Figure 20 shows the Dafny code generated to invoke the library and thereby prove refinement.
We prove that \(L^+\) refines \(L^A\) similarly. The library LiftToAtomic.i.dfy defines RequirementsForLiftingToAtomic and lemma_SpecRefinesAtomicSpec. The former describes the conditions under which the latter can establish refinement.
The trickiest part of the proof is that \(L^A\) refines \(H^A\). It is tricky because the high-level program contains instructions (the introduced assignments) that do not exist in the low-level program. The library here requires us to prove that the atomic substeps in \(L^A\) are introducibly liftable to atomic substeps in \(H^A\). That is, whenever we take an atomic substep in \(L^A\), we can always either execute a corresponding substep in \(H^A\) (lifting the substep from the low level to the high level) or execute a substep that exists only in \(H^A\) (executing an introduced assignment).
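The two-case obligation can be rendered as a toy simulation loop (ours, not the library's Dafny): each low-level substep either lifts to a matching high-level substep, or the high level takes an "introduced" step that leaves the low-level state in place and strictly decreases a progress measure, so introduced steps cannot be taken forever.

```python
# Toy introducible-liftability check: replay a low-level trace against a
# high-level program containing extra "introduced" ghost assignments.

low_trace = ["a", "b"]
high_prog = ["ghost1", "a", "ghost2", "b"]   # introduced ghost assignments
introduced = {"ghost1", "ghost2"}

hi = 0
measure = sum(1 for s in high_prog if s in introduced)  # progress measure
for step in low_trace:
    # Case 2: take any pending introduced steps; each decreases the measure.
    while high_prog[hi] in introduced:
        measure -= 1
        assert measure >= 0        # the measure is well-founded
        hi += 1
    # Case 1: the low-level step lifts to a matching high-level step.
    assert high_prog[hi] == step
    hi += 1
assert hi == len(high_prog) and measure == 0
```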
Our formal specification for the introducible liftability condition is in Figure 21. It states that if (1) the invariant inv is satisfied in low-level state ls, (2) the lifting relation relation holds between ls and high-level state hs, and (3) one takes an atomic substep lpath in \(L^A\) from ls to ls' (defined as lasf.path_next(ls, lpath, tid)), then one of two conditions must hold. The first condition is that there exists a corresponding step hpath in \(H^A\) such that relation(ls', hs') holds, where hs' is the result of taking step hpath from hs (i.e., hasf.path_next(hs, hpath, tid)). The other condition is that there exists an "introduced" step hpath in \(H^A\) such that relation(ls, hs') holds and hs' is less than hs by some given progress measure progress_measure.
The invariant inv, the state relation relation, and the progress measure progress_measure are arbitrary and customizable. That is, the caller of the library may set these as desired so long as inv can be proven to be an invariant of the program and relation implies \(\mathcal{R}\). In our case, the developer has not expressed any custom invariants; thus, the invariant inv that we generate is just the default "no thread has a null thread ID." The relation that we generate is LiftingRelation, as given in Figure 22. Here, most of the code is boilerplate that we put in every variable-introduction proof. The exception is the predicate IsReturnSite_H, which is customized to describe all of the return sites of this particular \(H^A\) program. For progress_measure, we always generate the code in Figure 23.
Proving the introducible liftability condition requires, for each step in \(L^A\), a lemma ensuring that one of the two conditions holds. Figure 24 shows one such lemma for a particularly tricky point in \(L^A\). Here, we consider the case that the level-\(L^A\) program is about to execute the i := 0 instruction immediately following the fence. There are three possibilities to consider. If the program at level \(H^A\) is also immediately following the fence, then we show that we can introduce an assignment to threads_past_barrier and that this decreases the progress measure. If the program at level \(H^A\) is just past that assignment, we instead show that we can introduce an assignment to all_initialized and that this decreases the progress measure. The only other case is that the program at level \(H^A\) has done both assignments, in which case we show that we can lift the assignment i := 0 in \(L^A\) to the corresponding assignment in \(H^A\). This lemma, and the lemmas it depends on, are cumbersome and tedious to write; thus, it is fortunate that we generate them all automatically.
Once we have proven that \(L^A\) refines \(H^A\), we prove that \(H^A\) refines H. The approach here is similar to showing that \(L^+\) refines \(L^A\): we must satisfy the requirements of a library dedicated to this type of proof.
Finally, we put it all together into a final lemma establishing that \(L\) (in this case, the barrier implementation) refines \(H\) (the level-1 program that introduces the three variables). This is done by invoking all of the lemmas we have built, plus one final lemma that establishes four-way transitivity of refinement. This final lemma, shown in Figure 25, demonstrates what we set out to prove: that \(L\) refines \(H\) using the given refinement relation \(\mathcal{R}\).
8 Related Work
Concurrent separation logic [36] is based on unique ownership of heap-allocated memory via locking. Recognizing the need to support flexible synchronization, many program logics inspired by concurrent separation logic have been developed to increase expressiveness [11, 13, 14, 23, 28]. We take the alternative approach of refinement over small-step operational semantics, which provides considerable flexibility at the cost of low-level modeling whose overhead we hope to overcome via proof automation.
CCAL and concurrent CertiKOS [18, 19] propose certified concurrent abstraction layers. Cspec [6] also uses layering to verify concurrent programs. Layering means that a system implementation is divided into layers, each built on top of the other, with each layer verified to conform to an API and specification assuming that the layer below conforms to its own API and specification. Composition rules in CCAL ensure end-to-end termination-sensitive contextual refinement properties when the implementation layers are composed together. Armada does not (yet) support layers: all components of a program's implementation must be included in level 0. Thus, Armada currently does not allow independent verification of one module whose specification is then used by another module. Also, Armada proves properties only about programs, while CCAL supports general composition, such as the combination of a verified operating system, thread library, and program. On the other hand, CCAL uses a strong memory model disallowing all data races, while Armada uses the x86-TSO memory model and, thus, can verify programs with benign races and lock-free data structures. As we shall discuss in Section 9, Armada's level-based approach can be considered a complement to abstraction layers.
Recent work [7] uses the Iris framework [27] to reason about a concurrent file system. Like CertiKOS and Cspec, this approach involves a significant amount of manual effort, making developers write their code in a particular style that may limit both performance optimization opportunities and the ability to port existing code.
QED [15] is the first verifier for functional correctness of concurrent programs to incorporate reduction for program transformation and to observe that weakening atomic actions can eliminate conflicts and enable further reduction arguments. CIVL [21] extends and incorporates these ideas into a refinement-oriented program verifier based on the framework of layered concurrent programs [26]. (Layers in CIVL correspond to levels in Armada, not to layers in CertiKOS and Cspec.) Armada improves upon CIVL by providing a flexible framework for soundly introducing new mechanically verified program transformation rules; CIVL's rules are proven correct only on paper. CSim\(^2\) [42] is a verification framework that brings Hoare-logic reasoning and program refinement together. Like the assume-introduction strategy (Section 4.2.2) in Armada, CSim\(^2\) uses rely-guarantee reasoning to achieve modular verification of concurrent programs. However, Armada supports many other reasoning strategies, such as reduction and region-based reasoning. Armada is also extensible to new techniques that may be invented in the future.
Unlike Armada, VCC [
9] is designed to verify existing concurrent C code
in situ. Rather than using a simple, low-level state machine model based on x86-TSO memory semantics, as in Armada, VCC assumes a sequentially consistent memory model and bakes in a notion of ownership and object invariants for reasoning about aliasing. We soundly reconstruct some of these techniques in our reduction strategy (Section
4.2.1). Later VCC-related work [
43] proposes moving to a TSO memory model but maintaining ownership information in the ghost state. This brings them closer to Armada’s memory model, but because they rely on ownership as the primary concurrency reasoning tool, they cannot handle correct programs with benign races, for example, instances in which correctness relies on the fact that an address is written to only by a single writer, and the values written are monotonic. Our Barrier case study (Section
6.1) is one such example. It is unclear whether this proposed memory model was incorporated into the VCC tool.
9 Discussion and Future Work
In this section, we discuss the limitations of the current design and prototype of Armada and suggest items for future work.
Armada currently supports the x86-TSO memory model [
38]. Thus, it is not directly applicable to other architectures, such as ARM and Power. Apart from the prevalence of the x86 architecture [
20], we believe that x86-TSO is a good first step as it illustrates how to account for weak memory models while still being simple enough to keep the proof complexity manageable. ARM and Power have even weaker memory models than x86-TSO, which will likely result in more complex invariants as the developer tries to establish refinement to a stronger semantics, like our TSO-bypassing assignments. While we expect the invariants to be more complicated, the principle behind such proofs is still the same: the developer must show some type of ownership of a variable in order to show refinement to the stronger semantics.
Armada currently supports verification of a whole program only, in contrast to CCAL’s contextual-refinement-based layer system. We plan to improve verification modularity by supporting modular verification against layer APIs, similar to those in CCAL. In CCAL,
\(L \vdash _{\mathcal {R}} M: H\) denotes that module
M running on top of underlay
L faithfully implements the specification in overlay
H. CCAL further supports composition of layers with composition rules such as the vertical composition rule.
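In that notation, the vertical composition rule can be sketched roughly as follows (our transcription; the exact side conditions and the orientation of the relation composition follow CCAL’s formulation):

\[ \frac{L \vdash_{\mathcal{R}} M : H \qquad H \vdash_{\mathcal{S}} N : I}{L \vdash_{\mathcal{R} \circ \mathcal{S}} M \oplus N : I} \]

where \(M \oplus N\) denotes the combined module and \(\mathcal{R} \circ \mathcal{S}\) the composed refinement relation, so that stacking a verified module \(N\) on top of a verified module \(M\) yields a verified stack in one step.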
CCAL advocates for dividing programs into tiny modules such that the verification of individual modules is straightforward, while complex behaviors of a whole system emerge as all of the layers compose together. For example, CertiKOS divides an allocation table implementation into layers, one implementing a getter/setter interface, one implementing page allocation logic, and one implementing a locking mechanism. To do this in Armada, one would include the entire implementation in level 0, then abstract parts of the program one at a time in the same order as CCAL.
Furthermore, it is worth pointing out that Armada’s level-based approach can be seen as a special case of CCAL’s layer calculus. If a specification is expressed in the form of a program, then Armada’s refinement between lower-level
L and higher-level
H with respect to refinement relation
\(\mathcal {R}\) can be expressed in the layer calculus as
\(L \vdash _{\mathcal {R}} \emptyset : H\). That is, without introducing any additional implementation in the higher layer, the specification can nevertheless be transformed between the underlay and overlay interfaces. Indeed, the authors of concurrent CertiKOS sometimes use such
\(\emptyset\)-implementation layers when a complex layer implementation cannot be further divided into smaller pieces [
19,
25]. This phenomenon manifests primarily in verification of locking mechanisms, in which fine-grained concurrency prevents division into smaller modules. Proofs of refinement for these modules are tedious and complicated. With support for modular verification, Armada may help here with its rich set of concurrency-reasoning techniques and its level-based refinement framework.
Armada uses Dafny to verify all proof material that we generate. As such, the trusted computing base (TCB) of Armada includes not only the compiler and the code for extracting state machines from the implementation and specification but also the Dafny toolchain. This toolchain includes Dafny, Boogie [
3], Z3 [
12], and our script for invoking Dafny.
Armada uses the CompCertTSO compiler, whose semantics is similar, but not identical, to Armada’s. In particular, CompCertTSO represents memory as a collection of blocks, while Armada adopts a hierarchical forest representation. Additionally, in CompCertTSO, the program is modeled as a composition of a number of state machines—one for each thread—alongside a TSO state machine that models global memory. Armada, on the other hand, models the program as a single state machine that includes all threads and the global memory. We currently assume that the CompCertTSO model refines our own. It is future work to formally prove this by demonstrating an injective mapping between the memory locations and state transitions of the two models.
The current prototype of Armada uses Dafny’s finite sequences to reason about behaviors. As such, it can prove safety but not liveness properties. This is not a fundamental limitation, however: instead of sequences, one can use Dafny’s imaps (infinite maps) to emulate an infinite sequence by mapping an integer i to the \(i^{th}\) state in a behavior. Even with such a change in place, though, we still expect liveness properties to be challenging to establish. Such properties have traditionally been difficult to prove, even in non-concurrent systems. In the Armada context, the developer would have to further contend with the complexity incurred by concurrency, as well as having to show that wait_until statements eventually return.
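Concretely, under such an imap-based representation a behavior becomes a total map \(b : \mathbb{N} \rightarrow \Sigma\) from step indices to states (\(\Sigma\) is our illustrative name for the state type), and a liveness property such as “every enabled wait_until eventually returns” would take the shape

\[ \forall i.\ \exists j \ge i.\ P(b(j)) \]

for a suitable state predicate \(P\). This is only a sketch of the proof obligation’s form; discharging it would additionally require fairness assumptions about the scheduler that the current prototype does not model.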
Armada currently supports state transitions involving only the current state, not future states. Hence, Armada can encode
history variables but not
prophecy variables [
1]. Note that Armada proofs manifest the entire behavior as a finite sequence of states, thus allowing the proof to reason about future states, if needed. To fully support prophecy variables, however, Armada must be expanded to allow invariants to be expressed over an entire behavior, rather than a single state, as the current prototype does.
Since we consider properties of single behaviors only, we cannot verify hyperproperties [
8]. However, we can verify safety properties that
imply hyperproperties, such as the unwinding conditions that Nickel uses to prove noninterference [
41,
44].
Finally, the current prototype of Armada requires developers to manually define all layers, which incurs an unfortunate amount of copy-paste overhead. We believe that much of this overhead can be avoided by reducing the number of layers and by making each layer more concise. To reduce the number of layers, one can introduce composite transformations, such as those used in CIVL [
21]. In CIVL, each layer transition defines five mini-transitions in a certain order. First, the tool performs reduction, followed by variable introduction. Then, it proves invariants and does variable hiding. Finally, it establishes new atomic action specifications. Such composite layer definitions could reduce the number of layers that the developer has to write. To further reduce the developer’s effort, layers can be defined more concisely by expressing them as a delta—for example, a patch—from the previous layer. This would allow layers to be introduced with very little developer effort while also enabling changes at an early layer to automatically propagate to all subsequent layers.