iProver [10] is an automated theorem prover for first-order logic. It is a saturation prover based primarily on the Inst-Gen calculus [7], but it also implements resolution and supports running the calculi in combination in an abstraction-refinement framework [9, 12]. In this work we detail how iProver was extended with support for the superposition calculus.

Currently, iProver deals with equality axiomatically, which can be inefficient for problems heavy on equality. At the same time, the superposition calculus is a set of complete inference rules specialised for first-order logic with equality. It complements the instantiation calculus since it is effective on problems where instantiation struggles, and vice-versa. We show that running the two calculi in combination yields better results than either plain instantiation or plain superposition.

Rules in a calculus can be classified as “generating”, if they derive new clauses, or “simplifying”, if some premise gets deleted. While generating rules are necessary for the completeness of a calculus, simplification rules are crucial for practical performance. Intuitively, simplification rules help tame the growth of the search space as more clauses get generated. However, computing those simplifications itself takes time, so being too eager in applying them will also grind the prover to a halt. It is an open problem what the optimal strategy for balancing these conflicting requirements is, and although there is a huge amount of flexibility in how to perform simplifications (see e.g., [21]), most provers are rather restrictive about this. In iProver we developed a flexible simplification setup that subsumes and generalises the most common architectures.

Finally, we have also implemented specialised techniques to deal with associative-commutative (AC) theories. These are theories of great interest which arise in several domains [20], and are traditionally problematic for theorem provers to deal with, due to combinatorial explosion and the non-orientability of the AC axioms.

The paper is structured as follows: first, we give a quick overview of the architecture of iProver. Then we describe the implementation of superposition in iProver: our modifications to the calculus, the simplification architecture and given-clause loop, and AC reasoning rules. For a more in-depth description of the basic features of iProver, see [9].

1 Overview

iProver is based on the Inst-Gen calculus, which rests on the following idea. We approximate the first-order problem by a propositional one and submit it to a black-box SAT solver. Either the solver finds an inconsistency, which implies inconsistency at the first-order level, or it returns a model, which guides the instantiation of new clauses whose abstractions witness some inconsistency at the ground level. If no such instantiation exists, then the problem is satisfiable.
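
For illustration, the following is a minimal OCaml sketch of the ground abstraction used here, assuming a simplified term representation (iProver's actual data structures are hash-consed and more involved): every variable is mapped to the same distinguished constant \(\bot \).

```ocaml
type term =
  | Var of string
  | App of string * term list  (* constants are App (c, []) *)

(* The distinguished constant that every variable is abstracted to. *)
let bottom = App ("$bot", [])

(* Ground abstraction: replace every variable by the same constant, so the
   clause set can be handed to a propositional SAT solver. *)
let rec abstract (t : term) : term =
  match t with
  | Var _ -> bottom
  | App (f, args) -> App (f, List.map abstract args)
```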

The SAT solver is also used to implement “global” simplification rules (in the sense that they involve reasoning with the clause set shared between different calculi), such as global propositional subsumption [9]. In a nutshell, we add ground abstractions of clauses in \(S\) to a set \(S_{gr}\). If the SAT solver finds that \(S_{gr}\) propositionally implies \(D\gamma \), where \(D\) is a strict subclause of some \(C \in S\) and \(\gamma \) is an injective substitution of variables to fresh constants, then we can replace \(C\) in \(S\) by \(D\):

$$\begin{aligned} \frac{C}{D} \quad \text {if } S_{gr}\,\models \,D\gamma \text { and } D \subsetneq C \,. \end{aligned}$$
(1)
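
A greedy sketch of this check is given below, assuming a placeholder function sat_implies that queries the SAT solver for \(S_{gr}\,\models \,D\gamma \) and a grounding gamma mapping variables to fresh constants; clauses are literal lists, and iProver's real implementation is incremental and indexed rather than this naive loop.

```ocaml
(* Greedily drop literals from [c] while the grounded remainder is still
   propositionally implied by [s_gr]; the result is a (not necessarily
   minimal) subsuming subclause. [sat_implies] and [gamma] are placeholders
   for the SAT-solver query and the injective grounding substitution. *)
let rec minimise sat_implies gamma s_gr c =
  let try_drop l =
    let d = List.filter (fun l' -> l' <> l) c in
    if d <> [] && sat_implies s_gr (List.map gamma d) then Some d else None
  in
  match List.find_map try_drop c with
  | Some d -> minimise sat_implies gamma s_gr d  (* keep shrinking *)
  | None -> c                          (* no implied strict subclause left *)
```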

As mentioned before, iProver can also run other calculi. This is beneficial because (i) some problems are solved easily by one strategy and not by others, and (ii) clauses derived in e.g. resolution can be passed to the instantiation solver to participate in simplifications. For example, clauses derived in all calculi are submitted to a shared global propositional subsumption set, which is in turn used by all calculi to simplify their clauses.

Schematically, the high-level architecture of iProver is summarised in Fig. 1. We can view it as a modular architecture where each calculus (Inst-Gen, resolution, superposition) runs its own saturation loop, and can (i) query external SAT and SMT solvers, and (ii) submit clauses to, and retrieve clauses from, the ‘Exchange’ module. The instantiation and resolution modules are discussed in [9]. Here we will focus on the superposition calculus.

Fig. 1. Architecture of iProver.

2 Extension with Superposition

The superposition inference system consists of the following rules [15]:

$$\begin{aligned} \textit{Superposition:}\quad \frac{l \mathbin {\approx }r \vee C \qquad t[s] \mathbin {\dot{\approx }} t' \vee D}{(t[s \mapsto r] \mathbin {\dot{\approx }} t' \vee C \vee D)\theta } \end{aligned}$$
(2)
$$\begin{aligned} \textit{Equality resolution:}\quad \frac{s \mathbin {\not \approx }t \vee C}{C\theta } \end{aligned}$$
(3)
$$\begin{aligned} \textit{Equality factoring:}\quad \frac{l \mathbin {\approx }r \vee l' \mathbin {\approx }r' \vee C}{(l \mathbin {\approx }r' \vee r \mathbin {\not \approx }r' \vee C)\theta } \end{aligned}$$
(4)

Here \(\theta \) is the most general unifier of \(l\) and \(s\) in (2), of \(s\) and \(t\) in (3), and of \(l\) and \(l'\) in (4); in (2), \(s\) is not a variable and \(\mathbin {\dot{\approx }}\) stands for either \(\mathbin {\approx }\) or \(\mathbin {\not \approx }\); the usual ordering and maximality restrictions wrt. \(\prec \) apply [15].

We assume that \(\prec \) is a simplification ordering. Non-equality predicates P(t) are encoded as \(P(t) \mathbin {\approx }\top \). The calculus can easily be generalised to the many-sorted case, which iProver uses even on untyped problems, since it can perform subtype inference during preprocessing. Superposition is sound and refutationally complete for first-order logic with equality (see [2, 15]) and is implemented in a number of state-of-the-art theorem provers: Vampire [11], E [19] and SPASS [22]. Currently, iProver uses non-perfect discrimination trees to find unification candidates efficiently [8, 16]. For literal selection, to ensure completeness we must select either a negative literal or all maximal literals. In iProver we use a variant of the Knuth-Bendix ordering which prioritises non-equational literals.
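
One possible selection scheme satisfying this completeness condition is sketched below; is_negative and gt (strictly-greater in the simplification ordering) are placeholders, not iProver's actual interface.

```ocaml
(* Completeness-preserving literal selection: pick one negative literal if
   the clause has any; otherwise select all maximal literals wrt [gt]. *)
let select_literals ~is_negative ~gt (lits : 'lit list) : 'lit list =
  match List.find_opt is_negative lits with
  | Some l -> [l]
  | None ->
      (* maximal = no other literal in the clause is strictly greater *)
      List.filter (fun l -> not (List.exists (fun l' -> gt l' l) lits)) lits
```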

Simultaneous Superposition. In (2), by t[s] and \(t[s \mapsto r]\) we mean resp. “a distinguished occurrence of s as a subterm of t” and “replacing that subterm at that position by r”. We call the variant where we instead mean “replacing all occurrences of s in t by r” simultaneous superposition. This variant is still refutationally complete [3]. In cases where there are several occurrences of the same term, as in \(f(s,s,\dots ,s)\), this avoids producing \(2^{n}-1\) intermediate clauses with \(f(r,s,\dots ,s)\), \(f(s,r,\dots ,s)\), \(f(r,r,\dots ,s)\), etc., instead producing only \(f(r,r,\dots ,r)\). In iProver, we implement this variant of superposition.
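
The replacement operation itself is a single traversal, as in the following sketch (again over a simplified term type):

```ocaml
type term =
  | Var of string
  | App of string * term list

(* Simultaneous rewriting: replace *every* occurrence of [s] in [t] by [r]
   in one pass, as done by simultaneous superposition. *)
let rec replace_all ~s ~r (t : term) : term =
  if t = s then r
  else
    match t with
    | Var _ -> t
    | App (f, args) -> App (f, List.map (replace_all ~s ~r) args)
```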

2.1 Simplifications

Apart from the generating inferences, which are necessary for completeness, we can add simplification inferences. In our implementation, we use the following rules: tautology deletion, syntactic equality resolution, subsumption, subset subsumption, subsumption resolution, demodulation and global subsumption [9, 11, 17, 22].

Light Normalisation. In addition, we introduce the following rule:

$$\begin{aligned} \frac{C[l]}{C[l \mapsto r]} \quad \text {if } l \mathbin {\rightarrow }r \in R \,, \end{aligned}$$
(5)

where \(R\) is a set of oriented rewrite rules kept interreduced wrt. (5). It can be seen as a restricted case of the demodulation rule, but it is advantageous to formulate it separately because it can be implemented much more efficiently than demodulation: terms are simply looked up in a hashtable, rather than matched modulo variable instantiation.

A light normalisation index consists of (i) a hashtable that indexes rewrite rules in \(R\) by their left-hand sides, for forward light normalisation, and (ii) a map of all subterms in \(R\), for keeping \(R\) interreduced. When we derive a unit equality, we first normalise it wrt. R by recursively replacing each subterm by its normal form wrt. R. Then, if the simplified equality is orientable (wrt. \(\prec \)) we use it to normalise the rules in R and add it to R. If there is a conflict between two rules \(t\mathbin {\rightarrow }s\) and \(t\mathbin {\rightarrow }u\) where \(t\succ s \succ u\), we keep the rules \(t \mathbin {\rightarrow }u\) and \(s \mathbin {\rightarrow }u\). If s and u are incomparable wrt. \(\succ \) we keep one of the rules in R. Since R is only used for simplifications, this choice does not affect completeness. In general, we can restrict which orientable equations we add to \(R\) (e.g. only ground ones, or only small ones).
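
A minimal sketch of the forward direction, assuming R is stored as a hashtable from left-hand sides to right-hand sides:

```ocaml
type term =
  | Var of string
  | App of string * term list

(* Light normalisation: normalise subterms bottom-up; a rule fires only on
   an *exact* match (a plain hashtable lookup), so no matching with variable
   instantiation is needed. Termination follows from R being oriented by a
   reduction ordering and kept interreduced. *)
let rec light_normalise (rules : (term, term) Hashtbl.t) (t : term) : term =
  let t' =
    match t with
    | Var _ -> t
    | App (f, args) -> App (f, List.map (light_normalise rules) args)
  in
  match Hashtbl.find_opt rules t' with
  | Some rhs -> light_normalise rules rhs
  | None -> t'
```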

2.2 Given Clause Algorithm

In a standard given clause loop, the clause set is split into an active set, of clauses among which all inferences have been performed, and a passive set, of clauses waiting to participate in inferences. Clauses are initially added to the passive set; then, in each iteration, one given clause is picked from the passive set and added to the active set, and all inferences between the given clause and the active clauses are performed. Newly derived clauses are pushed into the passive set. The loop finishes when all clauses have been moved to the active set, meaning the initial set is satisfiable, or when a contradiction is derived.
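
Schematically (with pick, infer and is_empty_clause as placeholders for iProver's actual machinery):

```ocaml
(* Schematic given-clause loop. [pick] removes one clause from the passive
   set, [infer] performs all generating inferences between the given clause
   and the active set, [is_empty_clause] detects a derived contradiction. *)
let rec saturate ~pick ~infer ~is_empty_clause passive active =
  match pick passive with
  | None -> `Satisfiable active   (* passive exhausted: saturation reached *)
  | Some (given, passive') ->
      let active' = given :: active in
      let derived = infer given active' in
      if List.exists is_empty_clause derived then `Unsatisfiable
      else saturate ~pick ~infer ~is_empty_clause (derived @ passive') active'
```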

Immediate Simplification Set. Next, we introduce the idea of immediate simplification. The intuition is as follows. Clauses that are derived in each loop iteration are “related” to each other, and it may be beneficial to keep this set of immediate conclusions inter-simplified. Moreover, throughout the execution of the program the set of clauses generated in each iteration remains small compared to the passive or active sets. Therefore, we can afford to apply more expensive rules that we do not necessarily want to apply to the set of all clauses (e.g. only “light” simplifications between newly derived clauses and passive clauses, but more expensive “full” simplifications among the newly derived clauses). Finally, during this process it is possible that the given clause itself becomes redundant (e.g. subsumed by one of its children). If this happens, we can add only the clauses responsible for making it redundant to the passive set, remove the given clause from the active set, discard the rest of the iteration’s newly generated clauses, and proceed to the next given clause. In some problems a significant number of iterations may be aborted in this way, which means that fewer new clauses are added to the passive queue and that we avoid the work of computing those inferences. This can be seen as a variant of orphan elimination [17]. Even when the given clause is not eliminated, it is often beneficial to extensively inter-simplify the immediate descendants of the given clause.
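
The abort decision itself can be sketched as follows, with redundant_by as a placeholder that returns the subset of the newly derived clauses making the given clause redundant, if any:

```ocaml
(* After inter-simplifying the newly derived clauses, either abort the
   iteration (keeping only the culprit clauses for the passive set) or
   continue with all surviving children. *)
let process_immediate ~redundant_by given derived =
  match redundant_by given derived with
  | Some culprits -> `Abort culprits   (* discard the other children *)
  | None -> `Continue derived          (* all survivors go to passive *)
```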

Simplification Setup. How all these simplifications are performed can greatly impact the performance of the solver, so care is needed, and tuning this part of the solver can pay off significantly. There is considerable choice in how to perform simplifications: which simplifications to perform, at what times, and with respect to which clauses. Additionally, some of these simplifications require auxiliary data structures (here referred to generically as “indices”) to be done efficiently, and some indices support several simplification rules. Therefore we also need to choose which clauses to add to which indices at which stages.

For example, Otter-style loops [14] perform simplifications on clauses before adding them to the passive set. The problem with this is that the passive set is often orders of magnitude larger than the active set, so performance degrades significantly as this set grows, and the system spends most of its time performing simplifications on clauses that may never end up being used. On the other hand, DISCOUNT-style loops [4] perform simplifications only with clauses that have been added to the active set. This has the benefit of reducing the time spent in simplifications, at the cost of potentially missing many valuable simplifications wrt. passive clauses. It is not clear where the “sweet spot” between these setups lies.

It is possible, for example, to apply only “cheap” simplifications to the full active + passive set (e.g. subset subsumption and light normalisation), and to use more expensive ones only on the small active set (e.g. full subsumption and demodulation). In Listing 1.1 we describe the ‘iProver-Sup’ given-clause saturation loop for superposition. A simplification set consists of a collection of indices, each of which supports one or more simplification rules. In our given clause loop we have four such sets: \(S_\text {passive}\), to which we add the clauses added to the passive set; \(S_\text {active}\), for the clauses in the active set; \(S_\text {immed}\), for newly derived clauses (this set is cleared at the end of every given-clause iteration, and the non-redundant clauses are added to the passive queue); and \(S_\text {input}\), for preprocessing input clauses. Each set supports the following operations: add, which adds a clause to all indices in a set S, and simplify, which simplifies a clause via some rules R wrt. a set S. These are called at several points in the loop (see Listing 1.1), and the user can configure which indices/rules are involved in each operation. When simplifying, some rules forward simplify the clause wrt. the existing set, and others backward simplify the clauses in the set wrt. the new clause.
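
The set abstraction can be sketched as follows; the record fields are illustrative, not iProver's actual interface, and which sets and rules are used at each stage becomes plain configuration data.

```ocaml
(* A simplification set bundles its indices behind [add] and [simplify];
   [simplify] returns [None] when the clause is deleted as redundant. *)
type 'clause simp_set = {
  add      : 'clause -> unit;           (* insert into every index of the set *)
  simplify : 'clause -> 'clause option; (* forward-simplify wrt. the set *)
}

(* Forward-simplify a clause through a configurable pipeline of sets,
   stopping as soon as one stage deletes it. *)
let simplify_through (stages : 'c simp_set list) (c : 'c) : 'c option =
  List.fold_left
    (fun acc stage ->
       match acc with
       | None -> None
       | Some c' -> stage.simplify c')
    (Some c) stages
```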

The simplification setup is specified by the rules to apply at each stage (\(R_x\)) and the indices to which clauses are added at each stage (\(S_x\)). Both the lists of indices \(S_x\) and the lists of rules \(R_x\) can be specified by the user via command-line options; their usage is documented in the command-line help. The default options are presented in Table 1.

Currently iProver uses non-perfect discrimination trees for implementing backward and forward demodulation [8, 16], feature vector indices for subsumption [18], tries for subset subsumption [8], and MiniSat [6] for global subsumption.

Generally, when during immediate simplification a parent of a newly derived clause is made redundant, we can remove all the children of that clause from the immediate set (and thus avoid adding them to the passive queues), except for the ones which caused it to become redundant. Currently, we restrict this feature to the given clause rather than all parent clauses; it therefore simplifies to checking whether the given clause is made redundant in \(S_\text {immed}\), and if so aborting the iteration, adding only the clauses that made it redundant to the passive set, and removing the given clause from the active set.

2.3 AC Reasoning

If a problem contains associativity and commutativity axioms,

$$\begin{aligned} f(x,f(y,z))&= f(f(x,y),z) \,,&f(x,y)&= f(y,x) \,, \end{aligned}$$
(6)

then f is said to be AC.

Table 1. Default simplification options

AC axioms are particularly problematic in theorem proving, because they are non-orientable, which means they can generate permutations of arguments of AC functions. This leads to combinatorial explosion in the number of clauses. In particular, they will combine with each other to produce an exponential number of instances.

Listing 1.1. The iProver-Sup given-clause saturation loop.

AC problems are ubiquitous and appear in a variety of domains [20]. Although theoretical developments behind AC reasoning have a long history, AC support in most theorem provers is limited due to implementation complexity and is mainly restricted to unit equality problems. We extended some of the techniques to be applicable to the general clausal case, see Theorem 1 below, and implemented them in iProver.

AC Preprocessing. During preprocessing we can transform the input problem into any equisatisfiable form. In particular, we can normalise AC terms by, e.g., collecting nested AC subterms into a flat list, sorting them wrt. some total extension of the term ordering, and making the term right-associative.
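
A minimal sketch of this normalisation, where OCaml's polymorphic compare stands in for a total extension of the term ordering (and recursion into non-AC subterms is omitted for brevity):

```ocaml
type term =
  | Var of string
  | App of string * term list

(* Flatten maximal nested applications of the AC symbol [f] into a list
   of arguments. *)
let rec flatten f t =
  match t with
  | App (g, [a; b]) when g = f -> flatten f a @ flatten f b
  | _ -> [t]

(* Rebuild a right-associated f-term from a non-empty argument list. *)
let rec rebuild f = function
  | [] -> invalid_arg "rebuild: empty argument list"
  | [t] -> t
  | t :: rest -> App (f, [t; rebuild f rest])

(* AC-normalise: flatten, sort by a total order on terms, re-associate. *)
let ac_normalise f t = rebuild f (List.sort compare (flatten f t))
```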

Deletion of Joinable Equations. A rewrite system is a set of rules \(l\mathbin {\rightarrow }r\) such that, if \(l\mathbin {\rightarrow }r\), then for any substitution \(\sigma \), \(l\sigma \mathbin {\rightarrow }r\sigma \), and for any term u, \(u[l] \rightarrow u[l\mapsto r]\). By abuse of notation we also allow unorientable equalities \(l \mathbin {\leftrightarrow } r\), which stand for the set of their orientable instances, \(\{l\sigma \mathbin {\rightarrow }r\sigma \mid l\sigma \succ r\sigma \} \cup \{r\sigma \mathbin {\rightarrow }l\sigma \mid r\sigma \succ l\sigma \}\). Two terms s and t are joinable wrt. a rewrite system R (written \(s \mathrel {\downarrow } _Rt\)) if \(s \overset{\star }{\rightarrow }c \overset{\star }{\leftarrow }t\) for some term c, where ‘\(\rightarrow \)’ denotes a rewrite step with a rule in R and ‘\(\overset{\star }{\rightarrow }\)’ its reflexive-transitive closure. Two terms are ground joinable (\(s \mathrel {\Downarrow } _Rt\)) if all their ground instances are joinable. Two terms are strongly ground joinable (written \(s \mathrel {\Downarrow _\triangleright } _Rt\)) if, for all ground instances \(s'=s\sigma \), \(t'=t\sigma \) of s, t resp., with \(s' \succeq t'\), either \(s'\) is \(t'\) or else \(s' \mathrel {\xrightarrow {l \mathbin {\rightarrow } r \in R}} u' \downarrow _R t'\) where either \(l \prec s'\), or l is \(s'\) but not \(u' \succ t'\) (see [1, 13]).

Theorem 1

If \(s \mathrel {\Downarrow _\triangleright } _Rt\), then \(s \mathbin {\approx }t \vee C\) is redundant wrt. R. If \(s \mathrel {\Downarrow } _Rt\) then \(s \mathbin {\not \approx }t \vee C\) is redundant wrt. \(R \cup \{C\}\).

Theorem 1 was shown in the context of unit equality reasoning in [1]; we extended this theorem to general clauses and provided a different proof [5].

This abstract theorem can be used for AC reasoning, provided we have a criterion to test \(l \mathrel {\Downarrow _\triangleright } r\). We use the following criterion [1]. Let \(R_\text {AC}\) be

$$\begin{aligned} f(x,y)&\leftrightarrow f(y,x) \,, \end{aligned}$$
(7a)
$$\begin{aligned} f(f(x,y),z)&\rightarrow f(x,f(y,z)) \,, \end{aligned}$$
(7b)
$$\begin{aligned} f(x,f(y,z))&\leftrightarrow f(y,f(x,z)) \,, \end{aligned}$$
(7c)

Unless \(l \mathbin {\approx }r\) is an instance of \(R_\text {AC}\) or can be simplified by an equation in \(R_\text {AC}\), \(l =_\text {AC} r\) implies \(l \mathrel {\Downarrow _\triangleright } _{R_\text {AC}}r\), which means we can use Theorem 1 to simplify/delete clauses wrt. \(R_\text {AC}\). This is a cheap test for strong ground joinability to apply in practice, since in order to check whether \(s =_\text {AC} t\) we can simply treat nested applications of f as a flat list and sort it wrt. some total order on terms (see the above discussion of AC normalisation).
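
Concretely, the \(=_\text {AC}\) test amounts to comparing flattened argument lists up to reordering, as in this sketch (nested AC subterms under other symbols would need recursive treatment):

```ocaml
type term =
  | Var of string
  | App of string * term list

(* Flatten maximal nested applications of the AC symbol [f], as in the
   AC-normalisation sketch above. *)
let rec flatten f t =
  match t with
  | App (g, [a; b]) when g = f -> flatten f a @ flatten f b
  | _ -> [t]

(* s =_AC t iff the flattened argument lists agree up to reordering. *)
let ac_equal f s t =
  List.sort compare (flatten f s) = List.sort compare (flatten f t)
```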

Semantic Detection of Axioms. Some problems are AC even though the input does not contain the axioms explicitly. We say that a problem S is AC if \(S\,\models \,\text {AC}\). The usual syntactic detection checks whether \(\text {AC} \in S\), but we wish to detect AC problems even when this is not the case.

During preprocessing, we query an SMT solver to find out whether \(S \,\models \, \text {AC}\). Since SMT solvers only accept ground problems, we need a sound approximation of the entailment relation. We obtain one using an injective substitution mapping variables to fresh constants, similarly to what is done for global subsumption [9]. This approximation is sound since \(\phi (\bar{c}) \,\models \, \psi (\bar{c}) \,\Rightarrow \, \forall \bar{x}\, \phi (\bar{x}) \,\models \, \forall \bar{x}\, \psi (\bar{x})\). To make the SMT reasoning more efficient we can further restrict it to fast rules like unit propagation, or place a limit on the number of backtracks. Apart from this, we also check whether the AC axioms (6) are produced at some point during saturation, among binary symbols of sort \(\alpha \times \alpha \rightarrow \alpha \).
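
The overall shape of this check is sketched below; smt_entails, ground and ac_axioms are all hypothetical placeholders (for a resource-limited ground SMT query, the injective grounding substitution, and the two axioms (6) for a candidate symbol f, respectively), not a real solver API.

```ocaml
(* Semantic AC detection for candidate symbol [f]: ground both the problem
   and the AC axioms with fresh constants, then ask the SMT solver whether
   the grounded problem entails each grounded axiom. *)
let is_ac ~smt_entails ~ground ~ac_axioms problem f =
  let hyps  = List.map ground problem in
  let goals = List.map ground (ac_axioms f) in
  List.for_all (fun g -> smt_entails hyps g) goals
```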

3 Implementation and Experimental Results

We integrated the simultaneous superposition calculus, with the iProver-Sup saturation loop, into iProver and evaluated it over 15168 first-order problems in TPTP-v7.2.0. The superposition loop alone solves 7375 problems (49%), the instantiation loop (in the previous version of iProver) solves 7884 (52%), and their combination solves 8708 (57%). We can see that the combination with superposition and the iProver-Sup simplification setup improved the performance of iProver over the whole TPTP library.

Among the problems solved by superposition (excluding trivial problems solved by preprocessing), immediate simplification was used in 71.7% of problems and light normalisation in 64.5% of problems. AC axioms were detected in 1903 problems.