1 Introduction

We study the subset-sum problem, also known as the knapsack problem: given n integers \(\mathbf {a} = (a_1, \ldots a_n)\), and a target integer S, find an n-bit vector \(\mathbf {e} = (e_1, \ldots e_n) \in \{0,1\}^n\) such that \(\mathbf {e} \cdot \mathbf {a} = \sum _i e_i a_i = S\). The density of the knapsack instance is defined as \(d = n/(\log _2 \max _i a_i)\), and for a random instance \(\mathbf {a}\), it is related to the number of solutions that one can expect.

The decision version of the knapsack problem is NP-complete  [16]. Although certain densities admit efficient algorithms, related to lattice reduction  [27, 28], the best algorithms known for the knapsack problem when the density is close to 1 are exponential-time, which is why we name these instances “hard” knapsacks. This problem underlies some cryptographic schemes aiming at post-quantum security (see e.g.  [29]), and is used as a building block in some quantum hidden shift algorithms  [7], which have some applications in quantum cryptanalysis of isogeny-based  [11] and symmetric cryptographic schemes  [9].

In this paper, we focus on the case where \(d = 1\), where a single solution is expected to exist. Instead of naively looking for the solution \(\mathbf {e}\) via exhaustive search, in time \(2^n\), Horowitz and Sahni  [20] proposed to use a meet-in-the-middle approach in \(2^{n/2}\) time and memory. The idea is to find a collision between two lists of \(2^{n/2}\) subknapsacks, i.e. to merge these two lists and keep the single pair that sums to the target. Schroeppel and Shamir  [37] later improved this to a 4-list merge, in which the memory complexity can be reduced down to \(2^{n/4}\).

The Representation Technique. At EUROCRYPT 2010, Howgrave-Graham and Joux  [21] (HGJ) proposed a heuristic algorithm solving random subset-sum instances in time \( \widetilde{\mathcal {O}} \left( 2^{0.337 n} \right) \), thereby breaking the \(2^{n/2}\) bound. Their key idea was to represent the knapsack solution ambiguously as a sum of vectors in \(\{0,1\}^n\). This representation technique increases the search space size, making it possible to merge more lists, with new arbitrary constraints, and leading to a more time-efficient algorithm. The time complexity exponent is obtained by numerical optimization of the list sizes and constraints, assuming that the individual elements obtained in the merging steps are well-distributed. This is the standard heuristic of classical and quantum subset-sum algorithms. Later, Becker, Coron and Joux  [3] (BCJ) improved the asymptotic runtime down to \( \widetilde{\mathcal {O}} \left( 2^{0.291 n} \right) \) by allowing even more representations, with vectors in \(\{-1,0,1\}^n\).

The BCJ representation technique is not only a tool for subset-sums, as it has been used to speed up generic decoding algorithms, classically  [4, 31, 32] and quantumly  [22]. Therefore, the subset-sum problem serves as the simplest application of representations, and improving our understanding of the classical and quantum algorithms may have consequences on these other generic problems.

Quantum Algorithms for the Subset-Sum Problem. Cryptosystems based on hard subset-sums are natural candidates for post-quantum cryptography, but to understand precisely their security, we have to study the best generic algorithms for solving subset-sums. The first quantum time speedup for this problem was obtained in  [6], with a quantum time \( \widetilde{\mathcal {O}} \left( 2^{0.241 n} \right) \). The algorithm was based on the HGJ algorithm. Later on,  [18] devised an algorithm based on BCJ, running in time \( \widetilde{\mathcal {O}} \left( 2^{0.226 n} \right) \). Both algorithms use the corresponding classical merging structure, wrapped in a quantum walk on a Johnson graph, in the MNRS quantum walk framework   [30]. However, they suffer from two limitations.

First, both use the model of quantum memory with quantum random-access (QRAQM), which is stronger than the standard quantum circuit model, as it allows unit-time lookups in superposition of all the qubits in the circuit. The QRAQM model is used in most quantum walk algorithms to date, but its practical realizations are still unclear. With a more restrictive model, i.e., classical memory with quantum random-access (QRACM), no quantum time speedup over BCJ was previously known. This is not the case for some other hard problems in post-quantum cryptography, e.g. heuristic lattice sieving for the Shortest Vector Problem, where the best quantum algorithms to date require only QRACM  [25].

Second, both use a conjecture (implicit in  [6], made explicit in  [18]) about quantum walk updates. In short, the quantum walk maintains a data structure, that contains a merging tree similar to HGJ (resp. BCJ), with lists of smaller size. A quantum walk step is made of updates that change an element in the lowest-level lists, and require modifying the upper levels accordingly, i.e., tracking the partial collisions that must be removed or added. In order to be efficient, the update needs to run in polynomial time. Moreover, the resulting data structure shall be a function of the lowest-level lists only, and not depend on the path taken in the walk. The conjecture states that it should be possible to guarantee sound updates without impacting the time complexity exponent. However, this does not seem to be an easy task and the current literature on subset-sums lacks further justification or workarounds.

Contributions. In this paper, we improve classical and quantum subset-sum algorithms based on representations. We write these algorithms as sequences of “merge-and-filter” operations, where lists of subknapsacks are first merged with respect to an arbitrary constraint, then filtered to remove the subknapsacks that cannot be part of a solution.

First, we propose a more time-efficient classical subset-sum algorithm based on representations. We have two classical improvements: we revisit the previous algorithms and show that some of the constraints they enforced were not needed, and we use more general distributions by allowing “2”s in the representations. Overall, we obtain a better time complexity exponent of 0.283.

Most of our contributions concern quantum algorithms. As a generic tool, we introduce quantum filtering, which speeds up the filtering of representations with a quantum search. We use this improvement in all our new quantum algorithms.

We give an improved quantum walk based on quantum filtering and our extended \(\{-1,0,1,2\}\) representations. Our best runtime exponent is 0.216, under the quantum walk update heuristic of  [18]. Next, we show how to overcome this heuristic, by designing a new data structure for the vertices in the quantum walk, and a new update procedure with guaranteed time. We remove this heuristic from the previous algorithms  [6, 18] with no additional cost. However, we find that removing it from our quantum walk increases its cost to 0.218.

In a different direction, we devise a new quantum subset-sum algorithm based on HGJ, with time \( \widetilde{\mathcal {O}} \left( 2^{0.236 n} \right) \). It is the first quantum time speedup on subset-sums that is not based on a quantum walk. The algorithm performs instead a depth-first traversal of the HGJ tree, using quantum search as its only building block. Hence, by construction, it does not require the additional heuristic of  [18] and it only uses classical memory with quantum random-access, giving also the first quantum time speedup for subset-sum in this memory model.

A summary of our contributions is given in Table 1. All these complexity exponents are obtained by numerical optimization. Our code is available at https://github.com/xbonnetain/optimization-subset-sum.

Table 1. Previous and new algorithms for subset-sum, classical and quantum, with time and memory exponents rounded upwards. We note that the removal of Heuristic 2 in  [6, 18] comes from our new analysis in Sect. 6.4. QW: Quantum Walk. QS: Quantum Search. CF: Constraint filtering (not studied in this paper). QF: Quantum filtering.

Outline. In Sect. 2, we study classical algorithms. We review the representation technique, the HGJ algorithm and introduce our new \(\{-1,0,1,2\}\) representations to improve over  [3]. In Sect. 3, we move to the quantum setting, introduce some preliminaries and the previous quantum algorithms for subset-sum. In Sect. 4, we present and study our new quantum algorithm based on HGJ and quantum search. We give different optimizations and time-memory tradeoffs. In Sect. 5, we present our new quantum algorithm based on a quantum walk. Finally, in Sect. 6 we show how to overcome the quantum walk update conjecture, up to a potential increase in the update cost. We conclude, and give a summary of our new results in Sect. 7.

2 List Merging and Classical Subset-Sum Algorithms

In this section, we remain in the classical realm. We introduce the standard subset-sum notations and heuristics and give a new presentation of the HGJ algorithm, putting an emphasis on the merge-and-filter operation. We introduce our extended \(\{-1,0,1,2\}\) representations and detail our improvements over BCJ.

2.1 Notations and Conventions

Hereafter and in the rest of the paper, all time and memory complexities, classical and quantum, are exponential in n. We use the soft-O notation \(\widetilde{\mathcal {O}}\) which removes polynomial factors in n, and focus on the asymptotic exponent, relative to n. We use \(\mathrm {negl}(n)\) for any function that vanishes inverse-exponentially in n. We often replace asymptotic exponential time and memory complexities (e.g. \( \widetilde{\mathcal {O}} \left( 2^{\alpha n} \right) \)) by their exponents (e.g. \(\alpha \)). We use capital letters (e.g. L) and corresponding lowercase letters (e.g. \(\ell \)) to denote the same value, in \(\log _2\) and relative to n: \(\ell = \log _2(L)/n\).

Definition 1

(Entropies and multinomial functions). We define the following functions:

$$\begin{aligned} \begin{array}{c} h(x) = -x \log _2 x - (1-x) \log _2 (1-x) ~~;~~ g(x,y) = -x \log _2 x - y \log _2 y - (1-x-y) \log _2 (1-x-y) \\ f(x,y,z) = -x \log _2 x - y \log _2 y - z \log _2 z - (1-x-y-z) \log _2 (1-x-y-z) \\ \mathrm {bin}\left( \omega ,\alpha \right) = \omega \, h(\alpha / \omega ) ~~;~~ \mathrm {trin}\left( \omega ,\alpha ,\beta \right) = \omega \, g(\alpha / \omega , \beta / \omega ) ~~;~~ \mathrm {quadrin}\left( \omega ,\alpha ,\beta ,\gamma \right) = \omega \, f(\alpha / \omega , \beta / \omega , \gamma / \omega ) \end{array} \end{aligned}$$

Property 1

(Standard approximations). We have the following approximations, asymptotically in n:

$$\begin{aligned} \begin{array}{c}\mathrm {bin}\left( \omega ,\alpha \right) \simeq \frac{1}{n} \log _2{\omega n \atopwithdelims ()\alpha n} ~~;~~ \mathrm {trin}\left( \omega ,\alpha ,\beta \right) \simeq \frac{1}{n} \log _2{\omega n \atopwithdelims ()\alpha n, \beta n} \\ \mathrm {quadrin}\left( \omega ,\alpha ,\beta ,\gamma \right) \simeq \frac{1}{n} \log _2 { \omega n \atopwithdelims ()\alpha n, \beta n, \gamma n } \end{array} \end{aligned}$$

Definition 2

(Distributions of knapsacks). A knapsack or subknapsack is a vector \(\mathbf {e} \in \{-1,0,1,2\}^n\). The set of \(\mathbf {e}\) with \(\alpha n\) “−1”, \((\alpha + \beta - 2\gamma ) n\) “1”, \(\gamma n\) “2” and \((1 - 2\alpha - \beta + \gamma )n\) “0” is denoted \(D^n[\alpha , \beta , \gamma ]\). If \(\gamma = 0\), we may omit the third parameter. This coincides with the notation \(D^n[\alpha , \beta ]\) from  [18].

Note that we always add vectors over the integers, and thus, the sum of two vectors of \(D^n[*,*,*]\) may contain unwanted symbols \(-2, 3\) or 4.

Property 2

(Size of knapsack sets). We have:

$$\begin{aligned} \begin{array}{c} \frac{1}{n} \log _2 | D^n[0, \beta ,0] | \simeq h(\beta ) ~~; ~~\frac{1}{n} \log _2 | D^n[\alpha , \beta , 0] | \simeq g(\alpha , \alpha + \beta )\\ \frac{1}{n} \log _2 | D^n[\alpha , \beta , \gamma ] | \simeq f(\alpha , \alpha + \beta - 2\gamma , \gamma ) . \end{array} \end{aligned}$$

Subset-sum. The problem we will solve is defined as follows:

Definition 3

(Random subset-sum instance of weight n/2). Let \(\mathbf {a}\) be chosen uniformly at random from \(\left( \mathbb {Z}_{N}\right) ^n\), where \(N \simeq 2^n\). Let \(\mathbf {e}\) be chosen uniformly at random from \(D^n[0, 1/2, 0]\). Let \(t = \mathbf {a} \cdot \mathbf {e} \pmod {N}\). Then \((\mathbf {a}, t)\) is a random subset-sum instance. A solution is a vector \(\mathbf {e'}\) such that \(\mathbf {a} \cdot \mathbf {e'} = t \pmod {N}\).

Sampling. Throughout this paper, we assume that we can classically sample uniformly at random from \(D^n[\alpha , \beta , \gamma ]\) in time \(\mathrm {poly}(n)\). (Since \(\alpha n\), \(\beta n\) and \(\gamma n\) will in general not be integer, we suppose to have them rounded to the nearest integer.) This comes from an efficient bijection between representations and integers (see Appendix A in the full version of the paper  [8]). In addition, we can efficiently produce the uniform superposition of vectors of \(D^n[\alpha , \beta , \gamma ]\), using \(\mathrm {poly}(n)\) quantum gates, and we can perform a quantum search among representations.
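To make the sampling concrete, here is a minimal classical sketch (our own illustration, with hypothetical names; the rounding convention follows the text):

```python
import random

def sample_D(n, alpha, beta, gamma):
    """Sample uniformly from D^n[alpha, beta, gamma]: alpha*n entries "-1",
    (alpha + beta - 2*gamma)*n entries "1", gamma*n entries "2", the rest "0".
    The counts are rounded to the nearest integer, as assumed in the text."""
    n_minus = round(alpha * n)
    n_one = round((alpha + beta - 2 * gamma) * n)
    n_two = round(gamma * n)
    symbols = [-1] * n_minus + [1] * n_one + [2] * n_two
    symbols += [0] * (n - len(symbols))
    random.shuffle(symbols)  # uniform random placement of the symbols
    return symbols

# Example: a random subset-sum solution is drawn from D^n[0, 1/2, 0]
e = sample_D(64, 0, 1/2, 0)
assert sum(x == 1 for x in e) == 32
```

The efficient bijection with integers mentioned above additionally allows indexing these vectors, which is what the uniform superposition and the quantum search over representations rely on.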

2.2 Merging and Filtering

In all subset-sum algorithms studied in this paper, we repeatedly sample vectors with certain distributions \(D^n[*, *, *]\), then combine them. Let \(D_1 = D^n[\alpha _1, \beta _1, \gamma _1]\), \(D_2 = D^n[\alpha _2, \beta _2, \gamma _2]\) be two input distributions and \(D = D^n[\alpha , \beta , \gamma ]\) be a target. Given two lists \(L_1 \in D_1^{|L_1|}\) and \(L_2 \in D_2^{|L_2|}\), we define:

  • the merged list \(L = L_1 \bowtie _c L_2\) containing all vectors \(\mathbf {e} = \mathbf {e_1} + \mathbf {e_2}\) such that: \(\mathbf {e_1} \in L_1, \mathbf {e_2} \in L_2\), \((\mathbf {e}_1 + \mathbf {e}_2) \cdot \mathbf {a} = s \mod M\), where \(s \le M\) is an arbitrary integer and \(M \approx 2^{cn}\) (we write \(L_1 \bowtie _c L_2\) because s is an arbitrary value whose choice has no incidence on the algorithm);

  • the filtered list \(L^f = (L \cap D) \subseteq L\), containing the vectors with the target distribution of \(1, -1, 2\) (the target D will always be clear from context).

In general, L is exponentially bigger than \(L^f\) and does not need to be written down, as vectors can be filtered on the fly. The algorithms then repeat the merge-and-filter operation on multiple levels, moving towards the distribution \(D^n[0, 1/2]\) while increasing the bit-length of the modular constraint, until we satisfy \(\mathbf {e} \cdot \mathbf {a} = t \mod 2^n\) and obtain a solution. Note that this merging-and-filtering view that we adopt, where the merged list is repeatedly sampled before an element passes the filter, has some similarities with the ideas developed in the withdrawn article  [15].

The standard subset-sum heuristic assumes that vectors in \(L^f\) are drawn independently, uniformly at random from D. It simplifies the complexity analysis of both classical and quantum algorithms studied in this paper. Note that this heuristic, which is backed by experiments, actually leads to provable probabilistic algorithms in the classical setting (see  [3, Theorem 2]). We adopt the version of  [18].

Heuristic 1

If input vectors are uniformly distributed in \(D_1 \times D_2\), then the filtered pairs are uniformly distributed in D (more precisely, among the subset of vectors in D satisfying the modular condition).

Filtering Representations. We note \(\ell = (1/n)\log _2|L|\), and so on for \(\ell _1, \ell _2, \ell ^f\). By Heuristic 1, the average sizes of \(L_1\), \(L_2\), L and \(L^f\) are related by:

  • \(\ell = \ell _1 + \ell _2 - c\)

  • \(\ell ^f = \ell + \mathsf {pf}\), where \(\mathsf {pf}\) is negative and \(2^{\mathsf {pf}n}\) is the probability that a pair \((\mathbf {e_1}, \mathbf {e_2})\), drawn uniformly at random from \(D_1 \times D_2\), has \((\mathbf {e_1} + \mathbf {e_2}) \in D\).

In particular, the occurrence of collisions in \(L^f\) is a negligible phenomenon, unless \(\ell ^f\) approaches \((\log _2 |D|)/n - c\), which corresponds to the maximum number of vectors in D satisfying the cn-bit modular constraint. For a given random knapsack problem, with high probability, the size of any list built by sampling, merging and filtering remains very close to its average (by a Chernoff bound and a union bound on all lists).

Here, \(\mathsf {pf}\) depends only on \(D_1, D_2\) and D. Working with this filtering probability is especially useful for writing down our algorithm in Sect. 4. We give its formula for \(\{0,1\}\) representations below. Two similar results for \(\{-1,0,1\}\) and \(\{-1,0,1,2\}\) can be found in the full version of the paper  [8].

Lemma 1

(Filtering HGJ-style representations). Let \(\mathbf {e_1} \in D^n[0, \alpha ]\) and \(\mathbf {e_2} \in D^n[0, \beta ]\) be drawn uniformly at random. The probability that \(\mathbf {e_1} + \mathbf {e_2} \in D^n[0, \alpha + \beta ]\) is 0 if \(\alpha + \beta > 1\), and \(2^{\mathsf {pf}_1 \left( \alpha , \beta \right) n}\) otherwise, with

$$\mathsf {pf}_1 \left( \alpha , \beta \right) = \mathrm {bin}\left( 1-\alpha ,\beta \right) - h(\beta ) = \mathrm {bin}\left( 1-\beta ,\alpha \right) - h(\alpha ) .$$

Proof

The probability that \(\mathbf {e}_1 + \mathbf {e}_2\) survives the filtering is:

$$ {n - \alpha n \atopwithdelims ()\beta n}/{n \atopwithdelims ()\beta n} = {n - \beta n \atopwithdelims ()\alpha n}/{n \atopwithdelims ()\alpha n} .$$

Indeed, given a choice of \(\alpha n\) bit positions among n, the other \(\beta n\) bit positions must be compatible, hence chosen among the \((1-\alpha )n\) remaining positions. By taking the \(\log _2\), we obtain the formula for the filtering probability.    \(\square \)
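As a quick numerical illustration (the values are ours, rounded to three digits): for \(\alpha = \beta = 1/8\), as in the level-1 merge of the HGJ algorithm below, \(\mathsf {pf}_1 \left( 1/8, 1/8 \right) = \frac{7}{8} h(1/7) - h(1/8) \approx -0.026\), and for \(\alpha = \beta = 1/4\), \(\mathsf {pf}_1 \left( 1/4, 1/4 \right) = \frac{3}{4} h(1/3) - h(1/4) \approx -0.123\): the filtered list is smaller than the merged list by these (negative) exponents.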

Time Complexity of Merging. Classically, the time complexity of the merge-and-filter operation is related to the size of the merged list.

Lemma 2

(Classical merging with filtering). Let \(L_1\) and \(L_2\) be two sorted lists stored in classical memory with random access. In \(\log _2\), relatively to n, and discarding logarithmic factors, merging and filtering \(L_1\) and \(L_2\) costs a time \(\max (\min (\ell _1, \ell _2), \ell _1 + \ell _2 - c)\) and memory \(\max (\ell _1, \ell _2, \ell ^f)\), assuming that we must store the filtered output list.

Proof

Assuming sorted lists, there are two symmetric ways to produce a stream of elements of \(L_1 \bowtie _c L_2\): we can go through the elements of \(L_1\), and for each one, find the matching elements in \(L_2\) by binary search (time \(\ell _1 + \max (0, \ell _2 - c)\)), or we can exchange the roles of \(L_1\) and \(L_2\). Although we do not need to store \(L_1 \bowtie _c L_2\), we need to examine all its elements in order to filter them.   \(\square \)
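The following Python sketch (ours, with illustrative names) implements this classical merge-and-filter for \(\{0,1\}\) representations: \(L_2\) is sorted by the value of its subknapsacks modulo \(M = 2^{cn}\), the matching elements for each \(\mathbf {e_1} \in L_1\) are found by binary search, and the merged pairs are filtered on the fly, as described above.

```python
from bisect import bisect_left

def merge_and_filter(L1, L2, a, s, M):
    """Merge-and-filter for 0/1 vectors (given as tuples): keep e1 + e2 with
    (e1 + e2).a = s mod M and e1 + e2 still in {0,1}^n (the filtering step)."""
    def knap(e):
        return sum(ei * ai for ei, ai in zip(e, a))
    L2_sorted = sorted(L2, key=lambda e: knap(e) % M)
    keys = [knap(e) % M for e in L2_sorted]
    out = []
    for e1 in L1:
        target = (s - knap(e1)) % M        # modular condition on the pair
        i = bisect_left(keys, target)
        while i < len(keys) and keys[i] == target:
            e = tuple(x + y for x, y in zip(e1, L2_sorted[i]))
            if all(x <= 1 for x in e):     # filter: no colliding "1"s
                out.append(e)
            i += 1
    return out
```

Only the filtered elements are stored, matching the memory bound of the lemma; the outer loop combined with the binary search gives the \(\ell _1 + \max (0, \ell _2 - c)\) term.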

2.3 Correctness of the Algorithms

While the operation of merging and filtering is the same as in previous works, our complexity analysis differs  [3, 6, 18, 21]. We enforce the constraint that the final list contains a single solution, hence if it is of size \(2^{n \ell _0}\), we constrain \(\ell _0 = 0\). Next, we limit the sizes of the lists so that they do not contain duplicate vectors: these are saturation constraints. A list of size \(2^{n \ell }\), of vectors sampled from a distribution D, with a constraint of cn bits, has the constraint: \(\ell \le \frac{1}{n}\log _2{|D|} - c\). This says that there are not more than \(|D|/2^{cn}\) vectors \(\mathbf {e}\) such that \(\mathbf {e} \cdot \mathbf {a} = r \pmod {2^{cn}}\) for the (randomly chosen) arbitrary constraint r.

Previous works focus on the solution vector \(\mathbf {e}\) and compute the number of representations of \(\mathbf {e}\), that is, the number of ways it can be decomposed as a sum \(\mathbf {e} = \mathbf {e}_1 + \ldots + \mathbf {e}_t\) of vectors satisfying the constraints on the distributions. Then, they compare this with the probability that a given representation passes the arbitrary constraints imposed by the algorithm. As their lists contain all the subknapsacks that fulfill the constraint, this really reflects the number of duplicates, and it suffices to enforce that the number of representations is equal to the inverse probability that a representation fulfills the constraint. If the two lists we merge are not of maximal size, the size of the merged list is the number of elements that fulfill the corresponding distribution times the probability that such an element is effectively the sum of two elements in the initial lists.

The two approaches are strictly equivalent, as the probability that the sum of two subknapsacks is valid is exactly the number of representations of the sum, divided by the number of pairs of subknapsacks.

2.4 The HGJ Algorithm

We start our study of classical subset-sum by recalling the algorithm of Howgrave-Graham and Joux  [21], with the corrected time complexity of  [3]. The algorithm builds a merging tree of lists of subknapsacks, with four levels, numbered 3 down to 0. Level j contains \(2^j\) lists. In total, 8 lists are merged together into one.

  • Level 3. We build 8 lists denoted \(L_0^3 \ldots L_7^3\). They contain all subknapsacks of weight \(\frac{n}{16}\) on \(\frac{n}{2}\) bits, either left or right:

    $$ L_{2i}^3 = D^{n/2}[0, 1/8] \times \{0^{n/2}\},~~ L_{2i+1}^3 = \{0^{n/2}\} \times D^{n/2}[0, 1/8] . $$

    From Property 2, these level-3 lists have size \(\ell _3 = h(1/8)/2\). As the positions set to 1 cannot interfere, there is no filtering when merging \(L_{2i}^3\) and \(L_{2i+1}^3\).

  • Level 2. We merge the lists pairwise with a (random) constraint on \(c_2 n\) bits, and obtain 4 filtered lists. The size of the filtered lists plays a role in the memory complexity of the algorithm, but the time complexity depends on the size of the unfiltered lists.

    In practice, when we say “with a constraint on \(c_j n\) bits”, we assume that given the subset-sum objective t modulo \(2^n\), random values \(r_i^j\) such that \(\sum _i r_i^j = t \mod 2^{c_jn}\) are selected at level j, and the \(r_i^j\) have \(c_j n\) bits only. Hence, at this step, we have selected 4 integers on \(c_2 n\) bits \(r^2_0, r^2_1, r^2_2, r^2_3\) such that \(r^2_0 + r^2_1 + r^2_2 + r^2_3 = t \mod 2^{c_2n}\). The 4 level-2 lists \(L_0^2, L_1^2, L_2^2, L_3^2\) have size \(\ell _2 = h(1/8) - c_2\); they contain subknapsacks of weight \(\frac{n}{8}\) on n bits.

Remark 1

The precise values of these \(r_i^j\) are irrelevant, since they cancel each other out in the end. They are selected at random during a run of the algorithm, and although there could be “bad” values that significantly affect the computation, this is not expected to happen.

  • Level 1. We merge the lists pairwise with \((c_1 - c_2) n\) new bits of constraint, ensuring that the constraint is compatible with the previous ones. We obtain two filtered lists \(L_0^1, L_1^1\), containing subknapsacks of weight n/4. They have size:

    $$ \ell _1 = 2 \ell _2 - (c_1 - c_2) + \mathsf {pf}_1 \left( 1/8, 1/8 \right) $$

    where \(\mathsf {pf}_1 \left( 1/8, 1/8 \right) \) is given by Lemma 1.

  • Level 0. We find a solution to the subset-sum problem with the complete constraint on n bits. This means that the list \(L^0\) must have expected length \(\ell _0 = 0\). Note that there remains \((1 - c_1) n\) bits of constraint to satisfy, and the filtering term is similar as before, so:

    $$ \ell _0 = 2 \ell _1 - (1-c_1) + \mathsf {pf}_1 \left( 1/4, 1/4 \right) .$$
Fig. 1. The HGJ algorithm (duplicate lists are omitted).

By Lemma 2, the time complexity of this algorithm is determined by the sizes of the unfiltered lists:  \( \max \left( \ell _3, 2\ell _3 - c_2, 2\ell _2 - (c_1 - c_2), 2 \ell _1 - (1-c_1) \right) \). The memory complexity depends on the sizes of the filtered lists: \(\max \left( \ell _3, \ell _2, \ell _1\right) \). By a numerical optimization, one obtains a time exponent of 0.337n.
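This optimization is easy to reproduce; the sketch below (ours, written with scipy, in the spirit of the authors' optimization code linked in the introduction) encodes the list sizes, the saturation constraint of Sect. 2.3 on \(\ell _1\), the constraint \(\ell _0 = 0\), and minimizes the maximum merging cost. The auxiliary variable t turns the non-smooth maximum into smooth inequality constraints.

```python
import numpy as np
from scipy.optimize import minimize

def h(x):
    """Binary entropy, base 2 (0 on the boundary)."""
    return 0. if x <= 0 or x >= 1 else -x*np.log2(x) - (1-x)*np.log2(1-x)

def pf1(a, b):
    """Filtering probability exponent of Lemma 1: bin(1-a, b) - h(b)."""
    return (1 - a) * h(b / (1 - a)) - h(b)

def lists(c1, c2):
    """List-size exponents of the HGJ tree as functions of the constraints."""
    l3 = h(1/8) / 2
    l2 = h(1/8) - c2                       # = 2*l3 - c2, no filtering at level 2
    l1 = 2*l2 - (c1 - c2) + pf1(1/8, 1/8)
    l0 = 2*l1 - (1 - c1) + pf1(1/4, 1/4)
    return l3, l2, l1, l0

def merge_costs(c1, c2):
    l3, l2, l1, _ = lists(c1, c2)
    return [l3, 2*l3 - c2, 2*l2 - (c1 - c2), 2*l1 - (1 - c1)]

# variables v = (t, c1, c2); minimize t subject to t >= every merging cost
cons = [{'type': 'eq',   'fun': lambda v: lists(v[1], v[2])[3]},                   # l0 = 0
        {'type': 'ineq', 'fun': lambda v: h(1/4) - v[1] - lists(v[1], v[2])[2]}]   # saturation on l1
for i in range(4):
    cons.append({'type': 'ineq',
                 'fun': (lambda v, i=i: v[0] - merge_costs(v[1], v[2])[i])})

res = minimize(lambda v: v[0], x0=[0.34, 0.6, 0.3], bounds=[(0, 1)]*3,
               constraints=cons, method='SLSQP')
print(round(res.fun, 4))                   # expected: about 0.337
```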

2.5 The BCJ Algorithm and Our Improvements

The HGJ algorithm uses representations to increase artificially the search space. The algorithm of Becker, Coron and Joux  [3] improves the runtime exponent down to 0.291 by allowing even more freedom in the representations, which can now contain “\(-1\)”s. The “\(-1\)”s have to cancel out progressively, to ensure the validity of the final knapsack solution.

Fig. 2. Our improved algorithm (duplicate lists are omitted).

We improve over this algorithm in two different ways. First, we relax the constraints \(\ell _j + c_j = g(\alpha _j, 1/2^{j+1})\) enforced in  [3], as only the inequalities \(\ell _j + c_j \leqslant g(\alpha _j, 1/2^{j+1})\) are necessary: they make sure the lists are not larger than the number of distinct elements they can contain. This idea was also implicitly used in  [13], in the context of syndrome decoding. When optimizing the parameters under these new constraints, we bring the asymptotic time exponent down to 0.289n.

\(\{-1,0,1,2\}\) representations. Next, we allow the value “2” in the subknapsacks. This allows us to have more representations for the final solution from the same initial distributions. Indeed, in BCJ, if on a given bit the solution is the sum of a “−1” and two “1”s, then it can only pass the merging steps if we first have the “−1” that cancels a “1”, and then the addition of the second “1”. When allowing “2”s, we can have the sum of the two “1”s and then, at a later step, the addition of a “−1”. The algorithm builds a merging tree with five levels, numbered 4 down to 0. Level j contains \(2^j\) lists. In total, 16 lists are merged together into one.

  • Level 4. We build 16 lists \(L_0^4 \ldots L_{15}^4\). They contain complete distributions on \(\frac{n}{2}\) bits, either left or right, with \(\frac{n}{32} + \frac{\alpha _3 n}{2} - \gamma _3 n\) “1”, \(\frac{\alpha _3 n}{2}\) “−1” and \(\frac{\gamma _3 n}{2}\) “2”:

    $$ L_{2i}^4 = D^{n/2}[\alpha _3, 1/16, \gamma _3] \times \{0^{n/2}\},~~ L_{2i+1}^4 = \{0^{n/2}\} \times D^{n/2}[\alpha _3, 1/16, \gamma _3] . $$

    As before, this avoids filtering at the first level. These lists have size: \(\ell _4 = f(\alpha _3, 1/16 + \alpha _3 - 2\gamma _3, \gamma _3) / 2\).

  • Level 3. We merge into 8 lists \(L_0^3 \ldots L_7^3\), with a constraint on \(c_3\) bits. As there is no filtering, these lists have size: \(\ell _3 = f(\alpha _3, 1/16 + \alpha _3 - 2\gamma _3, \gamma _3) - c_3\).

  • Level 2. We now merge and filter. We force a target distribution \(D^{n}[\alpha _2, 1/8, \gamma _2]\), with \(\alpha _2\) and \(\gamma _2\) to be optimized later. There is a first filtering probability \(p_2\). We have \(\ell _2 = 2\ell _3 - (c_2 - c_3) + p_2\).

  • Level 1. Similarly, we have: \(\ell _1 = 2\ell _2 - (c_1 - c_2) + p_1\).

  • Level 0. We have \(\ell _0 = 2\ell _1 - (1-c_1) + p_0 = 0\), since the goal is to obtain one solution in the list \(L_0\).

With these constraints, we find a time \( \widetilde{\mathcal {O}} \left( 2^{0.2830n} \right) \) (rounded upwards) with the following parameters:

$$\begin{aligned}&\alpha _1 = 0.0340, \alpha _2 = 0.0311, \alpha _3 = 0.0202, \gamma _1 = 0.0041, \gamma _2 = 0.0006, \gamma _3 = 0.0001 \\&c_1 = 0.8067, c_2 = 0.5509, c_3 = 0.2680, p_0 = -0.2829, p_1 = -0.0447, p_2 = -0.0135 \\&\ell _1 = 0.2382, \ell _2 = 0.2694, \ell _3 = 0.2829, \ell _4 = 0.2755 \end{aligned}$$
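As a quick consistency check of these rounded values (ours): \(\ell _0 = 2\ell _1 - (1-c_1) + p_0 = 2 \times 0.2382 - 0.1933 - 0.2829 = 0.0002 \approx 0\), and the dominant unfiltered merging cost \(2\ell _4 - c_3 = 2 \times 0.2755 - 0.2680 = 0.2830\) matches the claimed time exponent, up to rounding.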

Remark 2

(On numeric optimizations). All algorithms since HGJ, including quantum ones, rely on (nonlinear) numeric optimizations. Their correctness is easy to check, since the obtained parameters satisfy the constraints, but there is no formal proof that the parameters are indeed optimal for a given constraint set. The same goes for all algorithms studied in this paper. In order to gain confidence in our results, we tried many different starting points and several equivalent rewritings of the constraints.

Remark 3

(Adding more symbols). In general, adding more symbols (“−2”s, “3”s, etc.) can only increase the parameter search space and improve the optimal time complexity. However, we expect that the improvements from adding more symbols will become smaller and smaller, while the obtained constraints will become more difficult to write down and the parameters harder to optimize. Note that adding “−1”s decreases the time complexity exponent by 0.048, while adding “2”s decreases it only by 0.006.

Remark 4

(On the number of levels). Algorithms based on merging-and-filtering, classical and quantum, have a number of levels (say, 4 or 5) which must be selected before writing down the constraints. The time complexity is a decreasing function of the number of levels, which quickly reaches a minimum. In all algorithms studied in this paper, adding one more level does not change the cost of the upper levels, which will remain the most expensive.

3 Quantum Preliminaries and Previous Work

In this section, we recall some preliminaries of quantum computation (quantum search and quantum walks) that will be useful throughout the rest of this paper. We also recall previous quantum algorithms for subset-sum. As we consider all our algorithms from the point of view of asymptotic complexities, and neglect polynomial factors in n, a high-level overview is often enough, and we will use quantum building blocks as black boxes. The interested reader may find more details in  [35].

3.1 Quantum Preliminaries

All the quantum algorithms considered in this paper run in the quantum circuit model, with quantum random-access memory, often denoted as qRAM. “Baseline” quantum circuits are simply built using a universal gate set. Many quantum algorithms use qRAM access, and require the circuit model to be augmented with the so-called “qRAM gate”. This includes subset-sum, lattice sieving and generic decoding algorithms that obtain time speedups with respect to their classical counterparts. Given an input register \(1 \le i \le r\), which represents the index of a memory cell, and many quantum registers \(\left| x_1, \ldots x_r \right\rangle \), which represent stored data, the qRAM gate fetches the data from register \(x_i\):

$$ \left| i \right\rangle \left| x_1, \ldots x_r \right\rangle \left| y \right\rangle \mapsto \left| i \right\rangle \left| x_1, \ldots x_r \right\rangle \left| y \oplus x_i \right\rangle .$$

We will use the terminology of  [24] for the qRAM gate:

  • If the input i is classical, then this is the plain quantum circuit model (with classical RAM);

  • If the \(x_j\) are classical, we have quantum-accessible classical memory (QRACM)

  • In general, we have quantum-accessible quantum memory (QRAQM)

All known quantum algorithms for subset-sum with a quantum time speedup over the best classical one require QRAQM. For comparison, speedups on heuristic lattice sieving algorithms exist in the QRACM model  [23, 26], including the best one to date  [25]. While no physical architecture for quantum random access has been proposed that would indeed produce a constant or negligible overhead in time, some authors  [24] consider the separation meaningful. If we assign a cost \( \mathcal {O} \left( N \right) \) to a QRACM query of N cells, then we can replace it by classical memory. Subset-sum algorithms were studied in this setting by Helm and May  [19].

Quantum Search. One of the most well-known quantum algorithms is Grover’s unstructured search algorithm  [17]. We present here its generalization, amplitude amplification  [12].

Lemma 3

(Amplitude amplification, from [12]). Let \(\mathcal {A}\) be a reversible quantum circuit, f a computable boolean function over the output of \(\mathcal {A}\), \(O_f\) its implementation as a quantum circuit, and a be the initial success probability of \(\mathcal {A}\), that is, the probability that \(O_f\mathcal {A}\left| 0 \right\rangle \) outputs “true”. There exists a quantum reversible algorithm that calls \( \mathcal {O} \left( \sqrt{1/a} \right) \) times \(\mathcal {A}\), \(\mathcal {A}^\dagger \) and \(O_f\), uses as many qubits as \(\mathcal {A}\) and \(O_f\), and produces an output that passes the test f with probability greater than \(\max (a,1-a)\).

This is known to be optimal when the functions are black-box oracles  [5].

As we will use quantum search as a subprocedure, we make some remarks similar to  [33, Appendix A.2] and  [10, Sect. 5.2] to justify that, up to additional polynomial factors in time, we can consider that it runs with no errors and that it returns all the solutions efficiently.

Remark 5

(Error in a sequence of quantum searches). Throughout this paper, we will assume that a quantum search in a search space of size S with T solutions runs in exact time \(\sqrt{S/T}\). In practice, there is a constant overhead, but since S and T are always exponential in n, the difference is negligible. Furthermore, this is a probabilistic procedure, and it will return a wrong result with a probability of the order \(\sqrt{T/S}\). As we can test if an error occurs, we can make it negligible by redoing the quantum search polynomially many times.

Remark 6

(Finding all solutions). Quantum search returns a solution among the T possibilities, selected uniformly at random. Finding all solutions is then an instance of the coupon collector problem with T coupons  [34]; all coupons are collected after on average \( \mathcal {O} \left( T\log (T) \right) \) trials. However, in the QRACM model, which is assumed in this paper, this logarithmic factor disappears. We can run the search of Lemma 3 with a new test function that returns 0 if the output of \(\mathcal {A}\) is incorrect, or if it is correct but has already been found. The change to the runtime is negligible, and thus, we collect all solutions with only \( \mathcal {O} \left( T \right) \) searches.

Quantum Walks. Quantum walks can be seen as a generalization of quantum search. They allow to obtain polynomial speedups on many unstructured problems, with sometimes optimal results (e.g. Ambainis’ algorithm for element distinctness  [1]). In this paper, we consider walks in the MNRS framework  [30].

Let \(G = (V,E)\) be an undirected, connected, regular graph, such that some vertices of G are “marked”. Let \(\epsilon \) be the fraction of marked vertices, that is, a random vertex has a probability \(\epsilon \) of being marked. Let \(\delta \) be the spectral gap of G, which is defined as the difference between its two largest eigenvalues.

In a classical random walk on G, we can start from any vertex and reach the stationary distribution in approximately \(\frac{1}{\delta }\) random walk steps. Then, such a random vertex is marked with probability \(\epsilon \). Assume that we have a procedure Setup that samples a random vertex to start with in time \(\mathsf {S}\), Check that verifies if a vertex is marked or not in time \(\mathsf {C}\) and Update that performs a walk step in time \(\mathsf {U}\), then we will have found a marked vertex in expected time: \( \mathsf {S} + \frac{1}{\epsilon } \left( \frac{1}{\delta } \mathsf {U} + \mathsf {C} \right) \!.\)

Quantum walks reproduce the same process, except that their internal state is not a vertex of G, but a superposition of vertices. The walk starts in the uniform superposition \(\sum _{v \in V} \left| v \right\rangle \), which must be generated by the Setup procedure. It repeats \(\sqrt{1/ \epsilon }\) iterations that, similarly to amplitude amplification, move the amplitude towards the marked vertices. An update produces, from a vertex, the superposition of its neighbors. Each iteration does not need to repeat \(\frac{1}{\delta }\) vertex updates and, instead, takes a time equivalent to \(\sqrt{1/\delta }\) updates to achieve a good mixing. Thanks to the following theorem, we will only need to specify the setup, checking and update unitaries.

Theorem 1

(Quantum walk on a graph (adapted from  [30])). Let \(G=(V,E)\) be a regular graph with spectral gap \(\delta >0\). Let \(\epsilon >0\) be a lower bound on the probability that a vertex of G chosen uniformly at random is marked. For a random walk on G, let \(\mathsf {S},\mathsf {U}, \mathsf {C}\) be the setup, update and checking cost. Then there exists a quantum algorithm that with high probability finds a marked vertex in time

$$ \mathcal {O} \left( \mathsf {S} + \frac{1}{\sqrt{\epsilon }} \left( \frac{1}{\sqrt{\delta }}\mathsf {U} + \mathsf {C} \right) \right) .$$
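In exponent form (all quantities in \(\log _2\) and relative to n, as in the rest of the paper), this cost translates into a simple formula; the small helper below (ours) makes it explicit and matches the way we combine exponents later.

```python
def mnrs_exponent(setup, eps, delta, update, check):
    """Exponent (relative to n) of the MNRS cost S + (1/sqrt(eps))*(U/sqrt(delta) + C),
    where the fraction of marked vertices is 2^(eps*n) and the spectral gap is
    2^(delta*n), so eps and delta are non-positive exponents."""
    walk = -eps / 2 + max(update - delta / 2, check)
    return max(setup, walk)

# Toy example: setup exponent 0.2, eps = -0.1, delta = -0.2,
# polynomial-time update and check (exponent 0):
print(mnrs_exponent(0.2, -0.1, -0.2, 0.0, 0.0))   # -> 0.2
```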

3.2 Solving Subset-Sum with Quantum Walks

In 2013, Bernstein, Jeffery, Lange and Meurer  [6] constructed quantum subset sum algorithms inspired by Schroeppel-Shamir  [37] and HGJ  [21]. We briefly explain the idea of their quantum walk for HGJ. The graph G that they consider is a product Johnson graph. We recall formal definitions from  [22].

Definition 4

(Johnson graph). A Johnson graph J(N, R) is an undirected graph whose vertices are the subsets of R elements among a set of size N, and there is an edge between two vertices S and \(S'\) iff \(|S \cap S'| = R-1\), in other words, if \(S'\) can be obtained from S by replacing an element. Its spectral gap is given by \(\delta = \frac{N}{R(N-R)}\).
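For instance (an illustration of ours), with \(N = 2^{\ell _3 n}\) and \(R = 2^{\ell n}\) for \(\ell < \ell _3\), the spectral gap is \(\delta = \frac{N}{R(N-R)} \approx \frac{1}{R} = 2^{-\ell n}\), i.e., exponent \(-\ell \); this is the value used below for the walks on sublists.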

Theorem 2

(Cartesian product of Johnson graphs  [22]). Let \(J^m(N, R)\) be defined as the Cartesian product of m Johnson graphs J(N, R), i.e., a vertex in \(J^m(N, R)\) is a tuple of m subsets \(S_1, \ldots S_m\) and there is an edge between \(S_1, \ldots S_m\) and \(S_1', \ldots S_m'\) iff all subsets are equal at all indices except one index i, which satisfies \(|S_i \cap S_i'| = R-1\). Then it has \({N \atopwithdelims ()R}^{m}\) vertices and its spectral gap is greater than \(\frac{1}{m} \frac{N}{R(N-R)}\).

In   [6], a vertex contains a product of 8 sublists \(L_0'^3 \subset L_0^3, \ldots , L_7'^3 \subset L_7^3\) of a smaller size than the classical lists: \(\ell < \ell _3\). There is an edge between two vertices if we can transform one into the other by replacing only one element in one of the sublists. The spectral gap of such a graph is (in \(\log _2\), relative to n) \(- \ell \).

In addition, each vertex has an internal data structure which reproduces the HGJ merging tree, from level 3 to level 0. Since the initial lists are smaller, the list \(L^0\) is now of expected size \(8(\ell - \ell _3)\) (in \(\log _2\), relative to n), i.e., the walk needs to run for \(4(\ell _3 - \ell )\) steps. Each step requires \(\ell /2\) updates.

In the Setup procedure, we simply start from all choices for the sublists and build the tree by merging and filtering. Assuming that the merged lists have decreasing sizes, the setup time is \(\ell \). The vertex is marked if it contains a solution at level 0. Hence, checking if a vertex is marked takes time \(\mathsf {C} = 1\), but the update procedure needs to ensure the consistency of the data structure. Indeed, when updating, we remove an element \(\mathbf {e}\) from one of the lists \(L_i'^3\) and replace it by a new element \(\mathbf {e'}\) from \(L_i^3\). We then have to track all subknapsacks in the upper levels where \(\mathbf {e}\) intervened, to remove them, and to add the new collisions where \(\mathbf {e'}\) intervenes.

Assuming that the update can run in \(\mathrm {poly}(n)\), an optimization with the new parameter \(\ell \) yields an exponent 0.241. In  [6], the parameters are such that on average, a subknapsack intervenes only in a single sum at the next level. The authors propose to simply limit the number of elements to be updated at each level, in order to guarantee a constant update time.

Quantum Walk Based on BCJ. In  [18], Helm and May quantize, in the same way, the BCJ algorithm. They add “−1” symbols and a new level in the merging tree data structure, reaching a time exponent of 0.226. But they remark that this result depends on a conjecture, or a heuristic, that was implicit in  [6].

Heuristic 2

(Helm-May). In these quantum walk subset-sum algorithms, an update with expected constant time \(\mathsf {U}\) can be replaced by an update with exact time \(\mathsf {U}\) without affecting the runtime of the algorithm, up to a polynomial factor.

Indeed, it is easy to construct “bad” vertices and edges for which an exact update, i.e. the complete reconstruction of the merging tree, will take exponential time: by adding a single new subknapsack \(\mathbf {e}\), we find an exponential number of pairs \(\mathbf {e} + \mathbf {e'}\) to include at the next level. So we would like to update only a few elements among them. But in the MNRS framework, the data structure of a vertex must depend solely on the vertex itself (i.e. on the lowest-level lists in the merging tree). And if we do as proposed in  [6], we add a dependency on the path that led to the vertex, and lose the consistency of the walk.

In a related context, the problem of “quantum search with variable times” was studied by Ambainis  [2]. In a quantum search for some x such that \(f(x) = 1\), in a set of size N, if the time to evaluate f on x is always 1, then the search requires time \( \mathcal {O} \left( \sqrt{N} \right) \). Ambainis showed that if the elements have different evaluation times \(t_1, \ldots t_N\), then the search now requires \(\widetilde{\mathcal {O}}(\sqrt{ t_1^2 + \ldots + t_N^2 })\), i.e., \(\sqrt{N}\) times the quadratic mean of \(t_1, \ldots t_N\). As quantum search can be seen as a particular type of quantum walk, this shows that Heuristic 2 is wrong in general, as we can artificially create a gap between the quadratic mean and the expectation of the update time \(\mathsf {U}\); but also, that it may be difficult to actually overcome. In this paper, we will obtain different heuristic and non-heuristic times.
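To illustrate the possible gap (this example is ours): if a single element among \(N = 2^{2n}\) has evaluation time \(2^{3n/2}\) while all the others take unit time, the expected evaluation time is \(1 + 2^{-n/2}\), so a search whose updates are only guaranteed in expectation would seem to cost about \(\sqrt{N} = 2^{n}\), whereas the variable-time bound gives \(\sqrt{2^{2n} + 2^{3n}} \approx 2^{3n/2}\).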

4 Quantum Asymmetric HGJ

In this section, we give the first quantum algorithm for the subset-sum problem, in the QRACM model, with an asymptotic complexity smaller than BCJ.

4.1 Quantum Match-and-Filter

We open this section with some technical lemmas that replace the classical merge-and-filter Lemma 2. In this section, we will consider a merging tree as in the HGJ algorithm, but this tree will be built using quantum search. The following lemmas bound the expected time of merge-and-filter and match-and-filter operations performed quantumly, in the QRACM model. This will have consequences both in this section and in the next one.

First, we remark that we can use a much simpler data structure than the ones in  [1, 6]. In this data structure, we store pairs \((\mathbf {e}, \mathbf {e} \cdot \mathbf {a})\) indexed by \(\mathbf {e} \cdot \mathbf {a} \mod M\) for some \(M \simeq 2^m\).

Definition 5

(Unique modulus list). A unique modulus list is a qRAM data structure \(\mathcal {L}(M)\) that stores at most M entries \((\mathbf {e}, \mathbf {e} \cdot \mathbf {a})\), indexed by \(\mathbf {e} \cdot \mathbf {a} \mod M\), and supports the following operations:

  • Insertion: inserts the entry \((\mathbf {e}, \mathbf {e} \cdot \mathbf {a})\) if the modulus is not already occupied;

  • Deletion: deletes \((\mathbf {e}, \mathbf {e} \cdot \mathbf {a})\) (not necessary in this section)

  • Query in superposition: returns the superposition of all entries \((\mathbf {e}, \mathbf {e} \cdot \mathbf {a})\) with some modular condition on \(\mathbf {e} \cdot \mathbf {a}\), e.g. \(\mathbf {e} \cdot \mathbf {a} = t \mod M'\) for some t and some modulus \(M'\).

Note that all of these operations, including the query in superposition of all the entries with a given modulus, cost \( \mathcal {O} \left( 1 \right) \) qRAM gates only. For the latter, we need only some Hadamard gates to prepare the adequate superposition of indices. Furthermore, the list remains sorted by design.
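A classical analogue of this data structure is a simple table keyed by the modulus; the sketch below (ours, with illustrative names) shows the three operations, where the query returns all matching entries instead of their superposition.

```python
class UniqueModulusList:
    """Classical analogue of Definition 5: at most M entries (e, e.a),
    indexed by (e.a) mod M, with at most one entry per modulus."""
    def __init__(self, M):
        self.M = M
        self.table = {}                                  # modulus -> (e, e.a)

    def insert(self, e, ea):
        self.table.setdefault(ea % self.M, (e, ea))      # skip if occupied

    def delete(self, e, ea):
        if self.table.get(ea % self.M, (None, None))[1] == ea:
            del self.table[ea % self.M]

    def query(self, t, M_prime):
        """All entries with e.a = t mod M'. Quantumly, the uniform superposition
        of these entries is prepared with O(1) qRAM gates and Hadamard gates."""
        return [v for v in self.table.values() if v[1] % M_prime == t % M_prime]
```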

Next, we write a lemma for quantum matching with filtering, in which one of the lists is not written down. We start from a unitary that produces the uniform superposition of the elements of a list \(L_1\), and we wrap it into an amplitude amplification, in order to obtain a unitary that produces the uniform superposition of the elements of the merged-and-filtered list.

Lemma 4

(Quantum matching with filtering). Let \(L_2\) be a list stored in QRACM (with the unique modulus list data structure of Definition 5). Assume given a unitary U that produces in time \(t_{L_1}\) the uniform superposition of \(L_1 = x_0, \ldots x_{2^m -1}\) where \(x_i = (\mathbf {e_i}, \mathbf {e_i} \cdot \mathbf {a})\). We merge \(L_1\) and \(L_2\) with a modular condition of cn bits and a filtering probability p. Let L be the merged list and \(L^f\) the filtered list. Assume \(|L^f| \ge 1\). Then there exists a unitary \(U'\) producing the uniform superposition of \(L^f\) in time: \( \mathcal {O} \left( \frac{t_{L_1}}{\sqrt{p}} \max ( \sqrt{2^{cn}/|L_2|}, 1) \right) \).

Notice that this is also the time complexity to produce a single random element of \(L^f\). If we want to produce and store the whole list \(L^f\), it suffices to multiply this complexity by the number of elements in \(L^f\) (i.e. \(p|L_1||L_2|/2^{cn}\)). We would obtain: \( \mathcal {O} \left( t_{L_1} \sqrt{p} \max \left( |L_1| \sqrt{\frac{|L_2|}{2^{cn}}} , \frac{|L_1| |L_2|}{2^{cn}} \right) \right) \!.\)

Proof

Since \(L_2\) is stored in a unique modulus list, all its elements have distinct moduli. Note that the expected sizes of L and \(L^f\) follow from Heuristic 1. Although the number of iterations of quantum search should depend on the real sizes of these lists, the concentration around the average is so high (given by Chernoff bounds) that the error remains negligible if we run the search with the expected number of iterations. We separate three cases.

  • If \(|L_2| < 2^{cn}\), then we have no choice but to make a quantum search on elements of \(L_1\) that match the modular constraint and pass the filtering step, in time: \( \mathcal {O} \left( t_{L_1} \sqrt{ \frac{2^{cn}}{L_2 p} } \right) \).

  • If \(|L_2| > 2^{cn}\) but \(|L_2| < 2^{cn}/p\), an element of \(L_1\) will always pass the modular constraint, with more than one candidate, but in general all these candidates will be filtered out. Given an element of \(L_1\), producing the superposition of these candidates is done in time 1, so finding the one that passes the filter, if there is one, takes time \(\sqrt{|L_2|/2^{cn}}\). Next, we wrap this in a quantum search to find the “good” elements of \(L_1\) (passing the two conditions), with \( \mathcal {O} \left( \sqrt{ 2^{cn}/p L_2} \right) \) iterations. The total time is:

    $$ \mathcal {O} \left( \sqrt{ \frac{2^{cn}}{|L_2| p} } \times \sqrt{\frac{|L_2|}{2^{cn}}} \times t_{L_1} \right) = \mathcal {O} \left( \frac{t_{L_1}}{\sqrt{p}} \right) \!. $$
  • If \(|L_2| > 2^{cn}/ p\), an element of \(L_1\) yields on average more than one filtered candidate. Producing the superposition of the modular candidates is done in time \( \mathcal {O} \left( 1 \right) \) thanks to the data structure, then finding the superposition of filtered candidates requires \(1/\sqrt{p}\) iterations. The total time is: \( \mathcal {O} \left( t_{L_1}/\sqrt{p} \right) \).

The total time in all cases is: \( \mathcal {O} \left( \frac{t_{L_1}}{\sqrt{p}} \max (\sqrt{2^{cn}/|L_2|}, 1) \right) \). Note that classically, the coupon collector problem would have added a polynomial factor, but this is not the case here thanks to QRACM (Remark 6).    \(\square \)

In the QRACM model, we have the following corollary for merging and filtering two lists of equal size. This result will be helpful in Sect. 4.3 and 5.

Corollary 1

Consider two lists \(L_1, L_2\) of size \(|L_1| = |L_2| = |L|\) exponential in n. We merge \(L_1\) and \(L_2\) with a modular condition of cn bits, and filter with a probability p. Assume that \(2^{cn} < |L|\). Then \(L^f\) can be written down in quantum time: \( \mathcal {O} \left( \sqrt{p} \frac{|L|^2}{2^{cn}} \right) \).

Proof

We do a quantum search to find each element of \(L^f\). We have \(t_{L_1} = \mathcal {O} \left( 1 \right) \) since it is a mere QRACM query, and we use Lemma 4.    \(\square \)
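In exponent form (relative to n), Lemma 4 and the remark following it read as below; this small helper (ours) is convenient when assembling the costs of Sect. 4.2 and 5.

```python
def match_filter_one(t_L1, l2, c, pf):
    """Exponent of producing one random element of L^f (Lemma 4):
    t_L1 - pf/2 + max((c - l2)/2, 0), where pf <= 0 is the filtering exponent."""
    return t_L1 - pf / 2 + max((c - l2) / 2, 0.0)

def merge_filter_all(l1, l2, c, pf, t_L1=0.0):
    """Exponent of writing down the whole filtered list: the previous cost
    plus the expected size of L^f, i.e. plus max(l1 + l2 - c + pf, 0)."""
    return match_filter_one(t_L1, l2, c, pf) + max(l1 + l2 - c + pf, 0.0)

# Corollary 1: two lists of equal size exponent l, with c < l and t_L1 = 0,
# the cost exponent is pf/2 + 2l - c:
l, c, pf = 0.25, 0.2, -0.05
print(merge_filter_all(l, l, c, pf))   # -> 0.275 = -0.025 + 0.5 - 0.2
```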

4.2 Revisiting HGJ

We now introduce our new algorithm for subset-sum in the QRACM model.

Our starting point is the HGJ algorithm. Similarly to  [33], we use a merging tree in which the lists at a given level may have different sizes. Classically, this does not improve the time complexity. However, quantumly, we will use quantum filtering. Since our algorithm does not require writing data in superposition, only reading from classical registers with quantum random access, we require only QRACM instead of QRAQM.

In the following, we consider that all lists, except \(L_0^3, L_0^2, L_0^1, L^0\), are built with classical merges. The final list \(L^0\), which is expected to contain a single element, and the branch leading to it, are part of a nested quantum search. Each list \(L_0^3, L_0^2, L_0^1, L^0\) corresponds either to a search space, the solutions of a search, or both. We represent this situation on Fig. 3. Our procedure runs as follows:

  1. (Classical step): build the intermediate lists \(L_1^3, L_1^2, L_1^1\) and store them using a unique modulus list data structure (Definition 5).

  2. (Quantum step): do a quantum search on \(L_0^3\). To test a vector \(\mathbf {e} \in L_0^3\):

    • Find \(\mathbf {e}_3 \in L_1^3\) such that \(\mathbf {e} + \mathbf {e}_3\) passes the \(c^2_0n\)-bit modular constraint (assume that there is at most one such solution). There is no filtering here.

    • Find \(\mathbf {e}_2 \in L_1^2\) such that \((\mathbf {e} + \mathbf {e}_3) + \mathbf {e}_2\) passes the additional \((c^1-c^2_0)n\)-bit constraint.

    • If it also passes the filtering step, find \(\mathbf {e}_1 \in L_1^1\) such that \((\mathbf {e} + \mathbf {e}_3 + \mathbf {e}_2) + \mathbf {e}_1\) is a solution to the knapsack problem (and passes the filter).

Structural constraints are imposed on the tree, in order to guarantee that there exists a knapsack solution. The only difference between the quantum and classical settings is in the optimization goal: the final time complexity.

Fig. 3. Quantum HGJ algorithm. Dotted lists are search spaces (they are not stored). Bold lists are stored in QRACM. In Sect. 4.3, \(L_2^2\) and \(L_3^2\) are also stored in QRACM.

Structural Constraints. We now introduce the variables and the structural constraints that determine the shape of the tree in Fig. 3. The asymmetry happens both in the weights at level 0 and in the constraints at levels 1 and 2. We write \(\ell _i^j = (\log _2 | L_i^j|) / n\). With the lists built classically, we expect a symmetry to be respected, so we have: \(\ell _2^3 = \ell _3^3\), \(\ell _4^3 = \ell _5^3 = \ell _6^3 = \ell _7^3\), \(\ell _2^2 = \ell _3^2\). We also tweak the left-right split at level 0: lists from \(L_2^3\) to \(L_7^3\) have a standard balanced left-right split; however, we introduce a parameter r that determines the proportion of positions set to zero in list \(L_0^3\): in \(L_0^3\), the vectors weigh \(cn(1-r)\) on a support of size \(n(1-r)\), instead of cn/2 on a support of size n/2. In total we have \(c + b + 2a = \frac{1}{2}\), as the weight of the solution is supposed to be exactly n/2.

Then we note that:

  • The lists at level 3 have a maximal size depending on the corresponding weight of their vectors:

    $$ \ell _0^3 \le h(c) (1-r),~~ \ell _1^3 \le h(c) r,~~ \ell _2^3 = \ell _3^3 \le h(b)/2,~~ \ell _4^3 \le h(a)/2 $$
  • The lists at level 2 cannot contain more representations than the filtered list of all subknapsacks of corresponding weight:

    $$ \ell _0^2 \le h(c) - c_0^2,~~ \ell _1^2 \le h(b) - c_0^2,~~ \ell _2^2 = \ell _3^2 \le h(a) - c_1^2 $$
  • Same at levels 1 and 0:    \( \ell _0^1 \le h(c + b) - c^1,~~ \ell _1^1 \le h(2a) - c^1 \)

  • The merging at level 2 is exact (there is no filtering):

    $$ \ell _0^2 = \ell _0^3 + \ell _1^3 - c_0^2,~~ \ell _1^2 = \ell _2^3 + \ell _3^3 - c_0^2,~~ \ell _2^2 = \ell _4^3 + \ell _5^3 - c_1^2, ~~\ell _3^2 = \ell _6^3 + \ell _7^3 - c_1^2, $$
  • At level 1, with a constraint \(c^1 \ge c_0^2, c_1^2\) that subsumes the previous ones:

    $$ \ell _0^1 = \ell _0^2 + \ell _1^2 - c^1 + c_0^2 + \mathsf {pf}_1 \left( b, c \right) ~,~~~~ \ell _1^1 = \ell _2^2 + \ell _3^2 - c^1 + c_1^2 + \mathsf {pf}_1 \left( a, a \right) $$
  • And finally at level 0:    \( \ell ^0 = 0 = \ell _0^1 + \ell _1^1 - (1-c^1) + \mathsf {pf}_1 \left( b +c, 2a \right) \)

Classical Optimization. All the previous constraints depend on the problem, not on the computation model. Now we can get to the time complexity in the classical setting, that we want to minimize:

$$\begin{aligned}&\quad \ \max \big (\ell _4^3, \ell _2^3, \ell _1^3, \ell _2^3 + \ell _3^3 - c_0^2, \ell _4^3 + \ell _5^3 - c_1^2, \ell _2^2 + \ell _3^2 - c^1 + c_1^2, \\&\ell _0^3 + \max (\ell _1^3 - c_0^2, 0) + \max (\ell _1^2 - c^1 + c_0^2,0 )+\max (\mathsf {pf}_1 \left( b, c \right) +\ell ^1_1-(1-c^1), 0) \big ) . \end{aligned}$$

The last term corresponds to the exhaustive search on \(\ell _0^3\). In order to keep the same freedom as before, it is possible that an element of \(L_0^3\) matches against several elements of \(L_1^3\), all of which yield a potential solution that has to be matched against \(L_1^2\), etc. Hence for each element of \(L_0^3\), we find the expected \(\max (\ell _1^3 - c_0^2, 0)\) candidates matching the constraint \(c_0^2\). For each of these candidates, we find the expected \(\max (\ell _1^2 - c^1 + c_0^2,0)\) candidates matching the constraint \(c^1\). For each of these candidates, if it passes the filter, we search for a collision in \(L_1^1\); this explains the \(\max (\mathsf {pf}_1 \left( b, c \right) +\ell ^1_1-(1-c^1), 0) \) term. In the end, we check if the final candidates pass the filter on the last level.

We verified that optimizing the classical time under our constraints gives the time complexity of HGJ.

Quantum Optimization. The time complexity for producing the intermediate lists is unchanged. The only difference is the way we find the element in \(L_0^3\) that will lead to a solution, which is a nested sequence of quantum searches.

  • We can produce the superposition of all elements in \(L_0^2\) in time

    $$ t_{2} = \frac{1}{2} \max (c_0^2 - \ell _1^3, 0) $$
  • By Lemma 4, we can produce the superposition of all elements in \(L_0^1\) in time

    $$ t_2 - \frac{1}{2} \mathsf {pf}_1 \left( b, c \right) + \frac{1}{2} \max \left( c^1 - c_0^2 - \ell _1^2, 0 \right) $$
  • Finally, we expect that there are \( \left( \ell _0^2 + \ell _1^2 - c^1 + c_0^2 + \mathsf {pf}_1 \left( b, c \right) \right) \) elements in \(L_0^1\), which gives the number of iterations of the quantum search.

The time of this search is:

$$ \frac{1}{2} \left( \ell _0^1 + \max (c_0^2 - \ell _1^3, 0) - \mathsf {pf}_1 \left( b, c \right) + \max \left( c^1 - c_0^2 - \ell _1^2, 0 \right) \right) $$

and the total time complexity is:

$$\begin{aligned}&\max \big (\ell _4^3, \ell _2^3, \ell _1^3, \ell _2^3 + \ell _3^3 - c_0^2, \ell _4^3 + \ell _5^3 - c_1^2, \ell _2^2 + \ell _3^2 - c^1 + c_1^2, \\&\qquad \qquad \qquad \ \frac{1}{2} \left( \ell _0^1 + \max (c_0^2 - \ell _1^3, 0) - \mathsf {pf}_1 \left( b, c \right) + \max \left( c^1 - c_0^2 - \ell _1^2, 0 \right) \right) \big ) \end{aligned}$$

We obtain a quantum time complexity exponent of 0.2374 with this method (the detailed parameters are given in Table 2).

4.3 Improvement via Quantum Filtering

Let us keep the tree structure of Fig. 3 and its structural constraints. The final quantum search step is already made efficient with respect to the filtering of representations, as we only pay half of the filtering term \(\mathsf {pf}_1 \left( b, c \right) \). However, we can look towards the intermediate lists in the tree, i.e., \(L_1^3, L_1^2, L_1^1\). The merging at the first level is exact: due to the left-right split, there is no filtering of representations, hence the complexity is determined by the size of the output list. However, the construction of \(L_1^1\) contains a filtering step. Thus, we can use Corollary 1 to produce the elements of \(L_1^1\) faster and reduce the time complexity from: \(\ell _2^2 + \ell _3^2 - c^1 + c_1^2\) to: \(\ell _2^2 + \ell _3^2 - c^1 + c_1^2 + \frac{1}{2} \mathsf {pf}_1 \left( a, a \right) \). By optimizing with this time complexity, we obtain a time exponent 0.2356 (the detailed parameters are given in Table 2). The corresponding memory is 0.2356 (given by the list \(L_1^3\)).

Table 2. Optimization results for the quantum asymmetric HGJ algorithm (in \(\log _2\) and relative to n), rounded to four digits. The time complexity is an upper bound.

Remark 7

(More improvements). We have tried increasing the tree depth or changing the tree structure, but it does not seem to bring any improvement. In theory, we could allow for more general representations involving “−1” and “2”. However, computing the filtering probability, when merging two lists of subknapsacks in \(D^n[\alpha , \beta , \gamma ]\) with different distributions becomes much more technical. We managed to compute it for \(D^n[\alpha , \beta ]\), but the number of parameters was too high for our numerical optimizer, which failed to converge.

4.4 Quantum Time-Memory Tradeoff

In the original HGJ algorithm, the lists at level 3 contain full distributions \(D^{n/2}[0, 1/8]\). By reducing their sizes to a smaller exponential, one can still run the merging steps, but the final list \(L^0\) is of expected size exponentially small in n. Hence, one must redo the tree many times. This general time-memory tradeoff is outlined in  [21] and is also reminiscent of Schroeppel and Shamir’s algorithm  [37], which can actually be seen as repeating \(2^{n/4}\) times a merge of lists of size \(2^{n/4}\), that yields \(2^{-n/4}\) solutions on average.

Asymmetric Tradeoff. The tradeoff that we propose is adapted to the QRACM model. It consists in increasing the asymmetry of the tree: we reduce the sizes of the intermediate lists \(L_1^3, L_1^2, L_1^1\) in order to use less memory; this in turn increases the size of \(L^3_0, L^2_0\) and \(L^1_0\) in order to ensure that a solution exists. We find that this tradeoff is close to the time-memory product curve \(T M = 2^{n/2}\), and actually slightly better (the optimal point when \(m = 0.2356\) has \(T M = 2^{0.4712 n}\)). This is shown on Fig. 4. At \(m = 0\), we start at \(2^{n/2}\), where \(L_0^3\) contains all vectors of Hamming weight n/2.

Fact 1

For any memory constraint \(m \le 0.2356\) (in \(\log _2\) and proportion of n), the optimal time complexity in the quantum asymmetric HGJ algorithm of Sect. 4.3 is lower than \( \widetilde{\mathcal {O}} \left( 2^{n/2 - m} \right) \).

Fig. 4. Quantum time-memory tradeoff of the asymmetric HGJ algorithm

Improving the QRACM Usage. In trying to reduce the quantum or quantum-accessible hardware used by our algorithm, it makes sense to draw a line between QRACM and classical RAM, i.e., between the part of the memory that is actually accessed quantumly and the part that is used only classically. We now enforce the memory constraint only on the QRACM, possibly using more classical RAM. In this context, we cannot produce the list \(L_1^1\) via quantum filtering. The memory constraint on the lists \(L_1^3, L_1^2, L_1^1\) still holds; however, we can increase the sizes of the lists \(L_4^3, L_5^3, L_6^3, L_7^3, L_2^2, L_3^2\).

Fact 2

For any QRACM constraint \(m \le 0.2356\), the optimal time complexity obtained by using more RAM is always smaller than the best optimization of Sect. 4.3.

The difference remains only marginal, as can be seen in Table 3, but it shows a tradeoff between quantum and classical resources.

Table 3. Time-memory tradeoffs (QRACM) for three variants of our asymmetric HGJ algorithm, obtained by numerical optimization, and rounded upwards. The last variant uses more classical RAM than the QRACM constraint.

5 New Algorithms Based on Quantum Walks

In this section, we improve the algorithm by Helm and May  [18] based on BCJ and the MNRS quantum walk framework. Our algorithm is a quantum walk on a product Johnson graph, as in Sect. 3.2. There are two new ideas involved.

5.1 Asymmetric 5th Level

In our new algorithm, we can afford one more level than BCJ. We then have a 6-level merging tree, with levels numbered 5 down to 0. Lists at level i all have the same size \(\ell _i\), except at level 5. Recall that the merging tree, and all its lists, is the additional data structure attached to a node in the Johnson graph. In the original algorithm of  [18], there are 5 levels, and a node is a collection of 16 lists, each list being a subset of size \(\ell _4\) among the \(g(1/16+\alpha _3,\alpha _3)/2\) vectors having the right distribution.

In our new algorithm, at level 5, we separate the lists into “left” lists of size \(\ell _5^l\) and “right” lists of size \(\ell _5^r\). The quantum walk will only be performed on the left lists, while the right ones are full enumerations. Each list at level 4 is obtained by merging a “left” and a “right” list. The left-right split at level 5 is thus asymmetric: vectors in one of the left lists \(L_5^l\) are sampled from \(D^{\eta n}[\alpha _4, 1/32, \gamma _4] \times \{0^{(1 -\eta ) n}\}\), while the right lists \(L_5^r\) contain all the vectors of \(\{0^{\eta n}\} \times D^{(1-\eta ) n}[\alpha _4, 1/32, \gamma _4]\). This yields a new constraint: \(\ell _5^r = f(1/32+\alpha _4-2\gamma _4, \alpha _4, \gamma _4) (1 - \eta )\).

While this asymmetry does not bring any advantage classically, it helps in reducing the update time. We enforce the constraint \(\ell _5^r = c_4\), so that for each element of \(L_5^l\), there is on average one matching element in \(L_5^r\). Updating the list \(L_4\) at level 4 is thus done in average time 1. We then also have \(\ell _4 = \ell _5^l\).

With this construction, \(\ell _5^r\) and \(\ell _5^l\) are actually unneeded parameters. We only need the constraints \(c_4 (= \ell _5^r) = f(1/32+\alpha _4-2\gamma _4, \alpha _4, \gamma _4) (1 - \eta )\) and \(\ell _4 (= \ell _5^l) \le f(1/32+\alpha _4-2\gamma _4, \alpha _4, \gamma _4) \eta \). The total setup time is now:

figure b

and the expected update time at level 5 (inserting a new element in a list \(L_5^l\) at the bottom of the tree) and at level 4 (inserting a new element in \(L_4\)) is 1. In log scale, the spectral gap of the graph is \(\delta = - \ell _5^l \) and the proportion of marked vertices is \(\epsilon = \ell _0\).

Saturation Constraints. In the quantum walk, we have \(\ell _0 < 0\), since we expect only some proportion of the nodes to be marked (to contain a solution). This proportion is hence \(\ell _0\). The saturation constraints are modified as follows:

$$\begin{aligned} \begin{array}{ll} \ell _5^l \le \frac{\ell _0}{16} + f(\frac{1}{32} + \alpha _4 - 2\gamma _4, \alpha _4, \gamma _4)\, \eta , & \ell _4 \le \frac{\ell _0}{16} + f(\frac{1}{32} + \alpha _4 - 2\gamma _4, \alpha _4, \gamma _4) - c_4 \\ \ell _3 \le \frac{\ell _0}{8} + f(\frac{1}{16} + \alpha _3 - 2\gamma _3, \alpha _3, \gamma _3) - c_3, & \ell _2 \le \frac{\ell _0}{4} + f(\frac{1}{8} + \alpha _2 - 2\gamma _2, \alpha _2, \gamma _2) - c_2 \\ \ell _1 \le \frac{\ell _0}{2} + f(\frac{1}{4} + \alpha _1 - 2\gamma _1, \alpha _1, \gamma _1) - c_1 & \\ \end{array} \end{aligned}$$

Indeed, the classical walk will go through a total of \(-\ell _0\) trees before finding a solution. Hence, it needs to go through \(-\ell _0/16\) different lists at level 5 (and 4), which is why we need to introduce \(\ell _0\) in the saturation constraint: there must be enough elements, not only in \(L_5^l\), but in the whole search space that will be spanned by the walk. These constraints ensure the existence of marked vertices in the walk.
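As an illustration, these saturation constraints can be checked numerically as in the following minimal sketch, where the vector-counting exponent f is passed as a black box (its definition is given earlier in the paper), all quantities are exponents relative to n, and the naming is ours.

```python
def saturation_ok(f, l5l, l4, l3, l2, l1, l0, alpha, gamma, c, eta):
    """alpha, gamma, c are dicts indexed by level (1..4); l0 <= 0."""
    return all([
        l5l <= l0 / 16 + f(1/32 + alpha[4] - 2 * gamma[4], alpha[4], gamma[4]) * eta,
        l4  <= l0 / 16 + f(1/32 + alpha[4] - 2 * gamma[4], alpha[4], gamma[4]) - c[4],
        l3  <= l0 / 8  + f(1/16 + alpha[3] - 2 * gamma[3], alpha[3], gamma[3]) - c[3],
        l2  <= l0 / 4  + f(1/8  + alpha[2] - 2 * gamma[2], alpha[2], gamma[2]) - c[2],
        l1  <= l0 / 2  + f(1/4  + alpha[1] - 2 * gamma[1], alpha[1], gamma[1]) - c[1],
    ])
```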

5.2 Better Setup and Updates Using Quantum Search

Along the lines of Lemma 4 and Corollary 1, we now show how to use a quantum search to speed up the Setup and Update steps in the quantum walk. As the structure of the graph is unchanged, we still have \(\epsilon = \ell _0\) and a spectral gap \(\delta = - \ell _5^l\) (in log scale).

Setup. Let \(p_i\) \((1 \le i \le 3)\) be the filtering probabilities at level i, i.e., the (logarithms of the) probabilities that an element satisfying the modular condition at level i also has the desired distribution of 0s, 1s, \(-1\)s and 2s, and hence appears in list \(L_i\). Notice that \(p_i \le 0\). Due to the left-right split, there is no filtering at level 4.

We use quantum filtering (Corollary 1) to speed up the computation of the lists at levels 3, 2 and 1 in the setup, reducing in general a time \(2 \ell - c\) to \(2 \ell -c + \mathsf {pf}/2\). It does not apply at level 0, since \(L_0\) has a negative expected size (in log scale). At this level, we simply perform a quantum search over \(L_1\). If the constraint is strong, i.e., \((1-c_1) > \ell _1\), then for a given element of \(L_1\), there is on average less than one modular candidate. If \((1-c_1) < \ell _1\), there is on average more than one modular candidate (although less than one after filtering), and we have to do another quantum search over all of them. This is why the setup time at level 0, in full generality, becomes \((\ell _1 + \max (\ell _1 - (1-c_1), 0))/2\). The setup time can thus be improved to:

figure c
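For illustration, the following minimal sketch mirrors the per-level setup costs described above (it reflects our reading of the description rather than the displayed expression); \(p_3, p_2, p_1\) are the filtering probabilities defined above, and all quantities are exponents relative to n.

```python
def setup_exponent(l5l, l4, l3, l2, l1, c4, c3, c2, c1, p3, p2, p1):
    """Per-level setup costs; the total setup exponent is their maximum."""
    costs = [
        l5l, l4,                               # filling the level-5 and level-4 lists
        2 * l4 - (c3 - c4) + p3 / 2,           # level 3: merge + quantum filtering
        2 * l3 - (c2 - c3) + p2 / 2,           # level 2
        2 * l2 - (c1 - c2) + p1 / 2,           # level 1
        (l1 + max(l1 - (1 - c1), 0)) / 2,      # level 0: quantum search over L_1
    ]
    return max(costs)
```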

Update. Our update will also use a quantum search. First of all, recall that the updates at levels 5 and 4 are performed in (expected) time 1. Having added an element to \(L_4\), we need to update the upper level. There are on average \(\ell _4 - (c_3 - c_4)\) candidates satisfying the modular condition. To avoid a blowup in the time complexity, we forbid having more than one element inserted in \(L_3\) on average, which means: \(\ell _4 - (c_3 - c_4) + p_3 \le 0 \iff \ell _3 \le \ell _4\). We then find this element, if it exists, with a quantum search among the \(\ell _4 - (c_3 - c_4)\) candidates.

Similarly, as at most one element is updated in \(L_3\), we can move on to the upper levels 2, 1 and 0 and use the same argument. We forbid having more than one element inserted in \(L_2\) on average: \(\ell _3 - (c_2 - c_3) + p_2 \le 0 \iff \ell _2 \le \ell _3\), and in \(L_1\): \(\ell _1 \le \ell _2\). At level 0, a quantum search may not be needed, hence a time \(\max (\ell _1 - (1 - c_1), 0)/2\). The expected update time becomes:

figure d
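Similarly, the following minimal sketch mirrors the per-level quantum search costs of one update (again our reading of the description, not the displayed expression); since the searches run one after the other, the update exponent is the largest of these terms.

```python
def update_exponent(l4, l3, l2, l1, c4, c3, c2, c1):
    """Assumes l3 <= l4, l2 <= l3, l1 <= l2, so that on average at most one
    element propagates to each level; exponents relative to n."""
    costs = [
        max(l4 - (c3 - c4), 0) / 2,   # quantum search for the element entering L_3
        max(l3 - (c2 - c3), 0) / 2,   # ... entering L_2
        max(l2 - (c1 - c2), 0) / 2,   # ... entering L_1
        max(l1 - (1 - c1), 0) / 2,    # level 0: only if modular candidates remain
    ]
    return max(costs)                 # sequential searches: the largest dominates
```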

5.3 Parameters

Using the following parameters, we found an algorithm that runs in time \( \widetilde{\mathcal {O}} \left( 2^{0.2156n} \right) \):

figure e

Many different parameter sets achieve the same time. The set above achieves the lowest memory that we found, at \( \widetilde{\mathcal {O}} \left( 2^{0.2110n} \right) \). Note that, contrary to previous works, the time and memory complexities of this quantum walk differ, since the update procedure now has a (small) exponential cost.

Remark 8

(Time-memory tradeoffs). Quantum walks have a natural time-memory tradeoff which consists in reducing the vertex size. Smaller vertices have a smaller chance of being marked, and the walk goes on for a longer time. This is also applicable to our algorithms, but requires a re-optimization with a memory constraint.

6 Mitigating Quantum Walk Heuristics for Subset-Sum

In this section, we provide a modified quantum walk NEW-QW for any quantum walk subset-sum algorithm QW, including [6, 18] and ours, that will no longer rely on Heuristic 2. In NEW-QW, the Johnson graph is the same, but the vertex data structure and the update procedure are different (Sect. 6.2). It allows us to guarantee the update time, at the expense of losing some marked vertices. In Sect. 6.3, we will show that most marked vertices in QW remain marked.

6.1 New Data Structure for Storing Lists

The main requirement of the vertex data structure is to store lists of subknapsacks with modular constraints in QRAQM. For each list, we will use two data structures. The first one is the combination of a hash table and a skip list given in  [1] (abbreviated skip list below) and the second one is a Bucket-modulus list data structure, adapted from Definition 5, that we define below.

Hash Table and Skip List. We use the data structure of  [1] to store lists of entries \((\mathbf {e}, \mathbf {e} \cdot \mathbf {a})\), sorted by knapsack value \(\mathbf {e} \cdot \mathbf {a}\). The data structure for M entries, that we denote \(\mathcal {SL}(M)\), uses \( \widetilde{\mathcal {O}} \left( M \right) \) qRAM memory cells and supports the following operations: inserting an entry in the list, deleting an entry from the list and producing the uniform superposition of entries in the list. All these operations require time \(\mathsf {polylog}(M)\).

We resort to this data structure because the proposal of “radix trees” in  [6] is less detailed. It is defined relative to a choice of \(\mathsf {polylog}(M) = \mathrm {poly}(n)\) hash functions selected from a family of independent hash functions of the entries (we refer to  [1] for more details). For a given choice of hash functions, the insertion or deletion operations can fail. Thus, the data structure is equipped with a superposition of such choices. Instead of storing \(\mathcal {SL}(M)\), we store: \(\sum _h \left| h \right\rangle \left| \mathcal {SL}_h(M) \right\rangle \) where \(\mathcal {SL}_h\) is the data structure flavored with the choice of hash functions h. Insertions and deletions are performed depending on h. This allows for a globally negligible error: if sufficiently many hash functions are used, the insertion and deletion of any element add a global error vector of amplitude \(o(2^{-n})\) regardless of the current state of the data. The standard “hybrid argument” from  [5] and  [1, Lemma 5] can then be used in the context of an MNRS quantum walk.

Proposition 1

( [1], Lemma 5, adapted). Consider an MNRS quantum walk with a “perfect” (theoretical) update unitary U, managing data structures, and an “imperfect” update unitary \(U'\) such that, for any basis state \(\left| x \right\rangle \):

$$ U' \left| x \right\rangle = U \left| x \right\rangle + \left| \delta _x \right\rangle $$

where \(\left| \delta _x \right\rangle \) is an error vector of amplitude bounded by \(o(2^{-n})\) for any x. Then running the walk with \(U'\) instead of U, after T steps, the final “imperfect” state \(\left| \psi ' \right\rangle \) deviates from the “perfect” state \(\left| \psi \right\rangle \) by: \(\Vert \left| \psi ' \right\rangle - \left| \psi \right\rangle \Vert \le o(2^{-n} T) \).

This holds as a general principle: in the update unitary, any perfect procedure can be replaced by an imperfect one as long as its error is negligible (with respect to the total number of updates) and data-independent. In contrast, the problem with Heuristic 2 is that a generic constant-time update induces data-dependent errors (bad cases) that do not seem easy to overcome.

Bucket-modulus List. Let \(B = \mathrm {poly}(n)\) be a “bucket size” that will be chosen later. The bucket-modulus list is a tool for making our update time data-independent: it limits the number of vectors that can have a given modulus (where moduli are of the same order as the list size).

Definition 6

(Bucket-modulus list). A Bucket-modulus list \(\mathcal {BL}(B, M)\) is a qRAM data structure that stores at most \(B \times M\) entries \((\mathbf {e}, \mathbf {e} \cdot \mathbf {a})\), with at most B entries sharing the same modulus \(\mathbf {e} \cdot \mathbf {a} \mod M\). Thus, \(\mathcal {BL}(B, M)\) contains M “buckets”. Buckets are indexed by moduli, and kept sorted. It supports the following operations:

  • Insertion: insert \((\mathbf {e}, \mathbf {e} \cdot \mathbf {a})\). If the bucket at index \(\mathbf {e} \cdot \mathbf {a} \mod M\) contains B elements, empty the bucket. Otherwise, sort it using a simple sorting circuit.

  • Deletion: remove an entry from the corresponding bucket.

  • Query in superposition: similar as in Definition 5.

In our new quantum walks, each list will be stored in a skip list \(\mathcal {SL}(M)\) associated with a bucket-modulus list \(\mathcal {BL}(B, M)\). Each time we insert or delete an element in \(\mathcal {SL}(M)\), we update the bucket-modulus list according to the following rules.

Upon deletion of an element \(\mathbf {e}\) from \(\mathcal {SL}(M)\), let \(T = \mathbf {e} \cdot \mathbf {a} \mod M\); there are three cases for \(\mathcal {BL}(B, M)\):

  • If \(|\{ \mathbf {e'} \in \mathcal {SL}(M), \mathbf {e'} \cdot \mathbf {a} = T \}| > B+1\), then bucket number T in \(\mathcal {BL}(B,M)\) stays empty;

  • If \(|\{ \mathbf {e'} \in \mathcal {SL}(M), \mathbf {e'} \cdot \mathbf {a} = T \}| = B + 1\), then removing \(\mathbf {e}\) makes the number of such elements reach the bound B, so we add all of them to the bucket at index T;

  • If \(|\{ \mathbf {e'} \in \mathcal {SL}(M), \mathbf {e'} \cdot \mathbf {a} = T \}| \le B\), then we remove \(\mathbf {e}\) from its bucket.

Upon insertion of an element \(\mathbf {e}\) in \(\mathcal {SL}(M)\) (again with \(T = \mathbf {e} \cdot \mathbf {a} \mod M\)), there are also three cases for \(\mathcal {BL}(B, M)\):

  • If \(|\{ \mathbf {e'} \in \mathcal {SL}(M), \mathbf {e'} \cdot \mathbf {a} = T \}| = B\), then we empty the bucket at index T;

  • If \(|\{ \mathbf {e'} \in \mathcal {SL}(M), \mathbf {e'} \cdot \mathbf {a} = T \}| < B\), then we add \(\mathbf {e}\) to the bucket at index T in \(\mathcal {BL}(B,M)\);

  • If \(|\{ \mathbf {e'} \in \mathcal {SL}(M), \mathbf {e'} \cdot \mathbf {a} = T \}| > B\), then the bucket is empty and remains empty.

In all cases, there are at most B insertions or deletions in a single bucket. Note that \(\mathcal {BL}(B,M) \subseteq \mathcal {SL}(M)\) but that some elements of \(\mathcal {SL}(M)\) will be dropped.
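To fix ideas, the following minimal, purely classical sketch illustrates this bookkeeping; the actual \(\mathcal {BL}(B, M)\) lives in qRAM and supports superposition queries, which the sketch does not model. The parent list plays the role of \(\mathcal {SL}(M)\), and the hooks are meant to be called just before the corresponding operation on the parent list.

```python
from collections import defaultdict

class BucketModulusList:
    """Sketch of BL(B, M): bucket T mirrors the parent elements with knapsack
    value T mod M whenever there are at most B of them, and is empty otherwise."""

    def __init__(self, B, M, value):
        self.B, self.M = B, M
        self.value = value                  # value(e) = e . a
        self.buckets = defaultdict(list)    # modulus T -> at most B entries, sorted

    def _siblings(self, parent, e):
        t = self.value(e) % self.M
        sib = [x for x in parent if self.value(x) % self.M == t]
        return t, sorted(sib, key=self.value)

    def on_insert(self, parent, e):
        """Call just before e is inserted into the parent list."""
        t, sib = self._siblings(parent, e)      # siblings, e not yet counted
        if len(sib) == self.B:                  # bound exceeded once e is added
            self.buckets[t] = []
        elif len(sib) < self.B:
            self.buckets[t] = sorted(sib + [e], key=self.value)
        # len(sib) > B: the bucket is already empty and stays empty

    def on_delete(self, parent, e):
        """Call just before e is removed from the parent list."""
        t, sib = self._siblings(parent, e)      # siblings, e still counted
        if len(sib) == self.B + 1:              # removing e brings us back to B
            self.buckets[t] = [x for x in sib if x != e]
        elif len(sib) <= self.B:
            self.buckets[t] = [x for x in self.buckets[t] if x != e]
        # len(sib) > B + 1: the bucket stays empty
```

With these rules, the bucket at index T equals the set of parent elements with that modulus exactly when this set has at most B elements, and is empty otherwise, which matches the deterministic mapping described in Remark 9 below.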

Remark 9

The mapping from a skip list of size M (considered as perfect), which does not “forget” any of its elements, to the corresponding bucket-modulus list with M buckets, which forgets some of these elements, is deterministic. Given a skip list L, the corresponding bucket-modulus list \(L'\) can be obtained by inserting all elements of L into an empty bucket-modulus list.

6.2 New Data Structure for Vertices

The algorithms that we consider use multiple levels of merging. However, we will focus only on a single level. Our arguments can be generalized to any constant number of merges (with an increase in the polynomial factors involved). Recall that the product Johnson graph on which we run the quantum walk is unchanged, only the data structure is adapted.

In the following, we will consider the merging of two lists \(L_l\) and \(L_r\) of subknapsacks of respective sizes \(\ell _l\) and \(\ell _r\), with a modular constraint c and a filtering probability \(\mathsf {pf}\). The merged list is denoted \(L^c= L_l \bowtie _c L_r\) and the filtered list is denoted \(L^f\). We assume that pairs \((\mathbf {e_1}, \mathbf {e_2})\) in \(L^c\) must satisfy \((\mathbf {e_1} + \mathbf {e_2}) \cdot \mathbf {a} = 0 \mod 2^{cn}\) (the generalization to any target value and any modulus is straightforward).

On the positive side, our new data structure can be updated, by design, in a fixed time that is data-independent. On the negative side, we will not build the complete list \(L^f\) and will miss some of the solutions. As we drop a fraction of the vectors, some nodes that were previously marked will potentially appear unmarked, but this concerns at most an inverse-polynomial fraction of the marked nodes. We defer a formal proof of this fact to Sect. 6.3 and focus on the runtime.

We will focus on the case where \(\ell _l = \ell _r\) and either \(L_l\) or \(L_r\) is updated, which happens at all levels in our quantum walk except the first one. Since there is no filtering at the first level, it is actually much simpler to study with the same arguments. In previous quantum walks, we had \(\ell ^c = 2\ell - c \le \ell \), i.e., \(\ell \le c\); now we will have \(2\ell - c \ge \ell \) and \(2 \ell - c + \mathsf {pf}\le \ell \).

Recall that our heuristic time complexity analysis assumes an update time \((\ell -c)/2\). Indeed, the update of an element in \(L_l\) or \(L_r\) modifies on average \((\ell -c)\) elements in \(L_l \bowtie _c L_r\), among which we expect at most one filtered pair \((\mathbf {e_1}, \mathbf {e_2})\) (by the inequality \(2 \ell - c + \mathsf {pf}\le \ell \)). We find this solution with a quantum search. In the following, we modify the data structure of vertices in order to guarantee the best update time possible, up to additional polynomial factors. We will see however that it does not reach \((\ell -c)/2\). We now define our intermediate lists and sublists, before giving the update procedure and its time complexity.

Definitions. Both lists \(L_l, L_r\) are of size \(M \simeq 2^{\ell n}\). We store them in skip lists. In both \(L_r\) and \(L_l\), for each \(T \le M\), we expect on average only one element \(\mathbf {e}\) such that \(\mathbf {e} \cdot \mathbf {a} = T \mod M\). We introduce two Bucket-modulus lists (Definition 6) \(L_l'(B,M)\) and \(L_r'(B,M)\) that we will write as \(L_l'\) and \(L_r'\) for simplicity, indexed by \(\mathbf {e} \cdot \mathbf {a} \mod M\), with an arbitrary bound \(B = \mathrm {poly}(n)\) for the bucket sizes. They are attached to \(L_l\) and \(L_r\) as detailed in Sect. 6.1. When an element in \(L_l\) or \(L_r\) is modified, they are modified accordingly.

In \(L_l'\) and \(L_r'\), we consider the sublists of subknapsacks having the same modulo \(C \mod 2^{cn}\), and we denote by \(L_{l,C}'\) and \(L_{r,C}'\) these sublists. They can be easily considered separately since the vectors are sorted by knapsack weight. By design of the bucket-modulus lists, \(L_{l,C}'\) and \(L_{r,C}'\) both have size at most \(B 2^{(\ell -c)n}\). We have:

$$ L_l' \bowtie _c L_r' = \bigcup _{0 \le C \le 2^{cn} - 1} L_{l,C}' \times L_{r,C}' .$$

Next, we have a case disjunction to make. The most complicated case is when \(2\ell - 2c + \mathsf {pf}> 0\), that is, each product \(L_{l,C}' \times L_{r,C}'\) for a given C yields more than one filtered pair on average. In that case, we define sublists \(L_{l,C,i}'\) of \(L_{l,C}'\) and sublists \(L_{r,C,j}'\) of \(L_{r,C}'\) using a new arbitrary modular constraint, so that each of these sublists is of size \(-\mathsf {pf}/2\) (at most). There are \(\ell -c + \mathsf {pf}/2\) sublists (exactly). The rationale of this cut is that a product \(L_{l,C,i}' \times L_{r,C,j}'\) for a given ij now yields on average a single filtered pair (or less). When \(2\ell - 2c + \mathsf {pf}\le 0\), we don’t perform this last cut and consider the product \(L_{l,C}' \times L_{r,C}'\) immediately. By a slight abuse of notation, we denote: \((L_{l,C,i}' \times L_{r,C,j}')^f\) the set of filtered pairs from \(L_{l,C,i}' \times L_{r,C,j}'\), and we have:

$$ L^f = \bigcup _{0 \le C \le 2^{cn} - 1} \bigcup _{i,j}(L_{l,C,i}' \times L_{r,C,j}')^f . $$
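In log scale, these sizes are consistent: each product \(L_{l,C,i}' \times L_{r,C,j}'\) has size \(-\mathsf {pf}\), hence yields on average a single filtered pair, and the sublists exactly cover a bucket of size \(\ell - c\):

$$ \underbrace{(-\mathsf {pf}/2)}_{|L_{l,C,i}'|} + \underbrace{(-\mathsf {pf}/2)}_{|L_{r,C,j}'|} + \underbrace{\mathsf {pf}}_{\text {filtering}} = 0, \qquad \underbrace{\left( \ell - c + \mathsf {pf}/2 \right) }_{\text {number of sublists}} + \underbrace{(-\mathsf {pf}/2)}_{\text {sublist size}} = \ell - c. $$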
figure f

Algorithm and Complexity. Algorithm 1 details our update procedure. We now compute its time complexity and explain why it remains data-independent. Recall that we want to avoid the “bad cases” where an update goes on for too long: this is the case where an update in \(L_l\) (or \(L_r\)) creates too many updates in \(L^f\). In Algorithm 1, we avoid this by deliberately limiting the number of elements that can be updated. We can see that \(L^f\) will be smaller than the “perfect” one for two reasons:

  • the bucket-modulus data structure loses some vectors, since the buckets are dropped when they overflow;

  • filtered pairs are lost. Indeed, the algorithm ensures that in \(L^f\), at most B solutions \(\mathbf {e}_l + \mathbf {e}_r\) come from a cross-product \(L_{l,C,i}' \times L_{r,C,j}'\).

This makes the update procedure history-independent and its time complexity data-independent. Indeed:

Lemma 5

The state of the data structures \(L_l, L_r, L^f\) after Algorithm 1 depends only on \(L_l, L_r, L^f\) before and on the element that was inserted/deleted.

We omit a formal proof, as it follows from our definition of the bucket-modulus list and of Algorithm 1.

Lemma 6

With a good choice of B, Algorithm 1 runs with a data-independent error in \(o(2^{-n})\). The time complexity of Algorithm 1 is \( \widetilde{\mathcal {O}} \left( 2^{(\ell -c)n} \right) \) and an update modifies \( \widetilde{\mathcal {O}} \left( 2^{\max (\ell -c + \mathsf {pf}/2, 0)n} \right) \) elements in the filtered list \(L^f\) at the next level (respectively \(\ell -c\) and \(\max (\ell -c + \mathsf {pf}/2, 0)\) in log scale).

Proof

We check step by step the time complexity of Algorithm 1:

  • Insertion and deletion from the skip list for \(L_l\) is done in \(\mathrm {poly}(n)\), with a global error that can be omitted.

  • The bucket-modulus list \(L_l'\) is updated in time \( \mathcal {O} \left( B \right) = \mathrm {poly}(n)\) without errors. At most B elements must be inserted or removed.

  • For each insertion or removal in \(L_l'\), we select the corresponding sublist \(L_{l,C,i}'\) (or simply \(L_{l,C}'\) if \(2\ell - 2c + \mathsf {pf}\le 0\)). We look at the sublists \(L_{r,C,j}'\) and we estimate the number of filtered pairs in the products \(L_{l,C,i}' \times L_{r,C,j}'\) (of size \(-\mathsf {pf}\)), checking whether it is smaller or bigger than B. We explain in [8, Appendix C] how to do that reversibly in time \( \widetilde{\mathcal {O}} \left( B \times 2^{-\mathsf {pf}n/2} \right) \) (\(-\mathsf {pf}/2\) in log scale). There are \(\ell -c + \mathsf {pf}/2\) classical iterations, thus the total time is \(\ell -c\).

  • Depending whether we have found more or less than B filtered pairs, we will have to remove or to add all of them in \(L^f\). This means that \(B \times 2^{(\ell -c + \mathsf {pf}/2)n}\) insertion or deletion instructions will be passed over to \(L^f\).

There are two sources of data-independent errors: first, the skip list data structure (see Sect. 6.1). Second, the procedure of [8, Appendix C]. Both can be made exponentially small at the price of a polynomial overhead. Note that B will be set in order to get a sufficiently small probability of error (see the next section), and can be a global \( \mathcal {O} \left( n \right) \). However, the polynomial overhead of our update unitary grows with the number of levels.    \(\square \)

6.3 Fraction of Marked Vertices

Now that we have computed the update time of NEW-QW, it remains to compute its fraction \(\epsilon _{new}\) of marked vertices. We will show that \(\epsilon _{new}= \epsilon \left( 1 - \frac{1}{\mathrm {poly}(n)} \right) \) with overwhelming probability on the random subset-sum instance, where \(\epsilon \) is the previous fraction in QW.

Consider a marked vertex in QW. There is a path in the data structure leading to the solution, hence a constant number of subknapsacks \(\mathbf {e_1}, \ldots , \mathbf {e_t}\) such that the vertex will remain marked if and only if none of them is “accidentally” discarded by our new data structure. Thus, if G is the graph of the walk, we want to upper bound:

figure g

We focus on some level in the tree, on a list L of average size \(2^{\ell n}\), and on a single vector \(\mathbf {e_0}\) that must appear in L. Subknapsacks in L are taken from \(\mathcal {B}\subseteq D^n[\alpha ,\beta , \gamma ]\). We study the event that \(\mathbf {e_0}\) is accidentally discarded from L. This can happen for two reasons:

  • we have \(|\{\mathbf {e} \in L, \mathbf {e}\cdot \mathbf {a} = \mathbf {e_0}\cdot \mathbf {a} \mod 2^{\ell n}\}| > B\): the vector is dropped at the bucket-modulus level;

  • at the next level, there are more than B pairs from some product of lists \(L_{l,C,i}' \times L_{r,C,j}'\) to which the vector \(\mathbf {e_0}\) belongs, that will pass the filter.

We remark the following to make our computations easier.

Fact 3

We can replace the list L from our new data structure NEW-QW by a list of exact size \(2^{\ell n}\), which is a sublist of the list L in QW.

At successive levels, our new data structure discards more and more vectors. Hence, the actual lists are smaller than in QW. However, removing a vector \(\mathbf {e}\) from a list, if it does not unmark the vertex, does not increase the probability of unmarking it at the next level, since \(\mathbf {e}\) does not belong to the unique solution.

Fact 4

When a vertex in NEW-QW is sampled uniformly at random, given a list L at some merging level, we can assume that the elements of L are sampled uniformly at random from their distribution \(\mathcal {B}\) (with a modular constraint).

This fact translates Heuristic 1 into a global property of the Johnson graph. At the first level, nodes contain lists of exponential size which are sampled without replacement. However, when sampling with replacement, the probability of collisions is exponentially low. Thus, we can replace \(\Pr _{v \in G}\) by \(\Pr _{v \in G'}\), where \(G'\) is a “completed” graph containing all lists sampled uniformly at random with replacement. This adds only a negligible number of vertices and does not impact the probability of being discarded.

Number of Vectors Having the Same Modulus. Let \(N \simeq 2^n\) and M be a divisor of N. Given a particular \(\mathbf {e}_0\in \mathcal {B}\) and a vector \(\mathbf {a}\in \mathbb {Z}_{N}^n\),

$$ \text {For }\mathbf {e}\in \mathcal {B}\text {, define } X_{\mathbf {e}}(\mathbf {a})= {\left\{ \begin{array}{ll} 1 & \text { if } \mathbf {e}\cdot \mathbf {a} = \mathbf {e}_0 \cdot \mathbf {a} \pmod M\\ 0 & \text { otherwise} \end{array}\right. }$$

We prove the following Lemma in the full version of the paper  [8].

Lemma 7

If \(|\mathcal {B}|\gg M \simeq |L|\), then for a \(1-\mathrm {negl}(n)\) proportion of \(\mathbf {a} \in \mathbb {Z}_{N}^n\), and with an appropriate \(B = \mathcal {O} \left( n \right) \):

(1)

For the number of filtered pairs, we use the fact that the vectors at each level are sampled uniformly at random from their distribution. If this is the case, then a Chernoff bound (similar to the proof of Lemma 7) limits the deviation of the number of filtered pairs in \(L_{l,C,i}' \times L_{r,C,j}'\) from its expectation (which is 1 by construction): the probability that there are more than \(B + 1\) pairs is smaller than \(e^{- (B+1)/3}\). By taking a sufficiently big \(B = \mathcal {O} \left( n \right) \), we can take a union bound over all products of lists \(L_{l,C,i}' \times L_{r,C,j}'\) in which \(\mathbf {e_0}\) intervenes. We also take a union bound over the intermediate subknapsacks that we are considering. The loss of vertices remains inverse polynomial.
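For concreteness, a back-of-the-envelope version of this union bound (under the assumption that \(\mathbf {e_0}\) intervenes in at most \(P \le 2^{ \mathcal {O} \left( n \right) }\) such products, since there are exponentially many sublists and only a constant number of subknapsacks \(\mathbf {e_1}, \ldots , \mathbf {e_t}\) per vertex) reads:

$$ P \cdot e^{-(B+1)/3} \le e^{-n} \quad \text {as soon as} \quad B + 1 \ge 3 \left( n + \ln P \right) , $$

so a suitable \(B = \mathcal {O} \left( n \right) \) indeed makes this contribution negligible.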

6.4 Time Complexities Without Heuristic 2

Previous quantum subset-sum algorithms  [6, 18] have the same time complexities without Heuristic 2, as they fall in parameter ranges where the bucket-modulus data structure is enough. However, this is not the case for our new quantum walk. We keep the same set of constraints and optimize with a new update time. Although using the extended \(\{-1,0,1,2\}\) representations still brings an improvement, neither the fifth level nor the left-right split does. This simplifies our constraints. Let \(\widehat{\mathrm {max}}(\cdot ) = \max (\cdot ,0)\). The guaranteed update time becomes:

figure h

We obtain the time exponent 0.2182 (rounded upwards) with the following parameters (rounded). The memory exponent is 0.2182 as well.

figure i

7 Conclusion

In this paper, we proposed improved classical and quantum heuristic algorithms for subset-sum, building upon several new ideas. First, we used extended representations (\(\{-1,0,1,2\}\)) to improve the current best classical and quantum algorithms. In the quantum setting, we showed how to use a quantum search to speed up the process of filtering representations, leading to an overall improvement on existing work. We built an “asymmetric HGJ” algorithm that uses a nested quantum search, leading to the first quantum speedup on subset-sum in the model of classical memory with quantum random access. By combining all our ideas, we obtained the best quantum walk algorithm for subset-sum in the MNRS framework. Although its complexity still relies on Heuristic 2, we showed how to partially overcome it and obtained the first quantum walk that requires only the classical subset-sum heuristic, and the best to date for this problem.

Open Questions. We leave open the possibility of using representations with “−1”s (or even “2”s) in a quantum asymmetric merging tree, as in Sect. 4.3. Another question is how to bridge the gap between heuristic and non-heuristic quantum walk complexities. In our work, the use of an improved vertex data structure seems to encounter a limitation, and we may need a more generic result on quantum walks, similar to  [2]. Finally, it would be of interest to study representations with a larger set of integers.