We propose a new numerical algorithm for computing the tensor rank decomposition or canonical polyadic decomposition of higher-order tensors subject to a rank and genericity constraint. Reformulating this computational problem as a system of polynomial equations allows us to leverage recent numerical linear algebra tools from computational algebraic geometry. We characterize the complexity of our algorithm in terms of an algebraic property of this polynomial system—the multigraded regularity. We prove effective bounds for many tensor formats and ranks, which are of independent interest for overconstrained polynomial system solving. Moreover, we conjecture a general formula for the multigraded regularity, yielding a (parameterized) polynomial time complexity for the tensor rank decomposition problem in the considered setting. Our numerical experiments show that our algorithm can outperform state-of-the-art numerical algorithms by an order of magnitude in terms of accuracy, computation time, and memory consumption.
1 Introduction
We introduce an original direct numerical algorithm for tensor rank decomposition or canonical polyadic decomposition (CPD) in the low-rank regime. By “direct,” we mean an algorithm that does not rely on numerical optimization or other iteratively refined approximations with a data-dependent number of iterations.
Consider the vector space \(\mathbb {C}^{(n_1 + 1) \times \cdots \times (n_{d} + 1)}\) whose elements represent order-\(d\) tensors in coordinates relative to a basis. We say that a tensor in such a space is of rank 1 or elementary if it is of the form \(\alpha ^1 \otimes \cdots \otimes \alpha ^d\) with \(\alpha ^k \in \mathbb {C}^{n_k+1}\). Every tensor \(\mathcal {A}\) can be expressed as a sum of rank-1 tensors:
\begin{equation} \mathcal {A} = \sum _{i=1}^{r} \alpha _i^1 \otimes \cdots \otimes \alpha _i^d . \tag{CPD} \end{equation}
If \(r\) is minimal among all such expressions of \(\mathcal {A}\), then \(r\) is called the rank of the tensor according to Hitchcock [1927], and Equation (CPD) is called a CPD. The problem of computing a CPD of \(\mathcal {A}\), i.e., determining a set of rank-1 tensors summing to \(\mathcal {A}\), has many applications in science and engineering; see, for instance, Kolda and Bader [2009] and Sidiropoulos et al. [2017].
The strategy we propose for computing (Equation (CPD)) relies on the fact that the problem is equivalent to solving a certain system of polynomial equations. Under suitable assumptions, these equations can be obtained from the nullspace of a flattening \(\mathcal {A}_{(1)} \in \mathbb {C}^{(n_1+1) \times \prod _{k=2}^d (n_k+1)}\) of the tensor \(\mathcal {A}\), as in Kuo and Lee [2018]. Once we have obtained these polynomial equations, whose solutions correspond to the rank-1 terms in Equation (CPD), we use recent numerical normal form techniques from Bender and Telen [2020] and Telen [2020a; 2020b] to solve them. The following example, which is used as a running example throughout the article, illustrates how this works.
Contributions
We formulate our results for the case \(d = 3\), as the general case can be handled using the standard reshaping trick (Section 2.2). Our main contribution is a new numerical algorithm for computing the rank-\(r\) CPD of a third-order tensor \(\mathcal {A}\) based on linear algebra, under the assumption that \(\mathcal {A}\) is \(r\)-identifiable and a flattening of \(\mathcal {A}\) has rank \(r\); see assumption 2.1. We call this algorithm cpd_hnf. It is based on a well-known reformulation of the problem as a system of polynomial equations, which we solve using state-of-the-art methods from computational algebraic geometry. This results in Algorithm 3.1. We show that this algorithm generalizes pencil-based algorithms [Beltrán et al. 2019; Leurgans et al. 1993; Lorber 1985; Sanchez and Kowalski 1990; De Lathauwer 2006] from very low ranks to much higher ranks in the unbalanced case; see theorems 3.1, 3.2, and 3.3.
We give a new, explicit description of the complexity of the tensor rank decomposition problem in terms of an algebraic property of the aforementioned polynomial system: the multigraded regularity of a non-saturated ideal in the homogeneous coordinate ring of \(\mathbb {P}^m \times \mathbb {P}^n\) (proposition 3.1). We characterize the regularity in terms of the rank of a structured matrix obtained directly from the rank-1 terms in Equation (CPD); see example 4.1. These new insights allow us to formulate a conjecture regarding the regularity (conjecture 1). We prove this conjecture for many formats; see theorem 4.2. These results are of independent interest for the field of polynomial system solving. They have the following consequence related to the complexity of our algorithm.
Our numerical experiments in Section 6 show that the proposed algorithm is highly efficient and gives accurate results (see Figure 2). For instance, we can compute the decomposition of a \(7 \times 7 \times 7 \times 7 \times 6 \times 6 \times 5 \times 5\) tensor of rank 1,000 in double-precision arithmetic with an accuracy of order \(10^{-15}\) in 441 s—a feat we believe has not been matched by other tensor decomposition algorithms. Moreover, cpd_hnf seems to behave well in the presence of noise.
The idea of computing tensor decompositions via polynomial root finding is central in apolarity-based approaches such as Bernardi et al. [2011; 2013], Bernardi and Taufer [2020], Brachat et al. [2010], and Nie [2017] for the symmetric case. The Hankel operators play the role of normal forms in this context. These operators can be obtained partially from the tensor \(\mathcal {A}\). In most cases, an additional polynomial system needs to be solved to complete the Hankel operators [Mourrain 2018, Section 4]. For instance, this is step (2) in Algorithm 5.1 of Brachat et al. [2010]. Although our method works only under certain assumptions on the rank of \(\mathcal {A}\), in contrast to apolarity-based methods it requires only linear algebra computations, and it operates in polynomial rings with fewer variables. Moreover, the method from Brachat et al. [2010] uses connected-to-one bases for its normal form computations. The choice of such a basis is discussed at length in Sections 4.3 and 7.2 of Bernardi and Taufer [2020]. In this article, we exploit the flexibility of truncated normal forms [Telen et al. 2018] to achieve better numerical results. For a comparison, see Section 4.3.3 of Telen [2020b].
Kuo and Lee [2018] obtained an affine system of polynomial equations as in example 1.1, but it is solved using homotopy continuation methods. This approach is infeasible for some formats that are handled without any problems by our algorithm. For instance, for the aforementioned eighth-order tensor of rank 1,000, the method of Kuo and Lee [2018] would need to track over \(40 \times 10^9\) homotopy paths. We argue that the eigenvalue-based methods proposed in this article are more natural to use in an overdetermined setting.
The state-of-the-art algorithms for tensor rank decomposition using only linear algebra computations were proposed by Domanov and De Lathauwer [2014; 2017]. Although these methods work under slightly milder conditions, our numerical experiments suggest that Algorithm 1 of Domanov and De Lathauwer [2017] often requires the construction of larger matrices than those in our algorithm. There is no explicit connection with polynomial equations. The algorithm and its complexity depend on a parameter \(l\), which is chosen incrementally by trial and error for each format. In an analogous way, the complexity of our algorithm is governed by the choice of a parameter. However, our analysis in Section 4 tells us a priori which parameter value should be used, circumventing a trial-and-error approach. In Section 6, we demonstrate that cpd_hnf improves on Domanov and De Lathauwer [2014; 2017] in terms of computational complexity and accuracy.
Outline
In Section 2, we state our assumptions and show how computing the CPD of a tensor \(\mathcal {A}\) is formulated as a system of polynomial equations. In Section 3, we make the connection with normal form methods explicit. That is, we describe how a pre-normal form can be computed directly from the tensor \(\mathcal {A}\) and how this allows us to reduce the above polynomial system to an eigenvalue problem. We explain how the approach generalizes so-called pencil-based algorithms in Section 3.3. This leads to a complete, high-level description of our algorithm cpd_hnf in Section 3.4. In Section 4, we study the regularity of the ideal associated to our polynomial system. These results are the starting point for our analysis of the complexity of cpd_hnf in Section 5. In Section 6, we demonstrate the efficiency and accuracy of cpd_hnf relative to the state of the art through several numerical experiments. The article concludes in Section 7 with some final remarks.
2 From Tensor Decomposition to Polynomial Equations
In this section, we explain how tensor rank decomposition, under some restrictions, can be reduced to solving a polynomial system of equations whose coefficients are directly obtained from the tensor \(\mathcal {A}\). The next two subsections state the restrictions under which the proposed algorithm operates. In Section 2.3, the polynomial system is constructed. We show that it gives a nice algebraic description of a certain projection of the rank-1 tensors appearing in \(\mathcal {A}\)’s decomposition. Section 2.4 provides a pseudo-algorithm summarizing the main steps of the proposed numerical algorithm.
2.1 Identifiability, Flattenings, and the Main Assumption
For the moment, let us assume \(\mathcal {A} \in \mathbb {C}^{\ell +1}\otimes \mathbb {C}^{m+1}\otimes \mathbb {C}^{n+1}\) is a third-order tensor of rank \(r\). The cpd_hnf algorithm works under the following assumption.
By generic, we mean that \(\mathcal {A}\) is contained in a Zariski dense open subset of the set of all rank-\(r\) tensors. We will describe this open subset more explicitly in lemma 2.2. For now, we point out that in this open set, each tensor \(\mathcal {A}\) is such that
(i)
\(\mathcal {A}\) is \(r\)-identifiable, and
(ii)
the standard 1-flattening \(\mathcal {A}_{(1)}\) of the tensor \(\mathcal {A}\) is of the same rank as \(\mathcal {A}\).
These are necessary conditions for our algorithm to work. We now briefly recall their meaning.
Condition (i) is usually very weak and only of a technical nature. Recall that the set of all rank-1 tensors forms an algebraic variety, i.e., the solution set of a system of polynomial equations, called the Segre variety \(\mathcal {S}\). A rank-\(r\) CPD of a tensor \(\mathcal {A}\) is a set of \(r\) rank-1 tensors whose sum is \(\mathcal {A}\). The set of all such rank-\(r\) CPDs is denoted by \(\mathcal {S}^{[r]} = \lbrace \mathcal {X} \subset \mathcal {S} \mid |\mathcal {X}|=r \rbrace\). Tensor rank decomposition consists of computing an element of the fiber of the addition map
\begin{equation*} f : \mathcal {S}^{[r]} \rightarrow \mathbb {C}^{\ell +1} \otimes \mathbb {C}^{m+1} \otimes \mathbb {C}^{n+1}, \qquad f(\mathcal {X}) = \sum _{\mathcal {B} \in \mathcal {X}} \mathcal {B}, \end{equation*}
over \(\mathcal {A}\).
Tensor \(\mathcal {A}\) is called \(r\)-identifiable if the fiber \(f^{-1}(\mathcal {A})\) contains exactly one element. That is, there is a unique set in \(\mathcal {S}^{[r]}\) whose sum is \(\mathcal {A}\). Generally, \(r\)-identifiability fails only on a strict closed subvariety of (the Zariski closure of) the set \(\mathcal {S}_r = f(\mathcal {S}^{[r]})\) of tensors of rank at most \(r\); see Chiantini and Ottaviani [2012]; Chiantini et al. [2014; 2017]; Bocci et al. [2014]. This property is called the generic \(r\)-identifiability of \(\mathcal {S}\). These results entail that \(r\)-identifiability fails only on a subset of \(\mathcal {S}_r\) of Lebesgue measure zero if the rank \(r\) and dimensions \((\ell +1,m+1,n+1)\) satisfy some very weak conditions; see Section 1 in Chiantini et al. [2014] and Section 3 in Chiantini et al. [2017] for more details and statements for higher-order tensors as well. If \(\mathcal {S}_r\) is generically \(r\)-identifiable, then there is a Zariski-open subset of the closure of \(\mathcal {S}_r\) on which \(f^{-1}\) is an analytic, bijective tensor decomposition function. We aspire to solve the tensor decomposition problem only in this well-behaved setting.
Condition (ii) is more restrictive, but allows us to tackle the tensor decomposition problem using only efficient linear algebra. Recall that the standard 1-flattening of \(\mathcal {A} \in \mathbb {C}^{\ell +1} \otimes \mathbb {C}^{m+1} \otimes \mathbb {C}^{n+1}\) consists of interpreting \(\mathcal {A}\) as the matrix \(\mathcal {A}_{(1)} \in \mathbb {C}^{(\ell +1) \times (m+1)(n+1)}\). For a rank-1 tensor \(\mathcal {A}=\alpha \otimes \beta \otimes \gamma\) this identification is defined by
\begin{equation*} \mathcal {A}_{(1)} = \alpha \, (\beta \otimes \gamma)^\top , \qquad \beta \otimes \gamma = (\beta _i \gamma _j)_{(i,j)} \in \mathbb {C}^{(m+1)(n+1)}, \end{equation*}
where the standard bases were assumed for these Euclidean spaces and the indices \((i,j)\) are sorted by the reverse lexicographic order. Note that the 1-flattening is easy to compute when \(\mathcal {A}\) is given in coordinates relative to the standard tensor product basis. In that case it suffices to reshape the coordinate array to an \((\ell +1) \times (m+1)(n+1)\) array (e.g., as in Julia’s or Matlab’s reshape function).
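To make this concrete, the following small Julia sketch (our own toy example; all dimensions and variable names are hypothetical, not taken from the article's code) verifies the rank-1 identity \(\mathcal {A}_{(1)} = \alpha (\beta \otimes \gamma)^\top\) numerically. In Julia's column-major layout, the vectorization of \(\beta \otimes \gamma\) is kron(γ, β):

```julia
using LinearAlgebra

l, m, n = 2, 3, 4
α, β, γ = randn(l + 1), randn(m + 1), randn(n + 1)

# Build the order-3 tensor with entries A[i,j,k] = α[i] * β[j] * γ[k].
A = [α[i] * β[j] * γ[k] for i in 1:l+1, j in 1:m+1, k in 1:n+1]

# The 1-flattening is a mere reshape of the coordinate array.
A1 = reshape(A, l + 1, (m + 1) * (n + 1))

# In column-major layout, the flattening matches the outer product α * kron(γ, β)'.
@assert norm(A1 - α * kron(γ, β)') < 1e-12
```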
2.2 The Reshaping Trick
For dealing with higher-order tensors \(\mathcal {A}\in \mathbb {C}^{n_1+1}\otimes \cdots \otimes \mathbb {C}^{n_d+1}\) with \(d \gt 3\), we rely on the well-known reshaping trick. It consists of interpreting a higher-order tensor as a third-order tensor. While the approach described in the remainder of the article could be applied directly to higher-order tensors as well, the range of ranks \(r\) to which this version would apply is (much) more restrictive than the range obtained from reshaping.
Recall that reshaping \(\mathcal {A}\) to a third-order tensor \(\mathcal {A}_{(I,J,K)}\) is a linear isomorphism
\begin{equation} \mathbb {C}^{n_1+1} \otimes \cdots \otimes \mathbb {C}^{n_d+1} \ \overset{\sim }{\longrightarrow }\ \Bigl (\bigotimes _{i \in I} \mathbb {C}^{n_i+1}\Bigr) \otimes \Bigl (\bigotimes _{j \in J} \mathbb {C}^{n_j+1}\Bigr) \otimes \Bigl (\bigotimes _{k \in K} \mathbb {C}^{n_k+1}\Bigr) \tag{I} \end{equation}
that identifies \(\alpha ^1 \otimes \cdots \otimes \alpha ^d\) with \((\otimes _{i\in I} \alpha ^i) \otimes (\otimes _{j\in J} \alpha ^j) \otimes (\otimes _{k\in K} \alpha ^k)\), wherein \(I \sqcup J \sqcup K\) partitions \(\lbrace 1,\ldots ,d\rbrace\). The general case follows by linearity from the universal property [Greub 1978].
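As an illustration (a sketch with hypothetical dimensions, not taken from our implementation), grouping contiguous modes is a plain column-major reshape in Julia; a non-contiguous partition \(I \sqcup J \sqcup K\) would additionally require a permutedims first:

```julia
# Order d = 4 with the contiguous partition I = {1, 2}, J = {3}, K = {4}:
# the first two modes are merged into a single mode of dimension n1 * n2.
n1, n2, n3, n4 = 3, 4, 5, 6
A = randn(n1, n2, n3, n4)
A3 = reshape(A, n1 * n2, n3, n4)   # the third-order tensor A_{(I,J,K)}
```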
It was shown in Section 7 of Chiantini et al. [2017] that under some conditions, the unique CPD of \(\mathcal {A}\) can be recovered from the CPD of a reshaping \(\mathcal {A}_{(I,J,K)}\). Here, we exploit the following slightly more general result that requires no \(r\)-identifiability of \(\mathcal {A}\).
For decomposing higher-order tensors \(\mathcal {A}\) of rank \(r\) via the reshaping trick, we proceed as follows:
(1)
Find \(I \sqcup J \sqcup K =\lbrace 1,\ldots ,d\rbrace\) such that \(\mathcal {A}_{(I,J,K)}\) is \(r\)-identifiable and its rank satisfies (R).
(2)
Compute the CPD \(\mathcal {A}_{(I,J,K)} = \sum _{q=1}^r \alpha _q \otimes \beta _q \otimes \gamma _q\), e.g., as in Algorithm 2.1.
(3)
Viewing the recovered vectors as \(\alpha _q \in \otimes _{i\in I} \mathbb {C}^{n_i+1}\), \(\beta _q \in \otimes _{j\in J} \mathbb {C}^{n_j+1}\), and \(\gamma _q \in \otimes _{k\in K} \mathbb {C}^{n_k+1}\), a CPD of \(\mathcal {A}\) is \(\mathcal {A}_1+\cdots +\mathcal {A}_r\) where the tensors \(\mathcal {A}_q\) are isomorphic to \(\alpha _q \otimes \beta _q \otimes \gamma _q\) under Equation (I).
Note that the rank-1 decompositions in the third step can be computed for example with a sequentially truncated higher-order singular value decomposition [Vannieuwenhoven et al. 2012], as in our implementation, or with a cross approximation [Oseledets et al. 2008].
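For instance, in the hypothetical case \(|I| = 2\), the rank-1 factors of \(\alpha _q\) can be read off from the dominant singular triplet of its matricization; the following sketch illustrates this special case (our implementation uses the ST-HOSVD mentioned above):

```julia
using LinearAlgebra

# αq ∈ ℂ^{n1+1} ⊗ ℂ^{n2+1} is numerically of rank 1, so αq ≈ a ⊗ b with the
# factors a and b obtained from the leading singular pair of its matricization.
function rank1_factors(αq, n1, n2)
    F = svd(reshape(αq, n1 + 1, n2 + 1))
    a = F.S[1] * F.U[:, 1]     # first factor, absorbing the singular value
    b = conj(F.V[:, 1])        # second factor, so that a * transpose(b) ≈ reshape(αq, ...)
    return a, b
end
```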
Because of the reshaping trick, we will henceforth describe our approach only for \(d=3\).
2.3 Polynomial Systems Defined by Flattenings
Having delineated the range (R) in which the proposed cpd_hnf algorithm will work, we continue by describing it. Our strategy is to compute the points \((\beta _i, \gamma _i)\) in the tensor rank decomposition
\begin{equation} \mathcal {A} = \sum _{i=1}^{r} \alpha _i \otimes \beta _i \otimes \gamma _i \tag{A} \end{equation}
first as the solution of a system of (low-degree) polynomial equations. Thereafter, we compute the \(\alpha _{i}\)’s with efficient linear algebra by plugging \(\beta _{i}\) and \(\gamma _{i}\) into Equation (A) and solving the resulting linear system of equations. Indeed, we have
\begin{equation} \mathcal {A}_{(1)} = \begin{bmatrix} \alpha _1 & \cdots & \alpha _r \end{bmatrix} \begin{bmatrix} (\beta _1 \otimes \gamma _1)^\top \\ \vdots \\ (\beta _r \otimes \gamma _r)^\top \end{bmatrix} , \tag{2.1} \end{equation}
and assumption 2.1 guarantees that the second \(r \times (m+1)(n+1)\) matrix is of full rank \(r\) (see lemma 2.2 below), so it has a right inverse. Applying the latter on the right to \(\mathcal {A}_{(1)}\) yields the corresponding \(\alpha _i\)’s. Note that it suffices to compute the points \(\beta _{i}\) and \(\gamma _{i}\) up to a nonzero scaling factor. Therefore, it is natural to consider our problem in complex projective space.
Recall that the \(k\)-dimensional complex projective space \(\mathbb {P}^{k}\) is the space of equivalence classes \([x] = \lbrace \lambda x \mid \lambda \in \mathbb {C} \setminus \lbrace 0\rbrace \rbrace\) for \(x \in \mathbb {C}^{k+1} \setminus \lbrace 0\rbrace\). The entries of the vector \(x = (x_0, \ldots , x_k) \in \mathbb {C}^{k+1} \setminus \lbrace 0\rbrace\) are called homogeneous coordinates of \([x]\). With a slight abuse of notation, we will write \(x = (x_0 : \cdots : x_k) \in \mathbb {P}^k\) for both the equivalence class \([x]\) and a set of homogeneous coordinates \(x \in \mathbb {C}^{k+1} \setminus \lbrace 0\rbrace\).
The proposed cpd_hnf algorithm exploits the fact that the points \(\zeta _i = (\beta _i, \gamma _i) \in \mathbb {P}^m \times \mathbb {P}^n\) are defined by algebraic relations on \(X = \mathbb {P}^{m} \times \mathbb {P}^n\) that can be computed directly from the tensor \(\mathcal {A}\). Such algebraic relations are homogeneous polynomials in a bi-graded polynomial ring. In this context, it is natural to think of degrees as 2-tuples \((d,e) \in \mathbb {N}^2\), where \(d\) is the degree in the variables corresponding to \(\mathbb {P}^m\) and \(e\) is the degree in the variables corresponding to \(\mathbb {P}^n\). Note that this differs from the more familiar setting where the degree is a natural number \(d \in \mathbb {N}\). For more on multi-graded rings, see Miller and Sturmfels [2005]. The flexibility provided by this bi-graded setting reduces the complexity of our algorithm. Concretely, we work in the \(\mathbb {N}^2\)-graded polynomial ring
\begin{equation} S = \mathbb {C}[x_0, \ldots , x_m, y_0, \ldots , y_n] = \bigoplus _{(d,e) \in \mathbb {N}^2} S_{(d,e)}, \qquad S_{(d,e)} = \operatorname{span}_{\mathbb {C}} \lbrace x^a y^b \mid |a| = d,\ |b| = e \rbrace . \tag{2.2} \end{equation}
Here, we used the notation \(x^{a} = x_{0}^{a_{0}} \cdots x_{m}^{a_{m}}\) and \(|a| = a_{0} + \cdots + a_{m}\) for \(a = (a_{0}, \ldots , a_{m}) \in \mathbb {N}^{m + 1}\), and analogously for \(b\). The graded pieces \(S_{(d,e)}\) are vector spaces over \(\mathbb {C}\). The variables \(x\) correspond to homogeneous coordinates on the factor \(\mathbb {P}^{m}\), and in \(S_{(d,e)}\), \(d\) is the degree in the \(x\)-variables. Analogously, elements of \(S_{(d,e)}\) have degree \(e\) in the \(y\)-variables, which are related to \(\mathbb {P}^n\).
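As a small illustration of the grading (our own toy example), take \(m = n = 1\) and \((d,e) = (2,1)\); then
\begin{equation*} S_{(2,1)} = \operatorname{span}_{\mathbb {C}} \lbrace x_0^2 y_0,\; x_0^2 y_1,\; x_0 x_1 y_0,\; x_0 x_1 y_1,\; x_1^2 y_0,\; x_1^2 y_1 \rbrace , \end{equation*}
a vector space of dimension \(\binom{m+d}{d}\binom{n+e}{e} = 3 \cdot 2 = 6\). In general, \(\dim _{\mathbb {C}} S_{(d,e)} = \binom{m+d}{d}\binom{n+e}{e}\).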
An element \(f \in S\) is called homogeneous if \(f \in S_{(d,e)}\) for some \((d,e)\in \mathbb {N}^2\). The key reason that the ring \(S\), with its grading (2.2), is naturally associated to \(X\) is that homogeneous elements \(f \in S_{(d,e)}\) have well-defined zero sets on \(X\). By this, we mean that for \(f \in S_{(d,e)}\) and for any \(\zeta = (x, y) \in X\), \(f(x,y) = 0\) is independent of the choice of homogeneous coordinates. Indeed, this follows from
\begin{equation*} f(\lambda x, \mu y) = \lambda ^{d} \mu ^{e} f(x,y), \qquad \lambda , \mu \in \mathbb {C} \setminus \lbrace 0 \rbrace . \end{equation*}
Therefore, whenever \(f\) is homogeneous it makes sense to write \(f(\zeta) = 0\) if \(f\) vanishes on some set of homogeneous coordinates for \(\zeta\), and to define the subvariety of \(X\) defined by \(f\) as \(V_X(f) = \lbrace \zeta \in X ~|~ f(\zeta) = 0\rbrace\). For a homogeneous ideal \(I \subset S\) (i.e., \(I\) is generated by homogeneous elements with respect to the grading (2.2)), we denote the subvariety of \(X\) corresponding to \(I\) by
\begin{equation*} V_X(I) = \lbrace \zeta \in X ~|~ f(\zeta) = 0, \text{ for all homogeneous } f \in I \rbrace . \end{equation*}
That is, \(V_X(I)\) contains the solutions of the polynomial system defined by the simultaneous vanishing of all homogeneous equations \(f \in I\).
With this notation in place, we turn back to the tensor \(\mathcal {A}\) and show that, under suitable assumptions, \(V_X(I) = \lbrace \zeta _1,\ldots ,\zeta _r\rbrace\) for some homogeneous ideal \(I\) defined by \(\mathcal {A}\). This will generalize the procedure in example 1.1. Consider the flattening \(\mathcal {A}_{(1)}\) from Equation (2.1). This flattening has a natural interpretation as a \(\mathbb {C}\)-linear map
\begin{equation*} \mathcal {A}_{(1)} : S_{(1,1)} \longrightarrow \mathbb {C}^{\ell +1}. \end{equation*}
Indeed, we can identify the space \(\mathbb {C}^{m+1} \otimes \mathbb {C}^{n+1}\) with the graded piece \(S_{(1,1)}\) of degree \((1,1)\) by
\begin{equation} e_{k} \otimes e_{l} ~ \longmapsto ~ x_k y_l, \qquad 0 \le k \le m,\; 0 \le l \le n, \tag{2.4} \end{equation}
where \(e_{k}\) represents the \((k+1)\)-st standard basis vector of \(\mathbb {C}^{m + 1}\), and analogously for \(e_{l}\) and \(\mathbb {C}^{n+1}\). For brevity, we also write \(\mathcal {A}_{(1)}\) for a matrix representation of the map \(\mathcal {A}_{(1)}\), where the standard basis of monomials (2.4) is used for \(S_{(1,1)}\). The \(\ell +1\) rows of \(\mathcal {A}_{(1)}\) are elements of the dual space \((S_{(1,1)})^\vee\) represented in its dual basis
\begin{equation} \left\lbrace \partial _{kl} = \frac{\partial ^2}{\partial {x_k} \, \partial {y_l}} ~\middle |~ 0 \le k \le m,~ 0 \le l \le n \right\rbrace . \tag{2.5} \end{equation}
From this vantage point, the rows of the second factor in Equation (2.1) are
\begin{equation*} (\beta _i \otimes \gamma _i)^\top = \sum _{\substack{0 \le k \le m \\ 0 \le l \le n}} \beta _{i,k} \gamma _{i,l} \, \partial _{kl} ~ \in S_{(1,1)}^\vee . \end{equation*}
It is clear that for any \(f \in S_{(1,1)}\), we have \((v \otimes w)^\top (f) = f(v,w)\).
Let \(f_1, \ldots , f_s \in S_{(1,1)}\) be a \(\mathbb {C}\)-basis for the kernel \(\ker \mathcal {A}_{(1)}\). The \(f_i\) generate a homogeneous ideal \(I = \langle {\ker \mathcal {A}_{(1)}} \rangle = \langle {f_1,\ldots ,f_s} \rangle \subset S\). The following lemma characterizes precisely what we mean by generic in assumption 2.1.
Point (iii) in lemma 2.2 is a technicality we will need in Section 3 to invoke the results from Section 5.5 in Telen [2020b]. For the reader who is familiar with algebraic geometry, we included a consequence of lemma 2.2 relating \(I\) to the vanishing ideal of \(V_X(I)\) in corollary A.1.
We were not able to construct an example for which (i) and (ii) are satisfied but (iii) is not. This raises the question of whether (i) and (ii) imply (iii), and if not, how our method extends to counterexamples.
2.4 The High-level Algorithm
We conclude the section by presenting a high-level pseudo-algorithm for computing the tensor rank decomposition (A) of \(\mathcal {A}\). This is presented as Algorithm 2.1. Its steps summarize the discussion up to this point.
The main focus of this article is dealing with step 4 of Algorithm 2.1. We will employ a state-of-the-art method for solving polynomial equations on \(X\), based on homogeneous normal forms. This strategy is described in the next section.
3 From Polynomial Equations to Eigenvalues
In this section, we employ tools from computational algebraic geometry for computing points in a product of projective spaces in step 4 of Algorithm 2.1. The next subsection summarizes the relevant results in the present setting. We show in Section 3.1 that the solutions can be obtained from an eigenvalue problem defined by pre-normal forms. How to compute the latter is explained in Section 3.2. Thereafter, in Section 3.3, we demonstrate that so-called pencil-based algorithms for decomposing tensors of very low rank are closely related to our cpd_hnf algorithm. The full algorithm for performing step 4 of Algorithm 2.1 is presented in Section 3.4. Finally, we conclude this section with some examples.
3.1 The Eigenvalue Theorem
Our algorithm is built on a multi-homogeneous version of the eigenvalue theorem (theorem 3.1 below), which allows us to find solutions of systems of polynomial equations via eigenvalue computations. Behind this is the theory of homogeneous normal forms. In our context, these are \(\mathbb {N}^2\)-graded versions of truncated normal forms, introduced in Telen et al. [2018], and special cases of the more general toric homogeneous normal forms used by Telen [2020a] and Bender and Telen [2020], and formally introduced in Telen [2020b]. For our purpose, it suffices to work with slightly simpler objects, called pre-normal forms, so we use homogeneous normal forms only implicitly. For full proofs and more details, we refer to Section 5.5.4 in Telen [2020b].
Consider the space \(X = \mathbb {P}^m \times \mathbb {P}^n\) and its associated ring \(S\), which is the \(\mathbb {N}^2\)-graded polynomial ring from Equation (2.2). Let \(I \subset S\) be a homogeneous ideal. The ideal \(I\) inherits the grading on \(S\):
\begin{equation*} I = \bigoplus _{(d,e) \in \mathbb {N}^2} I_{(d,e)}, \qquad \text{where } \qquad I_{(d,e)} = I \cap S_{(d,e)} \end{equation*}
is a \(\mathbb {C}\)-subspace of \(S_{(d,e)}\), and the quotient ring
\begin{equation*} S/I = \bigoplus _{(d,e) \in \mathbb {N}^2} (S/I)_{(d,e)}, \qquad \text{where } (S/I)_{(d,e)} = S_{(d,e)}/I_{(d,e)} \end{equation*}
(these are quotients of vector spaces) similarly inherits this grading. Our objective is to compute a set of homogeneous coordinates of the points in \(V_X(I)\) by using linear algebra computations. To accomplish this, it is necessary to work in graded pieces of \(S\), \(I\), and \(S/I\). Let \(M\) denote any of these. The (multi-graded) Hilbert function \(\mbox{HF}_M : \mathbb {N}^2 \rightarrow \mathbb {N}\) is given by
\begin{equation*} \mbox{HF}_M(d,e) = \dim _{\mathbb {C}} M_{(d,e)} \end{equation*}
and keeps track of the dimension of the vector space \(M_{(d,e)}\). Note that for a homogeneous ideal \(I \subset S\) and \((d,e) \in \mathbb {N}^2\), we have \(\mbox{HF}_{S/I}(d,e) = \mbox{HF}_S(d,e) - \mbox{HF}_I(d,e)\).
Before stating the main result of this section, theorem 3.1, we present three auxiliary lemmas. Lemmas 3.1 and 3.2 are well known in commutative algebra. For completeness, and for lack of a precise reference, we include short proofs in the Appendix.
Linear maps \(N: S_{(d,e)} \rightarrow \mathbb {C}^r\) satisfying Equation (3.1) will play an important role in translating our polynomial root finding problem into an eigenvalue problem. We will therefore give them a name.
If \(r = \ell +1\), then assumption 2.1 implies that \(\mathcal {A}_{(1)}: S_{(1,1)} \rightarrow \mathbb {C}^r\) is a pre-normal form on \(S_{(1,1)}\).
We use the terminology pre-normal form, because, following Definition 5.5.6 in Telen [2020b], the term normal form is reserved for linear maps satisfying an extra technical condition. Normal forms are closely related to a classical result in computational algebraic geometry called the eigenvalue theorem [Cox et al. 2006, Section 2.4], which is used for computing isolated solutions to polynomial systems. The pre-normal forms introduced above will be useful for formulating a multi-homogeneous version of this theorem, namely, theorem 3.1 below. Before stating it, we need to fix some additional notation.
In lemma 3.3, \(N_{h_0}\) is the composition of \(N\) with the linear map that represents multiplication by \(h_0 \in S_{(d^{\prime },e^{\prime })}\). More generally, for \(g \in S_{(d^{\prime },e^{\prime })}\), we define \(N_g : S_{(1,1)} \rightarrow S_{(d,e)}\) as \(N_g(f) = N(gf)\). In theorem 3.1, we will restrict the map \(N_{h_0}\) to an \(r\)-dimensional subspace \(B \subset S_{(1,1)}\) such that the resulting map is invertible. We denote this restriction by \((N_{h_0})_{|B}\). In practice, it suffices to select \(r\) columns of the matrix of \(N_{h_0}\) so that the resulting \(r \times r\) submatrix \((N_{h_0})_{|B}\) is invertible.
In what follows, we will write \(w_i ^\top = (\beta _i \otimes \gamma _i)^\top _{|B} \in B^\vee\) for the linear functional representing evaluation at \(\zeta _i = (\beta _i, \gamma _i)\). This is the restriction of the functional \((\beta _i \otimes \gamma _i)^\top \in S_{(1,1)}^\vee\) to the vector space \(B\). Concretely, we set
\begin{equation*} w_i^\top (f) = f(\beta _i, \gamma _i), \qquad f \in B. \end{equation*}
After fixing a basis for \(B^\vee\), \(w_i^\top\) can be represented as a row vector.
If \(g, h \in S_{(d^{\prime },e^{\prime })} \setminus \lbrace 0\rbrace\) are two homogeneous polynomials of the same degree, then the fraction \(g/h\) is a well-defined function on \(X \setminus V_X(h)\). Indeed, the evaluation of this function does not depend on the choice of homogeneous coordinates. Therefore, we may write \((g/h)(\zeta)\) for the evaluation of this rational function at \(\zeta \in X \setminus V_X(h)\).
After fixing a basis for \(B\) and representing \(w_i\) in the dual basis for \(B^\vee\), Equation (3.3) is a standard matrix eigenproblem: \(w_i^\top M_{g/h_0} = \lambda _i w_i^\top\). That is, \((\lambda _i,w_i)\) is a left eigenpair of the \(r \times r\) matrix \(M_{g/h_0}\). Note that theorem 3.1 implies that all maps of the form \(M_{g/h_0}\) share a set of eigenvectors.
We now sketch one way of using theorem 3.1 to retrieve the coordinates of \(\zeta _i = (\beta _i, \gamma _i)\) from eigenvalues, assuming a pre-normal form \(N: S_{(d,e)} \rightarrow \mathbb {C}^r\) is given. The problem of computing a pre-normal form is addressed in the next subsection. We assume \(d \ge 2\).4 Let \(h \in S_{(d^{\prime }-1,e^{\prime })}\) and \(h_0 \in S_{(d^{\prime },e^{\prime })}\) be homogeneous polynomials that do not vanish at any of the points \(\zeta _i\). These can be chosen generically. Set \(g_j = x_j h \in S_{(d^{\prime },e^{\prime })}, j = 0, \ldots , m\). Choose \(B \subset S_{(1,1)}\) of dimension \(r\) such that \((N_{h_0})_{|B}\) is invertible and compute the matrices \(M_j = M_{g_j/h_0} = (N_{h_0})_{|B}^{-1} \circ (N_{g_j})_{|B}\). By theorem 3.1, the eigenvalues of \(M_j\) are given by \(\lambda _{ji} = (g_j/h_0)(\zeta _i)\). Writing \(\beta _{ij}\) for the \(j\)th coordinate of \(\beta _i\), we have \(\lambda _{ji} = \beta _{ij} \, (h/h_0)(\zeta _i)\), and since the common factor \((h/h_0)(\zeta _i)\) is nonzero,
\begin{equation*} \beta _i = (\lambda _{0i} : \lambda _{1i} : \cdots : \lambda _{mi}) \in \mathbb {P}^m . \end{equation*}
Once \(\beta _i\) is known, the corresponding \(\gamma _i\) can be computed from the linear system
\begin{equation} f_1(\beta _i, y) = \cdots = f_s(\beta _i, y) = 0 . \tag{3.4} \end{equation}
The foregoing approach requires that \((d,e) \ne (1,1)\). Otherwise \(S_{(d^{\prime },e^{\prime })}=S_{(0,0)} = \mathbb {C}\), and we can only evaluate constant functions using theorem 3.1.
3.2 Computing Pre-normal Forms
As illustrated in the previous subsection, once we have computed a pre-normal form, the points \(\zeta _i\) can be recovered using basic linear algebra computations. A natural next issue to address is how to compute a pre-normal form.
Our starting point is a basis \(f_1, \ldots , f_s \in S_{(1,1)}\) of \(\ker \mathcal {A}_{(1)}\), generating our ideal \(I = \langle {\ker \mathcal {A}_{(1)}}\rangle = \left\langle {f_1,\ldots ,f_s} \right\rangle\). For any tuple \((d,e) \in \mathbb {N}^2\) such that \((d,e) \ge (1,1)\), the degree-\((d,e)\) part \(I_{(d,e)}\) of \(I\) is the \(\mathbb {C}\)-vector space spanned by
\begin{equation} \left\lbrace x^{a^{\prime }} y^{b^{\prime }} f_i ~\middle |~ |a^{\prime }| = d^{\prime },~ |b^{\prime }| = e^{\prime },~ i = 1, \ldots , s \right\rbrace , \tag{3.5} \end{equation}
where \((d^{\prime },e^{\prime }) = (d-1,e-1)\). If \(d = 0\) or \(e = 0\), then we have \(I_{(d,e)} = \lbrace 0\rbrace\). In analogy with Equation (1.1), we construct a matrix whose rows are indexed by the monomial basis elements of \(S_{(d,e)}\) (i.e., the monomials \(\lbrace x^ay^b ~|~ |a| = d, |b|=e \rbrace\)), and whose columns are the polynomials (3.5) expanded in this basis. We denote this matrix by
\begin{equation} R_I(d,e) \in \mathbb {C}^{\mbox{HF}_S(d,e) \times s\, \mbox{HF}_S(d^{\prime },e^{\prime })} . \tag{3.6} \end{equation}
Such matrices represent graded resultant maps in the terminology of Section 5.5.4 in Telen [2020b]. They are multi-homogeneous versions of the classical Macaulay matrices [Macaulay 1916]. We present an explicit example below in example 3.2.
Observe that the Hilbert function \(\mbox{HF}_I(d,e)\) is given by the rank of \(R_I(d,e)\) and \(\mbox{HF}_{S/I}(d,e)\) by its corank. This follows immediately from the observation that the columns of \(R_I(d,e)\) span \(I_{(d,e)}\). A left nullspace matrix \(N\) of \(R_I(d,e)\) represents a map \(S_{(d,e)} \longrightarrow S_{(d,e)}/I_{(d,e)} \simeq \mathbb {C}^{\mbox{HF}_{S/I}(d,e)}\). This has the following consequence.
We conclude that a pre-normal form \(N\) can be computed, for instance, from a full singular value decomposition (SVD) of \(R_I(d,e)\), where \(\mbox{HF}_{S/I}(d,e) = r\). This solves the problem of computing a pre-normal form, assuming that we know a degree \((d,e) \in \mathbb {N}^2\) for which \(\mbox{HF}_{S/I}(d,e) = r\). The problem of finding such degrees is addressed in Section 4.
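To illustrate the construction, the following is a minimal sketch (assuming the flattening A1, the dimensions m and n, the degree (d, e), and the rank r are in scope; it is not our optimized implementation) of how \(R_I(d,e)\) and a pre-normal form can be assembled with DynamicPolynomials, one of the packages our implementation relies on:

```julia
using DynamicPolynomials, LinearAlgebra

@polyvar x[1:m+1] y[1:n+1]

# The kernel basis f_1, ..., f_s of A1, written as bidegree-(1,1) polynomials.
fs = [transpose(x) * reshape(c, m + 1, n + 1) * y for c in eachcol(nullspace(A1))]

# Rows: the monomial basis of S_{(d,e)}. Columns: the shifts x^{a'} y^{b'} f_i.
rowbasis = vec([xa * yb for xa in monomials(x, d), yb in monomials(y, e)])
shifts   = vec([xa * yb for xa in monomials(x, d - 1), yb in monomials(y, e - 1)])
cols     = [s * f for s in shifts for f in fs]           # they span I_{(d,e)}

R = [coefficient(c, μ) for μ in rowbasis, c in cols]     # the matrix R_I(d, e)

# A pre-normal form N is a basis of the left null space, e.g., from a full SVD;
# the corank of R equals HF_{S/I}(d, e), which is r at a valid degree.
U = svd(R; full = true).U
N = U[:, end-r+1:end]'
```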
3.3 Relation to Pencil-based Algorithms
To obtain the homogeneous coordinates for \(\zeta _1, \ldots , \zeta _r\) as eigenvalues of the matrices \(M_{g/h_0}\), we usually have to work with pre-normal forms on \(S_{(d,e)}\), where \((d,e) \ne (1,1)\) and \((d^{\prime },e^{\prime }) \ge (0,0)\). An exception is the case where \(r \le m+1 \le \ell + 1\). For these tensors of very low rank, a pre-normal form \(N: S_{(1,1)} \rightarrow \mathbb {C}^r\) will suffice under the mild condition that
\begin{equation} \beta _1, \ldots , \beta _r \in \mathbb {C}^{m+1} \ \text{are linearly independent.} \tag{3.7} \end{equation}
The underlying reason is that vanishing at \(\lbrace \zeta _1,\ldots , \zeta _r\rbrace\) gives \(r\) linearly independent conditions on \(S_{(1,0)}\). The proof of the following theorem is another consequence of the theory of homogeneous normal forms and is deferred to the Appendix.
This theorem is exploited to compute \(\zeta _i = (\beta _i, \gamma _i)\) efficiently as follows. If \(\mathcal {A}\) satisfies assumption 2.1 and Equation (3.7), then we take a basis of the \(r\)-dimensional row span of \(\mathcal {A}_{(1)}\) in theorem 3.2 as our pre-normal form. This can be obtained from a compact SVD of \(\mathcal {A}_{(1)}\). Once the \(\gamma _i\) are computed from the eigenvalues of the \(M_{y_j/h_0}\), the \(\beta _i\) can be obtained as in Equation (3.4). Alternatively, one can use the eigenvectors of these commuting matrices \(M_{y_j/h_0}\), \(j=0,\ldots ,n\) [Telen 2020b, Theorem 5.5.3].
Theorem 3.2 is intimately related to what Beltrán et al. [2019] called pencil-based algorithms for solving Equation (CPD) when the rank satisfies \(r \le m+1 \le \ell +1\), such as those by Leurgans et al. [1993]; Lorber [1985]; Sanchez and Kowalski [1990]. Recall that pencil-based algorithms assume that \(\mathcal {A}\in \mathbb {C}^{r \times r \times (n+1)}\) is a rank-\(r\) tensor.5 In addition, they assume that the \(\alpha _i\) form a linearly independent set, and likewise for the \(\beta _i\)’s. Then, the tensor contraction of \(\mathcal {A}\) with \(h_0^\top \in (\mathbb {C}^{n+1})^\vee\) is
\begin{equation} \mathcal {A} \cdot _3 h_0^\top = \sum _{i=1}^{r} (h_0^\top \gamma _i)\, \alpha _i \beta _i^\top = A \operatorname{diag}(h_0^\top \gamma _1, \ldots , h_0^\top \gamma _r)\, B^\top , \tag{3.8} \end{equation}
where \(A = [\alpha _1 ~ \cdots ~ \alpha _r]\) and \(B = [\beta _1 ~ \cdots ~ \beta _r]\) are invertible \(r \times r\) matrices,
so that the points \(\beta _i\) can be recovered uniquely from the matrix of eigenvectors \(B^\top\), provided that \(h_0^\top \gamma _i \ne 0\) for all \(i=1,\ldots ,r\). The \(\alpha _i\)’s and \(\gamma _i\)’s can then be recovered from the 2-flattening; see Leurgans et al. [1993]; Lorber [1985]; Sanchez and Kowalski [1990]; Beltrán et al. [2019]; De Lathauwer [2006] for more details. With the foregoing suggestive notation, it is easy to see that the matrix of \(\widetilde{N_{h_0}} : f \mapsto N(fh_0)\) with respect to the standard bases is precisely Equation (3.8). Indeed, note that since we can take \(N = \mathcal {A}_{(1)}\), we have \(f h_0 \simeq f \otimes h_0\) and so \(N(f h_0) = \mathcal {A}_{(1)}(f \otimes h_0)\).
Pencil-based algorithms may thus be interpreted as a special case of the proposed cpd_hnf algorithm based on homogeneous normal forms when \(r \le m+1 \le \ell +1\). Note that because of the numerical instabilities caused by extracting \(\beta _i\) from the eigenvectors, as analyzed by Beltrán et al. [2019], we prefer to extract the \(\zeta _i = (\beta _i,\gamma _i)\) in a different way: we compute \(\beta _i\) from the eigenvalues of \(M_{x_j/h_0}\) and the corresponding \(\gamma _i\) from the linear system Equation (3.4).
3.4 The Algorithm
The discussion so far is distilled into Algorithm 3.1. This algorithm implements step 4 of Algorithm 2.1. Note that we dropped the tilde on top of the \(N_\star\)’s in lines 2–7 to streamline the presentation.
The first phase of the algorithm, up to line 21, constructs the pre-normal form \(N\) and chooses an \(N_{h_0}\). This phase depends on whether we can invoke the more efficient theorem 3.2 (\(r \le m+1\)) or we need the full power of theorem 3.1. In the former case, we can take \(N = \mathcal {A}_{(1)}\), while in the latter case, we need to take \(N\) equal to the left null space of \(R_I(d,e)\). How we choose the degree \((d,e)\) in line 9 is explained in Section 4. The matrix \(R_I(d,e) \in \mathbb {C}^{\mbox{HF}_S(d,e) \times s \mbox{HF}_S(d^{\prime },e^{\prime })}\) can be constructed efficiently column-by-column without polynomial multiplication. Indeed, by Equation (3.5) it suffices to copy the coefficients of \(f_i\) relative to the monomial basis of \(S_{(1,1)}\) into the correct rows; see also example 3.2 below. The left null space \(N\) can be extracted from the last \(r\) columns of the \(U\)-factor in the SVD \(R_I(d,e)=USV^H\), where \(\cdot ^H\) denotes the conjugate transpose. In our implementation, the matrix \(N_{h_0} \in \mathbb {C}^{r \times \mbox{HF}_S(1,1)}\) is chosen by sampling the coefficients of \(h_0 \in S_{(d^{\prime },e^{\prime })}\) identically and independently distributed from a Gaussian distribution. With probability 1, \(h_0\) satisfies \(h_0(\gamma _i) \ne 0\) for all \(i\); hence, this is a valid choice of \(h_0\).
The next phase of the algorithm, in lines 21–22, chooses a basis \(B\). Although in theory theorem 3.1 enables us to choose any \(B\) such that \((N_{h_0})_{|B}\) is invertible, Telen et al. [2018] showed that for reasons of numerical stability it is crucial to choose \(B\) such that \((N_{h_0})_{|B}\) is well-conditioned. In practice, such a subspace \(B\) can be found using a QR decomposition with optimal column pivoting or by using the SVD [Telen 2020b, Chapter 4]. We stated the QR approach in Algorithm 3.1.
The multiplication matrices are constructed straightforwardly from the formula in lines 23 to 25. Note that the upper triangular matrix \((N_{h_0})_{|B}\) does not need to be inverted explicitly; rather, the system can be solved by back substitution.
In line 26, the matrices \(M_{(h x_k)/h_0}\) need to be simultaneously diagonalized: we have \(M_{(h x_k)/h_0} = V D_k V^{-1}\), where \(D_k\) is diagonal with entries \(\beta _{ik}\,(h/h_0)(\zeta _i)\), i.e., the \(k\)th homogeneous coordinates of the \(\beta _i\) up to a factor that is common for each \(i\). We compute \(V\) from a random linear combination of the \(M_{(h x_k)/h_0}\)’s, and then extract these coordinates as the diagonal entries of the (approximately) diagonalized matrices \(V^{-1} M_{(h x_k)/h_0} V\).
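In code, this step may look as follows (a minimal sketch with our own names; the input Ms is the list of matrices \(M_{(h x_k)/h_0}\), \(k = 0, \ldots , m\)):

```julia
using LinearAlgebra

# Simultaneous diagonalization of commuting matrices via one diagonalization
# of a random linear combination; row i of the result holds homogeneous
# coordinates of β_i, up to a common factor (h/h_0)(ζ_i) per root.
function betas_from_multiplication_matrices(Ms)
    M = sum(randn() .* Mk for Mk in Ms)              # generic combination
    V = eigvecs(M)                                   # M = V Λ V⁻¹
    reduce(hcat, [diag(V \ (Mk * V)) for Mk in Ms])  # r × (m+1)
end
```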
The system in line 28 is solved efficiently by noting that the coefficients of \(f_j\) can be arranged in a matrix \(F_j\) of size \((m+1) \times (n+1)\), so that \(f_j(x,y) = x^\top F_j y\). Hence, Equation (3.4) boils down to computing the kernel of the matrix \(A \in \mathbb {C}^{s \times (n+1)}\) whose rows are the row vectors \(\beta _i^\top F_j\), \(j = 1, \ldots , s\). Note that by assumption 2.1, \(\ker A\) is spanned by \(\gamma _i\). In line 29 the obtained solution \((\beta _i,\gamma _i)\) is refined using standard Newton iterations. Here the pseudo-inverse of the Jacobian matrix of \(f_1, \ldots , f_s\) is used. This matrix has full rank if and only if \((\beta _i, \gamma _i)\) has multiplicity one. This can be used to check condition (iii) in lemma 2.2.
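A minimal sketch of this kernel computation (our own naming, not the article's exact code) is:

```julia
using LinearAlgebra

# For a fixed root index i: build the coefficient matrices F_j from the kernel
# of the flattening A1, stack the rows β_iᵀ F_j, and take the kernel direction.
function recover_gamma(A1, β, m, n)
    Fs = [reshape(c, m + 1, n + 1) for c in eachcol(nullspace(A1))]
    A  = reduce(vcat, [transpose(β) * F for F in Fs])   # s × (n+1)
    svd(A; full = true).V[:, end]                       # spans ker A ≈ γ_i
end
```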
3.5 Some Examples
We now present two illustrative examples. The first one shows how to use the techniques explained above on the tensor \(\mathcal {A}\) in example 1.1.
The ideal \(I\) in the previous example has the property that \(\mbox{HF}_{S/I}(1+d^{\prime },1+e^{\prime }) = r\) for all \((d^{\prime },e^{\prime }) \in \mathbb {N}^2\). Our next example shows that this is not the case in general.
4 Regularity
One key step of the proposed cpd_hnf algorithm has not yet been investigated. As explained in the previous section (see also line 9 of Algorithm 3.1), we should determine a correct degree \((d,e)\). This choice has a major impact on the computational complexity of the proposed algorithm. Indeed, it determines the dimensions of the graded resultant matrix \(R_I(d,e)\) from Equation (3.6) whose left nullspace is required. The goal of this section is to determine which degree \((d,e)\) is needed for theorem 3.1 to apply. From this we can then deduce our algorithm’s computational complexity.
As before, let \(\mathcal {A}\) be a tensor as in Equation (A) that satisfies assumption 2.1, and let the \(\mathbb {N}^2\)-graded ring from Equation (2.2) be denoted by \(S = \mathbb {C}[x_0, \ldots , x_m, y_0, \ldots , y_n]\). We assume that \(r\gt m+1\), for otherwise theorem 3.2 applies and no choice of \((d,e)\) is required.
To compute \(\beta _i\) and \(\gamma _i\) via theorem 3.1, we need to compute a pre-normal form on \(S_{(d,e)}\) for \((d,e) \ne (1,1)\) and \((d^{\prime },e^{\prime }) \ge (0,0)\). Proposition 3.1 tells us that we must find such a tuple \((d,e)\) for which additionally \(\mbox{HF}_{S/I}(d,e) = r\), where \(I\) is the ideal \(\langle { \ker \mathcal {A}_{(1)}}\rangle\). Motivated by this, we make the following definition.
Hence, our task is to find a tuple \((d,e) \in \mbox{Reg}(I) \setminus \lbrace (1,1)\rbrace\). Recall that such a tuple exists by lemma 3.2. In this section, for given \(\ell , m, n\) and \(r\) satisfying (R), we conjecture an explicit formula for \(d\) and \(e\) so that \((d,1), (1,e) \in \mbox{Reg}(I) \setminus \lbrace (1,1)\rbrace\) for generic tensors of this format and rank. We prove it in many practical cases.
Because the results in this section are of independent interest for solving structured, overdetermined systems of polynomial equations, we formulate them in a slightly more general context. The first statement of this section, proposition 4.1, is concerned with homogeneous ideals \(J\) of \(S\) that are generated by elements of degree \((1,1)\). After that, we specialize to a particular type of such \((1,1)\)-generated ideals. More precisely, to a tuple \(Z = (\zeta _1, \ldots , \zeta _r) \in X^r\), we associate an ideal \(J(Z)\), which is generated by elements of degree \((1,1)\), and we investigate its Hilbert function (4.1). In our tensor setting, we will have \(J(Z) = I = \left\langle {\ker \mathcal {A}_{(1)}} \right\rangle\). After pointing out in lemma 4.1 that for \(r \le mn\), most configurations \(Z = (\zeta _1, \ldots , \zeta _r) \in X^r\) lead to an ideal \(J(Z)\) such that \(V_X(J(Z)) = \lbrace \zeta _1, \ldots , \zeta _r \rbrace\), where each of the \(\zeta _i\) occurs with multiplicity one, we use corollary 4.1 to characterize \(\mbox{Reg}(J(Z))\) in theorems 4.1 and 4.2.
For \(a = (a_0, \ldots , a_m) \in \mathbb {N}^{m+1}\), \(b = (b_0, \ldots , b_n) \in \mathbb {N}^{n+1}\), we write \(\partial _{a,b} : S \rightarrow S\) for the differential operator
\begin{equation*} \partial _{a,b} = \frac{1}{a_0! \cdots a_m! \; b_0! \cdots b_n!} \, \frac{\partial ^{|a|+|b|}}{\partial x_0^{a_0} \cdots \partial x_m^{a_m} \, \partial y_0^{b_0} \cdots \partial y_n^{b_n}} \end{equation*}
such that the basis of \(S_{(d,e)}^\vee\) dual to \(\lbrace x^a y^b ~|~ |a| = d, |b| = e \rbrace\) is given by \(\lbrace \partial _{a,b} ~|~ |a| = d, |b| = e \rbrace\). We write \(e_k\) for the standard basis vector \((0, \ldots , 1, \ldots , 0)\) with a 1 in the \((k+1)\)st position, such that
\begin{equation*} \partial _{e_k,e_l} = \frac{\partial ^2}{\partial x_k \partial y_l}, \qquad 0 \le k \le m,~ 0 \le l \le n. \end{equation*}
Note that this differs from the shorter notation \(\partial _{kl}\) used in previous sections, so as to have a general notation for derivatives of arbitrary order. For \((d,e) \in \mathbb {N}^2\) and \(J \subset S\) a homogeneous ideal, the linear space \(J_{(d,e)}^\perp \subset S_{(d,e)}^\vee\) is defined as
\begin{equation*} J_{(d,e)}^\perp = \left\lbrace v \in S_{(d,e)}^\vee ~\middle |~ v(f) = 0 \text{ for all } f \in J_{(d,e)} \right\rbrace . \end{equation*}
It follows from basic linear algebra that \(J_{(d,e)}^\perp \simeq (S/J)_{(d,e)}\), such that \(\mbox{HF}_{S/J}(d,e) = \dim _{\mathbb {C}} J_{(d,e)}^\perp\). As before, we denote \((d^{\prime },e^{\prime }) = (d-1,e-1)\). To simplify the notation in this section, we will use the abbreviations
We focus on ideals \(J\) that are generated by elements of degree \((1,1)\). In this case, a functional belongs to \(J_{(d,e)}^\perp\) if and only if it induces functionals in \(J_{(1,1)}^\perp\).
In our tensor rank decomposition setting, we are mainly interested in investigating the Hilbert function for \((1,1)\)-generated ideals defined by point configurations in \(X = \mathbb {P}^m \times \mathbb {P}^n\). To that end, fix \(r\) points \(Z = (\zeta _1, \ldots , \zeta _r) \in X^r\) and let \(\zeta _i = (\beta _i, \gamma _i)\). We denote \(w_i = (\beta _i \otimes \gamma _i)^\top \in S_{(1,1)}^\vee\) such that \(w_i(f) = f(\beta _i,\gamma _i)\) for \(f \in S_{(1,1)}\).6 In coordinates, the \(w_i\) are \(w_i = \sum _{k,l}\beta _{ik} \gamma _{il} \partial _{e_k,e_l}.\) To the point configuration \(Z\), we associate an ideal \(J(Z) \subset S\) by setting
\begin{equation*} J(Z) = \left\langle f \in S_{(1,1)} ~\middle |~ w_1(f) = \cdots = w_r(f) = 0 \right\rangle . \end{equation*}
Note that the ideal \(I = \left\langle {\ker {\mathcal {A}_{(1)}}} \right\rangle\) from previous sections arises in this way.7 We write \(W \subset X^r\) for the Zariski-open subset in which \(w_1, \ldots , w_r\) are \(\mathbb {C}\)-linearly independent. If \(r \le \mbox{HF}_S(1,1)\), then \(W \subset X^r\) is non-empty and therefore dense in the Euclidean topology.8 We have the following consequences of proposition 4.1.
The space \(S_{(1,1)}^\vee \otimes S_{(d^{\prime },e^{\prime })}^\vee\) is identified with the space of matrices of size \(\mbox{HF}_S(1,1) \times \mbox{HF}_S(d^{\prime },e^{\prime })\), where the rows are indexed by \(\partial _{e_k,e_l}\) for \(0 \le k \le m\), \(0 \le l \le n\) and columns are indexed by \(\partial _{a^{\prime },b^{\prime }}\) where \((a^{\prime },b^{\prime }) \in \mathbb {N}^{m+1} \times \mathbb {N}^{n+1}\) with \(|a^{\prime }| = d^{\prime }, |b^{\prime }| = e^{\prime }\). For such a matrix to be contained in \(\operatorname{\mbox{im}}\iota\), a collection of partial symmetry conditions needs to be satisfied. For instance, if \((a^{\prime }+e_k, b^{\prime }+e_l) = (a^{\prime \prime } + e_{k^{\prime }}, b^{\prime \prime } + e_{l^{\prime }})\), then the entry in the row indexed by \(\partial _{e_k,e_l}\) and column indexed by \(\partial _{a^{\prime },b^{\prime }}\) should be equal to the entry in the row indexed by \(\partial _{e_{k^{\prime }},e_{l^{\prime }}}\) and column indexed by \(\partial _{a^{\prime \prime }, b^{\prime \prime }}\) (see example 4.1). Matrices in \(\operatorname{\mbox{im}}\iota\) are called catalecticant matrices [Iarrobino and Kanev 1999, Definition 1.3].
We can use corollary 4.1 to compute the Hilbert function of \(S/J(Z)\) via a rank computation of a matrix whose entries are monomials evaluated at the points \((\beta _i,\gamma _i)\). This is important for our proof of theorem 4.2. It is most easily explained by means of an example.
From the discussion in example 4.1, we would like to conclude that, generically, \((2,1) \in \mbox{Reg}(J(Z))\). For this to make sense, i.e., to apply definition 4.1, we need to show that for most configurations \(Z\), \(J(Z)\) defines \(r\) points with multiplicity one. By an argument analogous to lemma 2.2, this happens for small enough ranks.
Our next goal is to show that, to prove that \((d,e) \in \mbox{Reg}(J(Z))\) for almost all \(Z \in X^r\), it suffices to find one particular instance \(Z^* \in W\) for which \(\mbox{HF}_{S/J(Z^*)}(d,e) = r\).
This result will be particularly useful for proving theorem 4.2 below. First, we investigate which combinations of \(m,n,d,e,r\) are possible to have \(\mbox{HF}_{S/J(Z)}(d,e) = r\) for generic points \(Z\).
For \((d,e) \ge (1,1)\) and \(m,n \in \mathbb {N}_0\), we define
\begin{equation*} \mathcal {R}(m,n,(d,e)) = \frac{\mbox{HF}_S(1,1)\, \mbox{HF}_S(d^{\prime },e^{\prime }) - \mbox{HF}_S(d,e)}{\mbox{HF}_S(d^{\prime },e^{\prime }) - 1}, \qquad (d,e) \ne (1,1). \end{equation*}
If \((d,e) = (1,1)\), then we set \(\mathcal {R}(m,n,(d,e)) = \infty\). For given \(m,n\) and \((d,e) \ge (1,1)\), the following result shows that \(\mathcal {R}(m,n,(d,e))\) bounds the rank \(r\) for which we could possibly have \(\mbox{HF}_{S/J(Z)}(d,e) = r\) for \(Z \in W \subset X^r\).
Note that if \((d,e) = (1,1)\), the range (4.5) is empty. This agrees with the fact that \((1,1) \in \mbox{Reg}(J(Z))\) for all \(Z \in W\). From corollary 4.2, we know that \(\mbox{HF}_{S/J(Z)}(d,e) \ge r\) for \((d,e) \ge (1,1)\) and \(Z \in W\). Combining this with a dimension argument as in the proof of theorem 4.1, we see that
\begin{equation} \mbox{HF}_{S/J(Z)}(d,e) \ \ge\ \max \left\lbrace r,\ \mbox{HF}_S(d,e) - (\mbox{HF}_S(1,1) - r)\, \mbox{HF}_S(d^{\prime },e^{\prime }) \right\rbrace . \tag{4.7} \end{equation}
For \(e^{\prime } = 0\) or \(d^{\prime } = 0\), we observe experimentally that equality holds for generic configurations \(Z = (\zeta _1, \ldots , \zeta _r) \in X^r\). In this case, it suffices to check for which \(r\) the maximum equals \(r\). This leads us to the following conjecture.
Note that the role of \(m\) and \(n\) in conjecture 1 can be interchanged, implying the analogous statement that for \(e \ge 1\) and generic \(Z \in W \subset X^r = (\mathbb {P}^m \times \mathbb {P}^n)^r\) with \(r \le \min \left\lbrace \mathcal {R}(m,n,(1,e)), mn \right\rbrace ,\) we have \((1,e) \in \mbox{Reg}(J(Z))\).
Let \(\mathcal {A}\in \mathbb {C}^{\ell + 1} \otimes \mathbb {C}^{m + 1} \otimes \mathbb {C}^{n+1}\) be a general tensor whose rank satisfies Equation (R) and let \(I = \langle { \ker \mathcal {A}_{(1)}}\rangle = J(Z)\), where \(Z = ((\beta _i, \gamma _i))_{i=1, \ldots , r}\). A consequence of conjecture 1 would be that \((d,1) \in \mbox{Reg}(I)\), for any \(d\) such that \(\mathcal {R}(m,n,(d,1)) \ge r\) and \((1,e) \in \mbox{Reg}(I)\) for any \(e\) such that \(\mathcal {R}(m,n,(1,e)) \ge r\). Specializing the definition of \(\mathcal {R}\) to \(e = 1\) gives
\begin{equation} \mathcal {R}(m,n,(d,1)) = \frac{\binom{m+d-1}{d-1}}{\binom{m+d-1}{d-1} - 1} \cdot \frac{(d-1)(n+1)m}{d} . \tag{4.9} \end{equation}
The first factor is always greater than 1, and the second factor is at least \(mn\) if \(d \ge n+1\). If conjecture 1 holds, then for all tensors \(\mathcal {A}\) of rank \(r\) in the range (R), we have \((n+1,1) \in \mbox{Reg}(I)\). In that case, cpd_hnf can take \((d,e) = (n+1,1)\) to treat all identifiable rank-\(r\) tensors in the unbalanced regime, that is, when \(\ell \gt mn\). For such shapes, Bocci et al. [2014] proved that generic \(r\)-identifiability holds up to \(r \le mn\).
Conjecture 1 gives a way of finding degrees of the form \((d,1) \in \mbox{Reg}(I) \setminus \lbrace (1,1)\rbrace\). There might exist other tuples of the form \((d,e) \in \mbox{Reg}(I) \setminus \lbrace (1,1)\rbrace\), which could lead to a lower computational cost. For such degrees, Equation (4.7) may be a strict inequality, which means that other tools are needed. We leave the further exploration of effective bounds for the regularity for future research.
5 Computational Complexity
One motivation for studying the regularity of the ideal \(I = \langle \ker \mathcal {A}_{(1)} \rangle\) is to understand the computational complexity of Algorithm 2.1.
Assume that we are given an \(r\)-identifiable tensor \(\mathcal {A}\) in \(\mathbb {C}^{\ell +1} \otimes \mathbb {C}^{m+1}\otimes \mathbb {C}^{n+1}\), where \(\ell \ge m\ge n\). We additionally assume that \(\ell +1 \le (m+1)(n+1)\).10 The size of the input will be denoted by \(M = (\ell +1)(m+1)(n+1)\). Because of the constraint \((m+1)(n+1) \ge \ell +1 \ge m+1 \ge n+1\), we have
\begin{equation*} n+1 \ \le\ M^{\frac{1}{3}} \ \le\ \ell +1 \ \le\ M^{\frac{1}{2}} \ \le\ (m+1)(n+1) \ \le\ M^{\frac{2}{3}} . \end{equation*}
Since cpd_hnf applies only if the rank \(r\) satisfies Equation (R), we have \(r = \mathcal {O}(M^\frac{5}{6})\).
Without going into details, it can be verified that a crude upper bound for the computational complexity of the steps in Algorithm 2.1, excluding step 4, is \(\mathcal {O}(M^3)\).
The key contribution to the time complexity originates from Algorithm 3.1. Its complexity is ultimately determined by the choice of \((d,e)\) in line 9 of the algorithm. We analyze what happens when the degree is selected as \((d,1)\). In this case, bounding the size of the matrix \(R_I(d,e)\) in Algorithm 3.1 is critical. Assuming conjecture 1 holds, we have
\begin{equation*} (n+1)\left(1 + \frac{m}{d}\right)^{d} \ \le\ \mbox{HF}_S(d,1) = \binom{m+d}{d}(n+1) \ \le\ (n+1)\left((m+1)(n+1)\right)^{d} . \end{equation*}
The upper bound follows from \(\binom{m+d}{d} \le (m+d)^d \le (2(m+1))^d \le ((m+1)(n+1))^d\). For establishing the lower bound, note that \((1 + \frac{m}{d})^d \le \binom{m+d}{d}\) and use the facts that \(d \le n+1 \le m+1\) (see the discussion below Equation (4.9)), so that \(d\ge 2\) implies that \(\frac{m}{d} \ge \frac{1}{2}\). Computing a basis for the left null space of \(R_{I}(d,1)\) requires at most \(\mathcal {O}((\mbox{HF}_S(d,1))^3)\) and at least \(r \cdot \mbox{HF}_S(d,1)\) operations, as we know \((d,1)\in \mbox{Reg}(I)\) so that the dimension of the left null space is \(r\). Consequently, line 11 has a time complexity that is at least exponential in \(d\).
One might conclude from this result that cpd_hnf is not an effective algorithm for tensor rank decomposition. However, Hillar and Lim [2013] proved that computing the rank of a tensor over \(\mathbb {C}\) is an NP-hard problem. Hence, a polynomial-time algorithm that applies to all inputs for this problem is not anticipated. A typical instance of an NP-hard problem can often be solved more efficiently than the worst-case instance. Theorem 1.1 is in this spirit.
We can observe that the formula \(d \ge \frac{1}{1-\phi }\) is asymptotically sharp in the sense that, as \(\phi \rightarrow 1\), the exponent in \(\mathcal {O}(M^\star)\) needs to blow up to \(\infty\), and no polynomial in the input size can control the growth. Indeed, proposition 5.1 shows that exactly when \(\phi =1\) there exist cases that require at least exponential growth. Note that by the discussion below Equation (4.9), assuming conjecture 1, we can use \(d = n+1\) for any \(\phi \in [0,1]\). This gives the universal, exponential complexity bound \(\mathcal {O}(M^{\frac{5}{2}n + \frac{7}{2}})\).
6 Numerical Experiments
We present several numerical results demonstrating the effectiveness of cpd_hnf. All experiments were performed on KU Leuven/UHasselt’s Tier-2 Genius cluster of the Vlaams Supercomputer Centrum (VSC). Specifically, the supercomputer’s scheduling software allocated standard skylake nodes containing two Xeon Gold 6140 CPUs (18 physical cores, 2.3 GHz clock speed, 24.75 MB L3 cache) with 192 GB of main memory, as well as standard cascadelake nodes equipped with two Xeon Gold 6240 CPUs (18 physical cores, 2.6 GHz clock speed, 24.75 MB L3 cache) and 192 GB of main memory. In all experiments, we allowed all algorithms to use up to 18 physical cores.
The proposed algorithm was implemented in Julia v1.4.0, relying on the non-base packages Arpack.jl, DynamicPolynomials.jl, GenericSchur.jl, and MultivariatePolynomials.jl.11 Our implementation follows the pseudocode in Algorithms 2.1 and 3.1 and the detailed discussion in Section 3.4 quite closely. The Julia code of cpd_hnf, including driver routines to reproduce the experimental results below, can be found at https://gitlab.kuleuven.be/u0072863/homogeneous-normal-form-cpd.
In the experiments below, random rank-\(r\) tensors \(\mathcal {A}= \sum _{i=1}^r \alpha _i \otimes \beta _i \otimes \gamma _i\) are generated by randomly sampling the elements of \(\alpha _i \in \mathbb {R}^{\ell +1}\), \(\beta _i \in \mathbb {R}^{m+1}\), and \(\gamma _i \in \mathbb {R}^{n+1}\) identically and independently distributed (i.i.d.) from a standard normal distribution.
6.1 Implementation Details
The algorithm is implemented for real and complex input tensors. In the former case, all computations are performed over the reals, with the exception of the computation of the left null space of \(R_I(d,e)\); the algorithm then continues with the real part of the output of this step.
At the start of the algorithm, we compress the \((\ell +1) \times (m+1) \times (n+1)\) input tensor, which is to be decomposed into \(r\) rank-1 terms, to a \(\min \lbrace \ell +1,r\rbrace \times \min \lbrace m+1,r\rbrace \times \min \lbrace n+1,r\rbrace\) tensor. For this, we apply ST-HOSVD compression [Vannieuwenhoven et al. 2012] with truncation rank \((\min \lbrace \ell +1,r\rbrace , \min \lbrace m+1,r\rbrace , \min \lbrace n+1,r\rbrace)\). Most of the experiments below are chosen so that this step performs no computations.
The kernel of \(\mathcal {A}_{(1)}\) in 2.1 is computed by an SVD. The linear system at the end of 2.1 is solved by computing the Khatri–Rao product \(K = [\beta _i \otimes \gamma _i]_{i=1}^r\) and then solving the overdetermined linear system \(K A = \mathcal {A}_{(1)}^\top\) for \(A\). The rows of \(A\) then correspond to the \(\alpha _i\)’s.
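A sketch of this final step (with hypothetical function names; B and C collect the recovered \(\beta _i\) and \(\gamma _i\) as columns) could look as follows:

```julia
using LinearAlgebra

# Solve the overdetermined system K * X = A1ᵀ in the least-squares sense,
# where the columns of K are the vectorized β_i ⊗ γ_i; in column-major
# layout this vectorization is kron(γ_i, β_i).
function recover_alphas(A1, B, C)
    K = reduce(hcat, [kron(C[:, i], B[:, i]) for i in 1:size(B, 2)])
    X = K \ transpose(A1)       # row i of X corresponds to α_iᵀ
    transpose(X)                # columns are the α_i
end
```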
In Algorithm 3.1, the degree \((d,e)\) is determined automatically by assuming conjecture 1 is true and selecting a valid \((d,1)\) or \((1,e)\) based on a heuristic that takes into account the estimated computational cost and numerical considerations.12
The key computational bottleneck is the computation of the left nullspace of \(R_I(d,e)\) in line 11 of Algorithm 3.1. When the number of entries of \(R_I(d,e)\) is smaller than 10,000, we use the standard SVD-based kernel computation. In larger cases, for efficiency, we propose to employ Arpack.jl’s eigs function to extract the left null space from the Hermitian Gram matrix \(G = R_I(d,e) (R_I(d,e))^H\), where \(^H\) denotes the Hermitian transpose. The eigs function (with parameters tol = 1e-6 and maxiter = 25) extracts the \(r\) eigenvalues of smallest modulus and their corresponding eigenvectors. Note that Arpack.jl employs a Bunch–Kaufman factorization [Bunch and Kaufman 1977] of the positive semidefinite input matrix \(G\) to perform its Lanczos iterations. Using eigs was up to \(75\%\) faster than the SVD-based approach for large problems.
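The following sketch illustrates this Gram-matrix approach (our own naming; the parameters are those stated above):

```julia
using LinearAlgebra, Arpack

# The left null space of R = R_I(d, e) spans the eigenspace of G = R Rᴴ
# associated with its r eigenvalues of smallest modulus.
function left_nullspace_eigs(R, r)
    G = R * R'
    λ, V = eigs(G; nev = r, which = :SM, tol = 1e-6, maxiter = 25)
    V      # columns approximately span the left null space of R
end
```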
The final step of Algorithm 3.1 is implemented as discussed at the end of Section 3.4. The kernel is computed with an SVD and \(\gamma _i\) is taken as the left singular vector corresponding to the smallest singular value. After this step, we find \((\beta _i, \gamma _i)\). Observe that \(f_j(\beta _i, \gamma _i)\) should vanish exactly for all \(j = 1, \ldots , s\). We propose to refine the approximate roots \((\beta _i, \gamma _i)\) by applying three Newton iterations on \(f_1 = \cdots = f_s = 0\). This adds a minor computational overhead of about \(r(m+1)(n+1)(m+n)^2\) operations.
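A minimal sketch of this refinement (our naming; Fs holds the coefficient matrices \(F_j\) from Section 3.4, and the update uses the Jacobian pseudo-inverse as described there):

```julia
using LinearAlgebra

# Gauss–Newton steps on f_1 = ... = f_s = 0, where f_j(x, y) = xᵀ F_j y.
function refine(β, γ, Fs; steps = 3)
    z, mβ = vcat(β, γ), length(β)
    for _ in 1:steps
        b, c = z[1:mβ], z[mβ+1:end]
        res = [transpose(b) * F * c for F in Fs]                  # residuals f_j(b, c)
        J = reduce(vcat, [hcat(transpose(F * c), transpose(b) * F) for F in Fs])
        z -= pinv(J) * res                                        # pseudo-inverse step
    end
    z[1:mβ], z[mβ+1:end]
end
```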
6.2 Impact on the Accuracy of Computational Optimizations
We compare and evaluate the impact on numerical accuracy of four variants of Algorithm 3.1. The four variants arise from all combinations of two binary options:
(i)
refining the approximate roots \((\beta _i,\gamma _i)\) with 3 Newton steps (+newton) or not, and
(ii)
computing the left nullspace of \(R_I(d,e)\) with Arpack.jl’s iterative eigs method applied to the Gram matrix or using the standard svd.
We append _eigs, _eigs+newton, _svd, and _svd+newton to cpd_hnf to distinguish the variants.
The experimental setup is as follows. Random rank-\(r\) tensors are generated as explained at the start of this section, sampling one rank-\(r\) tensor for each combination \((\ell +1,m+1,n+1,r)\) with \(25 \ge m + 1 \ge n+1 \ge 2\) and \(\ell +1=r=\min \lbrace \mathcal {R}(m,n,(2,1)),mn\rbrace\), where \(\mathcal {R}\) is as in Equation (4.9). For each variant of cpd_hnf, the relative backward error \(\Vert \mathcal {A}- \sum _{i=1}^r \alpha _i^{\prime } \otimes \beta _i^{\prime } \otimes \gamma _i^{\prime } \Vert _F \, \Vert \mathcal {A}\Vert _F^{-1}\) and the execution time are recorded, where \((\alpha _i^{\prime }, \beta _i^{\prime }, \gamma _i^{\prime })\) are the estimates obtained by the algorithm.
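The recorded error measure can be computed as in the following sketch (illustrative names; Julia’s norm applied to an array equals the Frobenius norm \(\Vert \cdot \Vert_F\)).

```julia
using LinearAlgebra

# Relative backward error of computed factors (columns of α, β, γ) w.r.t. A.
function relative_backward_error(A::Array{<:Number,3}, α, β, γ)
    Ahat = zero(A)
    for i in axes(α, 2), c in axes(A, 3), b in axes(A, 2), a in axes(A, 1)
        Ahat[a, b, c] += α[a, i] * β[b, i] * γ[c, i]
    end
    return norm(A - Ahat) / norm(A)   # Frobenius norms
end
```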
The experiment was performed on cascadelake nodes and the results are displayed in Figure 1. It is visually evident in the right panels that the Newton refinement improves the overall relative backward error for both eigs and svd by several orders of magnitude and brings both to roughly the same level of about \(10^{-16}\)–\(10^{-12}\). However, before Newton refinement there is a marked difference in accuracy between eigs and svd: the svd variant is more accurate by about 1–3 orders of magnitude. It is interesting to compare these results with the leftmost panel of Figure 2(b), which shows the corresponding results on the same tensors for a conceptually similar state-of-the-art method from the literature. That method also requires a kernel computation which, in the displayed graph, is performed with a singular value decomposition, so the methods in Figure 1(c) and the left panel of Figure 2(b) can be directly compared. The proposed cpd_hnf_svd method, even without Newton refinement, is up to 3 orders of magnitude more accurate (in relative backward error). The differences in accuracy are thus not exclusively attributable to the Newton refinement.
Fig. 1.
Fig. 2.
We did not include a visualization of the timings because the plots look very similar. Adding the Newton refinement increased the total execution time over all cases only modestly: by \(3.4\%\) for eigs and \(3.8\%\) for svd. The relative increase was larger for the small cases, while it stayed close to 0–5% for the large cases (\(n+1 \ge 20\)). Given that the accuracy can be improved by orders of magnitude, we conclude from these experiments that a few steps of Newton refinement of the approximate roots \((\beta _i,\gamma _i)\) is highly recommended.
After settling on Newton refinement as our default choice and observing in the right panels of Figure 1 that there is no significant difference in accuracy between eigs and svd, we investigate their relative computational performance. In terms of total execution time over all cases, the eigs variant is 29.1% faster; for the largest case, this increases to a 40.9% reduction in execution time.
From the above experiments, we conclude that the variant with an iterative eigensolver and Newton refinement is an appropriate default choice. In the remainder of the experiments, cpd_hnf refers to cpd_hnf_eigs+newton, the variant using Arpack’s iterative eigenvalue method applied to the Gram matrix combined with 3 Newton steps to refine the approximate roots.
6.3 Comparison with the State of the Art
In the second experiment, we compare cpd_hnf with the current state of the art in direct numerical methods for tensor rank decomposition in terms of accuracy and computational performance. The algorithm developed by Domanov and De Lathauwer [2014; 2017] is an advanced direct numerical method that shares several high-level characteristics with cpd_hnf:
(i)
Both methods assume the target rank \(r\) is supplied to the decomposition algorithm. They operate in a similar regime of ranks and are able to treat the full range of generically \(r\)-identifiable tensors in unbalanced tensor spaces. The method by Domanov and De Lathauwer can even deal with some ranks \(r \gt \ell +1 \ge m+1 \ge n+1\), while our algorithm cannot.
(ii)
Like our method, Domanov and De Lathauwer rely on a simultaneous diagonalization procedure to extract one or two of the factor matrices.
(iii)
To deal with high ranks, both methods rely on the construction of an auxiliary matrix whose size is parameterized by an integer. By increasing this integer, the range of ranks that is covered by the algorithms is broadened at the cost of a substantially increased computational complexity. In both algorithms, the asymptotic computational complexity is determined by the cost of computing the kernel of this auxiliary matrix. Contrary to Domanov and De Lathauwer [2014; 2017], we are able to give a precise connection between this integer and the range of ranks we can cover (subject to conjecture 1).
(iv)
When \(\ell +1 \ge m+1 \ge r\), both approaches can be considered pencil-based algorithms. For this reason, our comparison will focus on the case where \(\ell +1 \ge r \ge m+1 \ge n+1\).
A Matlab implementation of Domanov and De Lathauwer’s algorithm [Domanov and De Lathauwer 2017] is available as part of the Tensorlab+ repository [Hendrikx et al.] as cpd3_gevd in the domanov2017laa directory; we refer to it as cpd_ddl henceforth. cpd_ddl has an accurate and a fast option for computing the kernel of the auxiliary matrix. We found in our experiments that the fast implementation often misjudges the dimension of this kernel. Consequently, it usually increases the size of the auxiliary matrix and causes it to exceed the 16.2 GB memory limit we put on the size of that matrix. As a concrete statistic, in the configuration with \(d=2\) below, the fast version of cpd_ddl failed in over 75% of the cases. Therefore, we exclusively compare our implementation with the accurate version of cpd_ddl.
We generate random rank-\(r\) tensors \(\mathcal {A}\) as described at the start of this section. We apply both cpd_hnf and cpd_ddl to these random tensors for all of the following configurations (see footnote 13):
and in all cases, we take \(\ell + 1 = r = \min \lbrace \mathcal {R}(m, n, (d,1)), mn\rbrace\). We do not provide our algorithm with the value of \(d\). For each input tensor, cpd_hnf determines the degree \((d,1)\) or \((1,e)\) automatically as explained in the previous subsection. For each algorithm, we record the relative backward error, total execution time, and the size of the auxiliary matrix whose kernel is computed. As the latter is the dominant operation in both algorithms, its time and memory complexity gives a good estimate of the overall complexity.
For the first configuration, cpd_ddl was executed on the bigmem nodes of the supercomputer; these are identical to the skylake nodes, except that they are equipped with 768 GB of main memory. This was necessary because our Matlab driver routine consumed more than 162 GB of memory while running through all configurations.
The accuracy of cpd_hnf and cpd_ddl in this set of experiments is shown in Figure 2. The newly proposed algorithm is consistently several orders of magnitude more accurate in terms of the relative backward error than the state of the art.
It took over 146 h to obtain the (incomplete) results in the left panel of Figure 2(b). To put this in perspective, the computation for the left panel of Figure 2(a) took a little over 17 h. The missing values inside the triangles in Figure 2(b) indicate that cpd_ddl attempted to allocate an auxiliary matrix that would require more than 16.2 GB of memory. The same memory constraint was imposed on cpd_hnf, but here only the largest case with \(d=4\) and \(m=n=14\) could not be treated. The red pixel for \(d = 2, m = 35, n = 13\) in Figure 2(a) indicates that cpd_hnf gave inaccurate results; this is the only case where eigs failed to find a sufficiently accurate null space.
The timings of our Julia implementation are shown in Figure 3. As cpd_ddl is implemented in a different programming language, we believe a direct comparison of timings is not opportune. Nevertheless, in both algorithms, computing the kernel of the auxiliary matrix has the highest asymptotic time complexity. In cpd_ddl this is an \(L \times L\) square matrix, while in our algorithm, depending on the choice of degree, \(R_{I}(d,e)\) is an almost-square \(M \times N\) matrix with \(M \approx N\). Therefore, we plot the ratio between the numbers of elements in the auxiliary matrices of cpd_ddl and cpd_hnf. Figure 4 visualizes this factor \(\mu = L^2/(MN)\) on a logarithmic scale. This number indicates the fraction of memory that cpd_ddl requires relative to cpd_hnf. Since computing the kernel of a (nearly) square \(k \times k\) matrix costs \(\mathcal {O}(k^3)\) operations, raising \(\mu\) to the power \(\frac{3}{2}\) gives an estimate of the speedup factor in execution time of cpd_hnf relative to cpd_ddl. Based on this estimate and the execution times we logged, it is accurate to state that the newly proposed algorithm outperforms the state of the art by up to two orders of magnitude for larger tensors.
Fig. 3.
Fig. 4.
6.4 Robustness in the Noisy Case
The next experiment illustrates that cpd_hnf can successfully decompose tensors even in the presence of some noise. The setup is as follows. We generate a random rank-\(r\) tensor \(\mathcal {A}\) of size \(150 \times 25 \times 10\) by randomly sampling the \(\alpha _i,\beta _i\), and \(\gamma _i\) as before. Then, we add white Gaussian noise of relative magnitude \(10^{e}\) for \(e=-1,\ldots ,-15\); that is, we compute \(\mathcal {A}^{\prime } = \mathcal {A}+ 10^{e} \frac{\Vert \mathcal {A}\Vert _F}{\Vert \mathcal {E}\Vert _F} \mathcal {E}\), where \(\mathcal {E}\) is a noise tensor with i.i.d. standard normal entries. We provide \(\mathcal {A}^{\prime }\) as input to our algorithm and request a rank-\(r\) decomposition.
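In Julia, this noise model amounts to the following sketch (function name ours).

```julia
using LinearAlgebra

# Perturb A with white Gaussian noise of relative Frobenius magnitude 10^e.
function add_noise(A::Array{Float64,3}, e::Int)
    E = randn(size(A)...)                        # noise tensor with i.i.d. N(0,1) entries
    return A + 10.0^e * (norm(A) / norm(E)) * E  # so that ‖A′ − A‖_F / ‖A‖_F = 10^e
end
```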
The relative backward error between \(\mathcal {A}^{\prime }\) and the computed rank-\(r\) decomposition is shown on a logarithmic scale in Figure 5. Because of our setup, the rank-\(r\) CPD of \(\mathcal {A}\) has relative backward error \(10^{e}\). A good tensor decomposition algorithm should thus return a CPD with a backward error of at most \(10^{e}\). Remarkably, for tensors with random rank-\(r\) CPDs, the proposed algorithm consistently manages to reach this benchmark when \(e \le -5\). For ranks up to about half the maximum range (from \(r=1\) to 70), it even consistently reaches this benchmark for white noise of magnitude up to \(10^{-2}\)! Based on these results, we anticipate that cpd_hnf could be employed as a rank-\(r\) approximation algorithm in the high signal-to-noise regime. We believe this observation warrants further research.
Fig. 5.
6.5 An Example of Higher-order Tensors
The last experiment illustrates that the reshaping trick combined with a decomposition algorithm that works well in the unbalanced regime (such as cpd_hnf and cpd_ddl [Domanov and De Lathauwer 2014; 2017]) is a powerful technique for decomposing high-order, high-rank tensors, even with a balanced shape.
As an example, we generated an eighth-order real tensor of size \(7 \times 7 \times 7 \times 7 \times 6 \times 6 \times 5 \times 5\) and rank \(r = 1,000\) with factor matrices whose entries are sampled i.i.d. from a standard normal distribution. We then apply the reshaping trick and compute the rank-\(r\) CPD of the reshaped third-order tensor; a sketch of the reshaping step is shown below.
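The following sketch illustrates the reshaping step. The particular grouping of the eight modes into three (chosen so that the flattening rank and identifiability conditions can hold for \(r = 1{,}000\)) is our own illustrative choice, not necessarily the one used in the experiment.

```julia
# Stand-in for the rank-1000 tensor from the experiment.
A = randn(7, 7, 7, 7, 6, 6, 5, 5)
P = permutedims(A, (1, 2, 3, 5, 4, 6, 7, 8))  # reorder modes to (7,7,7,6 | 7,6 | 5,5)
B = reshape(P, 7*7*7*6, 7*6, 5*5)             # order-3 tensor of size 2058 × 42 × 25
# Each rank-1 term of A maps to a rank-1 term of B, and r = 1000 ≤ min(2058, 42⋅25),
# so B falls in the unbalanced regime treated by cpd_hnf.
```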
To our knowledge, this computation represents the first time any tensor decomposition algorithm of any type (i.e., alternating least squares, optimization-based methods, direct algebraic methods, homotopy-based methods, or heuristic methods) successfully decomposes a high-order rank-1000 tensor that cannot be reshaped to an order-3 tensor whose CPD can be computed with a pencil-based algorithm.
7 Conclusions
The cpd_hnf algorithm proposed in this article computes the CPD of tensors satisfying assumption 2.1 using numerical linear algebra techniques for solving systems of polynomial equations. Its complexity is governed (propositions 5.1 and 1.1) by the regularity of a homogeneous, \(\mathbb {N}^2\)-graded ideal obtained from a flattening. We presented a formula for the degrees \((d,1)\) in the regularity for generic tensors of many formats (theorem 4.2) and proposed conjecture 1 for the general case. Our experiments show that cpd_hnf produces backward errors that are almost always on the order of the machine precision, improving upon the previous state of the art by several orders of magnitude (see Figure 2). In the high signal-to-noise-ratio regime, the algorithm appears suitable for approximating noisy rank-\(r\) tensors.
Possible directions for future research include a further analysis of the regularity, a relaxation of the conditions in assumption 2.1, generalizations for (semi-)symmetric tensors, and a theoretical analysis of cpd_hnf in the noisy setting.
Footnotes
1
A property on a variety \(\mathcal {V}\) is “generic” if the locus where the property does not hold is contained in a proper Zariski closed subset.
2
The tensor product is defined uniquely by the linear space into which it maps by universality [Greub 1978], so we do not make a distinction in notation.
3
Often, we will require \((d,e) \ne (1,1)\) and \((d-1,e-1)\ge (0,0)\). This makes sense by lemma 3.2. The problem of how to find such a tuple \((d,e)\) will be the topic of Section 4.
4
If \(d = 1\) and \(e \ge 2\), then the roles of \(d\) and \(e\) can be swapped so that the approach still works.
5
The decomposition problem for a rank-\(r\) tensor in \(\mathbb {C}^{(\ell +1)\times (m+1)\times (n+1)}\) with \(r \le m+1 \le \ell +1\) can always be reduced to this so-called concise case [Landsberg 2012; Bürgisser et al. 1997] by computing an orthogonal Tucker decomposition [Tucker 1966] followed by a rank-\(r\) decomposition of the core tensor.
6
The notation is similar to Section 2. Here, we omit the restriction to the subspace \(B\) (or, equivalently, we take \(B = S_{(1,1)}\)). We also drop the transpose on \(w_i\), as we will think of them as column vectors instead of row vectors in this section.
7
For the reader who is familiar with algebraic geometry, we note that this is our motivation for associating the ideal \(J(Z)\) to \(Z\), instead of the usual vanishing ideal of the points in \(Z\). These are different ideals, as \(J(Z)\) is usually not saturated with respect to the irrelevant ideal of \(S\).
8
This follows from the fact that the Segre variety is not contained in a hyperplane.
A tensor \(\mathcal {A}\) not satisfying this constraint on \(\ell\) can always be represented in new bases by a coordinate array \(\mathcal {B}\) in a concise [Landsberg 2012; Bürgisser et al. 1997] tensor space \(\mathbb {C}^{\ell ^{\prime }+1} \otimes \mathbb {C}^{m^{\prime }+1}\otimes \mathbb {C}^{n^{\prime }+1}\) with \(\ell ^{\prime } \le \ell\), \(m^{\prime } \le m\), and \(n^{\prime }\le n\). After permutation of the factors, a concise tensor space \(\mathbb {C}^{k_1}\otimes \mathbb {C}^{k_2}\otimes \mathbb {C}^{k_3}\) always satisfies \(k_2k_3 \ge k_1 \ge k_2 \ge k_3\); see, e.g., Carlini and Kleppe [2011]. In practice, \(\mathcal {B}\) can be obtained as the core tensor of the (sequentially) truncated higher-order singular value decomposition [De Lathauwer et al. 2000; Vannieuwenhoven et al. 2012].
11
Additional packages are used to support our experimental setup, but these are not required for the main algorithm.
12
For more details, see the function minimumMultiDegree in NormalFormCPD.jl.
13
Both algorithms were applied to the same rank-\(r\) tensors. To deal with the different programming languages and to limit storage demands, we generated a buffer of \(10^7\) reals sampled i.i.d. from a standard normal distribution; the \(\alpha _i\), \(\beta _i\), and \(\gamma _i\) of the true decomposition were then generated from this buffer. This entails that there is a statistical correlation between the various tensors that are decomposed; however, we judged that this does not affect the validity of our conclusions. For each line in the set of configurations, a different random buffer was generated.
A Proof of the Technical Results
For two ideals \(I, K \subset S\), we write \((I: K^\infty) = \lbrace f \in S ~|~ K^k f \subset I \text{ for some } k \in \mathbb {N}\rbrace .\) The next result follows from lemma 2.2.
Acknowledgments
We are grateful to Alessandra Bernardi, Fulvio Gesmundo, and Bernard Mourrain for fruitful discussions. We thank Ignat Domanov and Lieven De Lathauwer for kindly sharing a Matlab implementation of cpd_ddl prior to its publication in the Tensorlab+ repository.
We thank the three anonymous reviewers for their detailed feedback on the manuscript that, in particular, led to the inclusion of sections 2.2 and 6.2. We thank an anonymous referee for suggesting shorter proofs of proposition 4.2 and theorem 4.1.
References
C. Beltrán, P. Breiding, and N. Vannieuwenhoven. 2019. Pencil-based algorithms for tensor rank decomposition are not stable. SIAM J. Matrix Anal. Appl. 40, 2 (2019), 739–773.
A. Bernardi, J. Brachat, P. Comon, and B. Mourrain. 2011. Multihomogeneous polynomial decomposition using moment matrices. In Proceedings of the 36th International Symposium on Symbolic and Algebraic Computation. 35–42.
A. Bernardi, J. Brachat, P. Comon, and B. Mourrain. 2013. General tensor decomposition, moment matrices and applications. J. Symbolic Comput. 52 (2013), 51–71.
P. Bürgisser, M. Clausen, and A. Shokrollahi. 1997. Algebraic Complexity Theory. Grundlehren der mathematischen Wissenschaften, Vol. 315. Springer-Verlag, Berlin.
L. Chiantini, G. Ottaviani, and N. Vannieuwenhoven. 2014. An algorithm for generic and low-rank specific identifiability of complex tensors. SIAM J. Matrix Anal. Appl. 35, 4 (2014), 1265–1287.
L. Chiantini, G. Ottaviani, and N. Vannieuwenhoven. 2017. Effective criteria for specific identifiability of tensors and forms. SIAM J. Matrix Anal. Appl. 38, 2 (2017), 656–681.
D. A. Cox, J. Little, and D. O’Shea. 2013. Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra. Springer Science & Business Media.
L. De Lathauwer. 2006. A link between the canonical decomposition in multilinear algebra and simultaneous matrix diagonalization. SIAM J. Matrix Anal. Appl. 28, 3 (2006), 642–666.
I. Domanov and L. De Lathauwer. 2013. On the uniqueness of the canonical polyadic decomposition of third-order tensors—part II: Uniqueness of the overall decomposition. SIAM J. Matrix Anal. Appl. 34, 3 (2013), 876–903.
I. Domanov and L. De Lathauwer. 2014. Canonical polyadic decomposition of third-order tensors: Reduction to generalized eigenvalue decomposition. SIAM J. Matrix Anal. Appl. 35, 2 (2014), 636–660.
I. Domanov and L. De Lathauwer. 2017. Canonical polyadic decomposition of third-order tensors: Relaxed uniqueness conditions and algebraic algorithm. Linear Alg. Appl. 513 (2017), 342–375.
D. R. Grayson and M. E. Stillman. 2019. Macaulay2, a software system for research in algebraic geometry, version 1.18. Retrieved from http://www.math.uiuc.edu/Macaulay2/.
J. B. Kruskal. 1977. Three-way arrays: Rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Alg. Appl. 18 (1977), 95–138.
Y.-Ch. Kuo and T.-L. Lee. 2018. Computing the unique CANDECOMP/PARAFAC decomposition of unbalanced tensors by homotopy method. Linear Alg. Appl. 556 (2018), 238–264.
A. Lorber. 1985. Features of quantifying chemical composition from two-dimensional data array by the rank annihilation factor analysis method. Anal. Chem. 57 (1985), 2395–2397.
I. V. Oseledets, D. V. Savostianov, and E. E. Tyrtyshnikov. 2008. Tucker dimensionality reduction of three-dimensional arrays in linear time. SIAM J. Matrix Anal. Appl. 30, 3 (2008), 939–956.
N. D. Sidiropoulos, L. De Lathauwer, X. Fu, K. Huang, E. E. Papalexakis, and Ch. Faloutsos. 2017. Tensor decomposition for signal processing and machine learning. IEEE Trans. Signal. Process. 65, 13 (2017), 3551–3582.
S. Telen, B. Mourrain, and M. Van Barel. 2018. Solving polynomial systems via truncated normal forms. SIAM J. Matrix Anal. Appl. 39, 3 (2018), 1421–1447.
N. Vannieuwenhoven, R. Vandebril, and K. Meerbergen. 2012. A new truncation strategy for the higher-order singular value decomposition. SIAM J. Sci. Comput. 34, 2 (2012), A1027–A1052.