Abstract
In this work, we develop a new complexity metric for an important class of low-rank matrix optimization problems in both symmetric and asymmetric cases, where the metric aims to quantify the complexity of the nonconvex optimization landscape of each problem and the success of local search methods in solving the problem. The existing literature has focused on two recovery guarantees: the RIP constant is commonly used to characterize the complexity of matrix sensing problems, while the incoherence and the sampling rate are used when analyzing matrix completion problems. The proposed complexity metric has the potential to generalize these two notions and also applies to a much larger class of problems. To mathematically study the properties of this metric, we focus on the rank-1 generalized matrix completion problem and illustrate the usefulness of the new complexity metric on three types of instances, namely, instances satisfying the RIP condition, instances obeying the Bernoulli sampling model, and a synthetic example. We show that the complexity metric exhibits a consistent behavior in the three cases, even when other existing conditions fail to provide theoretical guarantees. These observations strongly suggest that the new complexity metric has the potential to unify various conditions of optimization complexity proposed for different applications. Furthermore, we establish theoretical results that provide sufficient conditions and necessary conditions for the existence of spurious solutions in terms of the proposed complexity metric. This contrasts with the RIP and incoherence conditions, which fail to provide any necessary condition.
Notes
A point \(U^0\) is called a spurious local minimum if it is a local minimum of problem (1.2) and \(U^0\left( U^0\right) ^T \ne M^*\).
We note that this definition is different from the common definition of a maximum independent set, which only requires that the set not be a proper subset of another independent set.
A point \(u\in \mathbb {R}^n\) is called a spurious second-order critical point if it satisfies the first-order and the second-order necessary optimality conditions and \(uu^T \ne u^*(u^*)^T\).
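The definitions in these notes can be made concrete with a short numerical sketch. The objective below, \(f(u)=\sum _{i,j}C_{ij}(u_iu_j-M^*_{ij})^2\), is an assumed form of the rank-1 problem (this excerpt does not display problem (1.2) itself), and the checks below are illustrative rather than the paper's own code:

```python
import numpy as np

# Assumed weighted rank-1 completion loss: f(u) = sum_{i,j} C_ij (u_i u_j - M*_ij)^2,
# with M* = u* (u*)^T and C, M* symmetric.
def f(u, C, M):
    return np.sum(C * (np.outer(u, u) - M) ** 2)

def grad(u, C, M):
    # For symmetric C and M: grad f(u) = 4 * (C ∘ (u u^T - M)) u
    return 4.0 * (C * (np.outer(u, u) - M)) @ u

def hess(u, C, M):
    # Entrywise: H_kl = 4 [ delta_kl sum_j C_kj u_j^2 + C_kl u_k u_l + C_kl (u_k u_l - M_kl) ]
    R = C * (np.outer(u, u) - M)
    return 4.0 * (np.diag(C @ u**2) + C * np.outer(u, u) + R)

def is_second_order_critical(u, C, M, tol=1e-8):
    # first- and second-order necessary optimality conditions
    g_ok = np.linalg.norm(grad(u, C, M)) <= tol
    H_ok = np.linalg.eigvalsh(hess(u, C, M))[0] >= -tol
    return g_ok and H_ok

def is_spurious(u, ustar, tol=1e-8):
    # spurious means critical yet u u^T != u* (u*)^T
    return np.linalg.norm(np.outer(u, u) - np.outer(ustar, ustar)) > tol
```

Note that \(-u^*\) is also a second-order critical point but is not spurious, since \((-u^*)(-u^*)^T = u^*(u^*)^T = M^*\).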
References
Agarwal, A., Anandkumar, A., Jain, P., Netrapalli, P.: Learning sparsely used overcomplete dictionaries via alternating minimization. SIAM J. Optim. 26(4), 2775–2799 (2016)
Ahn, K., Suarez, F.: Riemannian perspective on matrix factorization. arXiv preprint arXiv:2102.00937 (2021)
Aigner, M.: Turán’s graph theorem. Am. Math. Mon. 102(9), 808–816 (1995)
Ajayi, T., Mildebrath, D., Kyrillidis, A., Ubaru, S., Kollias, G., Bouchard, K.: Provably convergent acceleration in factored gradient descent with applications in matrix sensing. arXiv preprint arXiv:1806.00534 (2018)
Allen-Zhu, Z., Li, Y.: Neon2: Finding local minima via first-order oracles. Adv. Neural Inform. Process. Syst. 31 (2018)
Bi, Y., Lavaei, J.: On the absence of spurious local minima in nonlinear low-rank matrix recovery problems. In: International Conference on Artificial Intelligence and Statistics, pp. 379–387. PMLR (2021)
Bi, Y., Zhang, H., Lavaei, J.: Local and global linear convergence of general low-rank matrix recovery problems. In: Proceedings of 36th AAAI Conference on Artificial Intelligence (AAAI), pp. 1–9. Vancouver, Canada (2022)
Burer, S., Monteiro, R.D.: A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Program. 95(2), 329–357 (2003)
Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM (JACM) 58(3), 1–37 (2011)
Candes, E.J., Li, X., Soltanolkotabi, M.: Phase retrieval via Wirtinger flow: theory and algorithms. IEEE Trans. Inform. Theory 61(4), 1985–2007 (2015)
Candes, E.J., Plan, Y.: Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Trans. Inform. Theory 57(4), 2342–2359 (2011)
Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Found. Comput. Math. 9(6), 717–772 (2009)
Candès, E.J., Tao, T.: The power of convex relaxation: near-optimal matrix completion. IEEE Trans. Inform. Theory 56(5), 2053–2080 (2010)
Cartis, C., Gould, N.I., Toint, P.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Math. Program. 127(2), 245–295 (2011)
Charisopoulos, V., Chen, Y., Davis, D., Díaz, M., Ding, L., Drusvyatskiy, D.: Low-rank matrix recovery with composite optimization: good conditioning and rapid convergence. Found. Comput. Math. 21(6), 1505–1593 (2021)
Chen, J., Li, X.: Model-free nonconvex matrix completion: local minima analysis and applications in memory-efficient kernel PCA. J. Mach. Learn. Res. 20(142), 1–39 (2019)
Chen, J., Liu, D., Li, X.: Nonconvex rectangular matrix completion via gradient descent without \(\ell _{2,\infty }\) regularization. IEEE Trans. Inform. Theory 66(9), 5806–5841 (2020)
Chen, Y., Chi, Y.: Harnessing structures in big data via guaranteed low-rank matrix estimation: recent theory and fast algorithms via convex and nonconvex optimization. IEEE Signal Process. Mag. 35(4), 14–31 (2018)
Chen, Y., Chi, Y., Fan, J., Ma, C.: Gradient descent with random initialization: fast global convergence for nonconvex phase retrieval. Math. Program. 176(1), 5–37 (2019)
Chen, Y., Chi, Y., Fan, J., Ma, C., Yan, Y.: Noisy matrix completion: understanding statistical guarantees for convex relaxation via nonconvex optimization. SIAM J. Optim. 30(4), 3098–3121 (2020)
Chen, Y., Fan, J., Ma, C., Yan, Y.: Bridging convex and nonconvex optimization in robust PCA: noise, outliers and missing data. Ann. Stat. 49(5), 2948–2971 (2021)
Chi, Y., Lu, Y.M., Chen, Y.: Nonconvex optimization meets low-rank matrix factorization: an overview. IEEE Trans. Signal Process. 67(20), 5239–5269 (2019)
Chou, H.H., Gieshoff, C., Maly, J., Rauhut, H.: Gradient descent for deep matrix factorization: dynamics and implicit bias towards low rank. arXiv preprint arXiv:2011.13772 (2020)
Fattahi, S., Sojoudi, S.: Exact guarantees on the absence of spurious local minima for non-negative rank-1 robust principal component analysis. J. Mach. Learn. Res. (2020)
Ge, R., Jin, C., Zheng, Y.: No spurious local minima in nonconvex low rank problems: a unified geometric analysis. In: International Conference on Machine Learning, pp. 1233–1242. PMLR (2017)
Ge, R., Lee, J.D., Ma, T.: Matrix completion has no spurious local minimum. Adv. Neural Inform. Process. Syst. 2981–2989 (2016)
Hardt, M.: Understanding alternating minimization for matrix completion. In: 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, pp. 651–660. IEEE (2014)
Hardt, M., Wootters, M.: Fast matrix completion without the condition number. In: Conference on Learning Theory, pp. 638–678. PMLR (2014)
Hou, T.Y., Li, Z., Zhang, Z.: Fast global convergence for low-rank matrix recovery via Riemannian gradient descent with random initialization. arXiv preprint arXiv:2012.15467 (2020)
Jain, P., Meka, R., Dhillon, I.: Guaranteed rank minimization via singular value projection. Adv. Neural Inform. Process. Syst. 23 (2010)
Jain, P., Netrapalli, P., Sanghavi, S.: Low-rank matrix completion using alternating minimization. In: Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, pp. 665–674. (2013)
Jin, C., Ge, R., Netrapalli, P., Kakade, S.M., Jordan, M.I.: How to escape saddle points efficiently. In: International Conference on Machine Learning, pp. 1724–1732. PMLR (2017)
Jin, C., Netrapalli, P., Jordan, M.I.: Accelerated gradient descent escapes saddle points faster than gradient descent. In: Conference On Learning Theory, pp. 1042–1085. PMLR (2018)
Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient descent only converges to minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR (2016)
Levin, E., Kileel, J., Boumal, N.: The effect of smooth parametrizations on nonconvex optimization landscapes. arXiv preprint arXiv:2207.03512 (2022)
Li, X., Zhu, Z., Man-Cho So, A., Vidal, R.: Nonconvex robust low-rank matrix recovery. SIAM J. Optim. 30(1), 660–686 (2020)
Li, Y., Ma, T., Zhang, H.: Algorithmic regularization in over-parameterized matrix sensing and neural networks with quadratic activations. In: Conference On Learning Theory, pp. 2–47. PMLR (2018)
Luo, Y., Li, X., Zhang, A.R.: Nonconvex factorization and manifold formulations are almost equivalent in low-rank matrix optimization. arXiv preprint arXiv:2108.01772 (2021)
Ma, J., Fattahi, S.: Sign-RIP: a robust restricted isometry property for low-rank matrix recovery. arXiv preprint arXiv:2102.02969 (2021)
Ma, Z., Bi, Y., Lavaei, J., Sojoudi, S.: Sharp restricted isometry property bounds for low-rank matrix recovery problems with corrupted measurements. arXiv preprint arXiv:2105.08232 (2021)
Netrapalli, P., Jain, P., Sanghavi, S.: Phase retrieval using alternating minimization. Adv. Neural Inform. Process. Syst. 26 (2013)
Netrapalli, P., Niranjan, U.N., Sanghavi, S., Anandkumar, A., Jain, P.: Non-convex robust PCA. Adv. Neural Inform. Process. Syst. 27 (2014)
Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)
Renegar, J.: Linear programming, complexity theory and elementary functional analysis. Math. Program. 70(1), 279–351 (1995)
Renegar, J.: Condition numbers, the barrier method, and the conjugate-gradient method. SIAM J. Optim. 6(4), 879–912 (1996)
Stöger, D., Soltanolkotabi, M.: Small random initialization is akin to spectral learning: optimization and generalization guarantees for overparameterized low-rank matrix reconstruction. Adv. Neural Inform. Process. Syst. 34 (2021)
Sun, J., Qu, Q., Wright, J.: Complete dictionary recovery over the sphere I: overview and the geometric picture. IEEE Trans. Inform. Theory 63(2), 853–884 (2016)
Sun, J., Qu, Q., Wright, J.: A geometric analysis of phase retrieval. Found. Comput. Math. 18(5), 1131–1198 (2018)
Sun, R., Luo, Z.Q.: Guaranteed matrix completion via non-convex factorization. IEEE Trans. Inform. Theory 62(11), 6535–6579 (2016)
Tong, T., Ma, C., Chi, Y.: Accelerating ill-conditioned low-rank matrix estimation via scaled gradient descent. J. Mach. Learn. Res. 22(150), 1–63 (2021)
Tong, T., Ma, C., Chi, Y.: Low-rank matrix recovery with scaled subgradient methods: fast and robust convergence without the condition number. IEEE Trans. Signal Process. 69, 2396–2409 (2021)
Tong, T., Ma, C., Prater-Bennette, A., Tripp, E., Chi, Y.: Scaling and scalability: provable nonconvex low-rank tensor estimation from incomplete measurements. arXiv preprint arXiv:2104.14526 (2021)
Tu, S., Boczar, R., Simchowitz, M., Soltanolkotabi, M., Recht, B.: Low-rank solutions of linear matrix equations via procrustes flow. In: International Conference on Machine Learning, pp. 964–973. PMLR (2016)
Wei, K., Cai, J.F., Chan, T.F., Leung, S.: Guarantees of Riemannian optimization for low rank matrix recovery. SIAM J. Matrix Anal. Appl. 37(3), 1198–1222 (2016)
Wei, K., Cai, J.F., Chan, T.F., Leung, S.: Guarantees of Riemannian optimization for low rank matrix completion. Inverse Probl. Imaging 14(2) (2020)
Yalcin, B., Zhang, H., Lavaei, J., Sojoudi, S.: Factorization approach for low-complexity matrix completion problems: exponential number of spurious solutions and failure of gradient methods. In: International Conference on Artificial Intelligence and Statistics, pp. 1–9. PMLR (2022)
Yi, X., Park, D., Chen, Y., Caramanis, C.: Fast algorithms for robust PCA via gradient descent. Adv. Neural Inform. Process. Syst. 29 (2016)
Yurtsever, A., Tropp, J.A., Fercoq, O., Udell, M., Cevher, V.: Scalable semidefinite programming. SIAM J. Math. Data Sci. 3(1), 171–200 (2021)
Zhang, H., Bi, Y., Lavaei, J.: General low-rank matrix optimization: geometric analysis and sharper bounds. Adv. Neural Inform. Process. Syst. 34 (2021)
Zhang, H., Yalcin, B., Lavaei, J., Sojoudi, S.: A new complexity metric for nonconvex rank-one generalized matrix completion. arXiv preprint arXiv:2204.02364 (2022)
Zhang, J., Fattahi, S., Zhang, R.: Preconditioned gradient descent for over-parameterized nonconvex matrix factorization. Adv. Neural Inform. Process. Syst. 34 (2021)
Zhang, R.Y., Sojoudi, S., Lavaei, J.: Sharp restricted isometry bounds for the inexistence of spurious local minima in nonconvex matrix recovery. J. Mach. Learn. Res. 20(114), 1–34 (2019)
Zheng, Q., Lafferty, J.: A convergent gradient descent algorithm for rank minimization and semidefinite programming from random linear measurements. In: Proceedings of the 28th International Conference on Neural Information Processing Systems-Volume 1, pp. 109–117 (2015)
Zhu, Z., Li, Q., Tang, G., Wakin, M.B.: Global optimality in low-rank matrix optimization. IEEE Trans. Signal Process. 66(13), 3614–3628 (2018)
Acknowledgements
This work was supported by grants from ARO, AFOSR, ONR and NSF.
We note that a similar complexity metric based on a special case of instances in Sect. 3.3 was proposed in our conference paper [56]. However, the complexity metric in this work has a different form and is proved to work on a broader set of applications. In addition, we prove several theoretical properties of the metric in this work, which are not included in [56].
Appendices
A Proofs in Sect. 2
A.1 Proof of Theorem 2.4
The proof of Theorem 2.4 relies on the following two lemmas, which transform the computation of \(\mathbb {D}_\alpha ^{min}\) into a one-dimensional optimization problem. The first lemma upper-bounds the maximum possible distance.
Lemma A.1
Suppose that \(n\ge 2\). It holds that
where the function \(g(\alpha ,c)\) is defined by
We let \(g_i(\alpha ,c)\) denote the i-th term in the above minimization for all \(i\in \{1,\dots ,6\}\). The next lemma proves the other direction.
Lemma A.2
Suppose that \(n\ge 2\). It holds that
where the function \(g(\alpha ,c)\) is defined in Lemma A.1.
The proof of Lemmas A.1 and A.2 can be found in the online version [60].
Proof of Theorem 2.4
By the results of Lemmas A.1 and A.2, we only need to compute \(\max _{c\in \left[ 0,\frac{1}{n(n-1)}\right] } g(\alpha ,c)\). Let \(\kappa := (1-\alpha )/\alpha \in [0,+\infty ]\). We study three cases below.
Case 1. We first consider the case when \(\kappa \ge 2(n-3) / [(n-4)(n-1)]\). We prove that \(g(\alpha ,c) = g_2(\alpha ,c)\). Since \(g_2(\alpha ,c)\) has a larger derivative in c than \(g_1(\alpha ,c)\) and the function \(g_i(\alpha ,c)\) is decreasing in c for \(i=3,4,5,6\), we only need to show that
The above inequality with \(i=1\) is equivalent to \(\kappa \ge {2}/(n-1)\), which is guaranteed by the assumption that \(\kappa \ge 2(n-3) / [(n-4)(n-1)]\). For \(i\in \{3,4,5,6\}\), the inequality (A.1) is equivalent to
Therefore, it holds that
whose maximum is attained at \(c=[n(n-1)]^{-1}\) and
The other two cases can be proved in the same way and the proof can be found in the online version [60]. \(\square \)
B Proofs in Sect. 3
B.1 Proof of Theorem 3.2
Before estimating the complexity metric, we establish two properties of \(\mu \)-incoherent vectors.
Lemma B.1
Given any constant \(\mu \in [1,n]\), suppose that \(u^*\) has incoherence \(\mu \) and \(\Vert u^*\Vert _1 = 1\). Then, the following properties hold:
1. \(u^*\) has at least \(n / \mu \) nonzero components;
2. \(|u_i^*| \le \mu / n\) for all \(i\in [n]\).
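Both properties can be checked numerically. The sketch below assumes the standard incoherence measure \(\mu (u) = n\max _i u_i^2/\Vert u\Vert _2^2\) (the paper's formal definition is not displayed in this excerpt) together with the normalization \(\Vert u^*\Vert _1=1\) from the lemma:

```python
import numpy as np

def incoherence(u):
    # mu(u) = n * max_i u_i^2 / ||u||_2^2  (common definition; an assumption here)
    n = u.size
    return n * np.max(u**2) / np.sum(u**2)

def check_lemma_b1(u):
    u = u / np.linalg.norm(u, 1)          # enforce ||u||_1 = 1 as in the lemma
    n, mu = u.size, incoherence(u)
    prop1 = np.count_nonzero(u) >= n / mu  # at least n/mu nonzero components
    prop2 = np.max(np.abs(u)) <= mu / n + 1e-12  # |u_i| <= mu/n entrywise
    return prop1, prop2
```

For instance, a 2-sparse vector in \(\mathbb {R}^8\) has incoherence \(\mu = 4\), exactly \(2 = n/\mu \) nonzero entries, and \(\max _i|u_i| = 1/2 = \mu /n\), so both bounds are tight.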
The following lemma lower-bounds the perturbation of the weight matrix C.
Lemma B.2
Suppose that the instance \(\mathcal{M}\mathcal{C}\left( C,u^*\right) \) satisfies the \(\delta \)-RIP\(_{2,2}\) condition with \(\delta \in [0,1)\) and that the weight matrix \({\tilde{C}}\in \mathbb {S}^{n^2-1}_{+,1}\) has zero entries on the index set \(\mathcal {N}\subset [n] \times [n]\). Then, it holds that
where \(\mathcal {N}\) is the set of indices of the zero entries of \({\tilde{C}}\).
The proofs of Lemmas B.1 and B.2 are direct calculations and can be found in the online version [60]. Now, we prove the main theorem.
Proof of Theorem 3.2
Suppose that \(\mathcal{M}\mathcal{C}({\tilde{C}},{\tilde{u}}^*)\in {\overline{\mathcal {D}}}\) is the instance such that
In the following, we split the proof into two steps.
Step I. We first fix \({\tilde{u}}^*\) and consider the closest matrix \({\tilde{C}}\) to C such that \(({\tilde{C}}, {\tilde{u}}^*)\in {\overline{\mathcal {D}}}\). Let \(k:=|\mathcal {I}_1({\tilde{C}}, {\tilde{u}}^*)|\). Without loss of generality, we assume that
We first consider the case when \(k\ge 2\). If \(\mathbb {G}_1({\tilde{C}}, {\tilde{u}}^*)\) is disconnected, at least \(2(k-1)\) entries of \({\tilde{C}}\) are 0. If \(\mathbb {G}_1({\tilde{C}}, {\tilde{u}}^*)\) is bipartite, at least \(k^2/2 \ge 2(k-1)\) entries of \({\tilde{C}}\) are 0. If \(\mathcal {I}_{00}({\tilde{C}}, {\tilde{u}}^*)\) is non-empty, at least 2k entries of \({\tilde{C}}\) are 0. If instead \(k = 1\), at least one entry of \({\tilde{C}}\) must be 0 to make \(\mathbb {G}_1({\tilde{C}}, {\tilde{u}}^*)\) bipartite. In summary, at least N(k) entries of \({\tilde{C}}\) are 0, where
Using the results in Lemma B.2, the distance between C and \({\tilde{C}}\) is at least
We note that the distance is monotonically increasing as a function of k.
Step II. Now, we consider the optimal choice of \({\tilde{u}}^*\) based on the lower bound in (B.1). Let
Since the distance between C and \({\tilde{C}}\) is a monotonically increasing function of k, the minimum distance between \(\left( C,u^*\right) \) and \(({\tilde{C}},{\tilde{u}}^*)\) cannot be attained by \(k>\ell \). Therefore, we focus on the case when \(k\le \ell \). Without loss of generality, we assume that
Then, the distance between \(u^*\) and \({\tilde{u}}^*\) satisfies
Denote the distance between \(\left( C,u^*\right) \) and \(({\tilde{C}},{\tilde{u}}^*)\) by
Step II-1. We first consider the case when \(\mu \le 2n/3\). Combining inequalities (B.1) and (B.2), we obtain a lower bound on \(d_\alpha \):
For every \(k\in [\ell ]\), the term inside the above minimization can be lower-bounded by
The minimum of the right-hand side over \(k\in [\ell ]\) can be solved in closed form and is equal to
Using the second property in Lemma B.1, we have
Taking the summation over \(k\in \{2,\dots ,\ell \}\), we can conclude that
Using the second property in Lemma B.1 and \(\Vert u^*\Vert _1= 1\), it follows that
Substituting back into inequality (B.3), we have
Step II-2. Next, we consider the case when \(\mu \ge 2n/3\). By Theorem 3.1, the distance is at least
where the second inequality is due to the assumption that \(\mu \ge 2n/3\).
By combining Steps II-1 and II-2, the distance is lower-bounded by
The proof is completed by using the relation between \(d_\alpha \) and \(\mathbb {T}_\alpha \left( C,u^*\right) \). \(\square \)
B.2 Reduction of problem (3.7)
Before discussing the properties of problem instances in Sect. 3.3, we prove that the SSCPs of the instance \(\mathcal{M}\mathcal{C}(C^\epsilon , u^*)\) are closely related to those of the m-dimensional problem
Lemma B.3
If problem (B.4) has no SSCPs, then the instance \(\mathcal{M}\mathcal{C}(C^\epsilon , u^*)\) has no SSCPs. In addition, given a number \(N\in \mathbb {N}\), suppose that problem (B.4) has N SSCPs with nonzero components at which the objective function has a positive definite Hessian matrix. Then, the instance \(\mathcal{M}\mathcal{C}(C^\epsilon , u^*)\) has at least N spurious local minima.
The proof of Lemma B.3 is a direct calculation and can be found in the online version [60].
B.3 Proof of Theorem 3.5
To simplify the notation in the following proofs, we denote the gradient and the Hessian matrix of the objective function of problem (B.4) by
The following theorem guarantees that the instance \(\mathcal{M}\mathcal{C}(C^\epsilon , u^*)\) does not have spurious local minima when \(\epsilon \) is at least of order \(m^{-1}\).
Theorem B.1
If \(\epsilon > 18 / m\), the instance \(\mathcal{M}\mathcal{C}(C^\epsilon , u^*)\) does not have SSCPs, namely, all second-order critical points are global minima associated with the ground truth solution \(M^*\).
The proof of Theorem B.1 can be found in the online version [60]. Then, we consider the regime of \(\epsilon \) where the instance \(\mathcal{M}\mathcal{C}(C^\epsilon , u^*)\) has spurious solutions. The following theorem studies the case when m is an even number.
Theorem B.2
Suppose that m is an even number. If \(\epsilon < 1 / (m+1)\), then the instance \(\mathcal{M}\mathcal{C}(C^\epsilon , u^*)\) has at least \(2^{m/2}\) spurious local minima.
Proof
By Lemma B.3, we only need to show that problem (B.4) has at least \(\binom{m}{m/2}\) SSCPs whose associated Hessian matrices are positive definite and whose components are nonzero. We consider a point \(x^0\in \mathbb {R}^m\) such that
The above equations have a solution since m is an even number. By a direct calculation, we can verify that the gradient \(g(x^0;\epsilon )\) is equal to 0. We only need to show that the Hessian matrix \(H(x^0;\epsilon )\) is positive definite, namely
The above condition is equivalent to
Under the normalization constraint \(\Vert c\Vert _2 = 1\), the Cauchy–Schwarz inequality implies that the minimum of the left-hand side is attained by
Therefore, the Hessian is positive definite if and only if
By substituting \(\left( x_1^0\right) ^2=(1-\epsilon )/[1+(m-1)\epsilon ]\), the above condition is equivalent to
Using the condition that \((m+1)\epsilon < 1\), we obtain that
where the first inequality is from the fact that \(m\ge 2\), which follows from the assumption that \(m>0\) is an even number.
To estimate the number of SSCPs, we observe that m/2 components of \(x^0\) have a positive sign and the other m/2 components have a negative sign. Hence, there are at least
SSCPs. The estimate on the combinatorial number follows from the inequality \(\binom{n}{k}\ge (n/k)^k\). \(\square \)
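The counting inequalities used above, namely \(\binom{n}{k}\ge (n/k)^k\) and the resulting lower bounds on the number of SSCPs for even m (Theorem B.2) and odd m (Theorem B.3), can be checked numerically:

```python
from math import comb

# Check C(n, k) >= (n/k)^k for all 1 <= k <= n (small slack for float rounding).
for n in range(2, 30):
    for k in range(1, n + 1):
        assert comb(n, k) >= (n / k) ** k - 1e-9

# Even m (Theorem B.2): C(m, m/2) >= (m / (m/2))^(m/2) = 2^(m/2).
for m in range(2, 31, 2):
    assert comb(m, m // 2) >= 2 ** (m // 2)

# Odd m (Theorem B.3): C(m, (m+1)/2) >= [2m/(m+1)]^((m+1)/2).
for m in range(3, 31, 2):
    assert comb(m, (m + 1) // 2) >= (2 * m / (m + 1)) ** ((m + 1) // 2)
```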
The analysis of the case where m is odd is similar, and we present the result in the following theorem.
Theorem B.3
Suppose that m is an odd number. If \(\epsilon < 1 / [13(m+1)]\), then the instance \(\mathcal{M}\mathcal{C}(C^\epsilon , u^*)\) has at least \([{2m}/(m+1)]^{(m+1)/2}\) spurious local minima.
The proof of Theorem B.3 is similar to that of Theorem B.2 and can be found in the online version [60]. By combining Theorems B.1–B.3, we complete the proof of Theorem 3.5.
B.4 Proof of Theorem 3.6
The proof of Theorem 3.6 relies on the following lemma, which calculates the complexity metric of the instance \(\mathcal{M}\mathcal{C}(C^\epsilon ,u^*)\). The proof of Lemma B.4 is similar to that of Theorem 2.4.
Lemma B.4
Suppose that \(n\ge m \ge 5\), \(\alpha \in [0,1]\) and \(\epsilon \in [0,1]\). The complexity metric \(\mathbb {D}_\alpha (C^\epsilon ,u^*)\) has the closed form
Moreover, \(\mathbb {D}_\alpha (C^\epsilon ,u^*)\) is strictly decreasing in \(\epsilon \) on [0, 1/2].
The proof of Lemma B.4 can be found in the online version [60]. Combining Theorem 3.5 and Lemma B.4, we are able to estimate the range of the complexity metric.
Proof of Theorem 3.6
By defining constants \(\delta :=1/26\) and \(\Delta :=18\), Theorem 3.5 implies that
1. If \(\epsilon < \delta / m\), the instance \(\mathcal{M}\mathcal{C}(C^\epsilon , u^*)\) has spurious local minima;
2. If \(\epsilon > \Delta / m\), the instance \(\mathcal{M}\mathcal{C}(C^\epsilon , u^*)\) has no spurious local minima.
Then, we study two different cases.
Case I. We first consider the case when \(m \epsilon \) is large. Since \(\epsilon < \Delta /m \le 1/2\), the threshold is located in the regime where \(\mathbb {D}_\alpha (C^\epsilon ,u^*)\) is strictly decreasing. Hence, it suffices to show that
is a lower bound on \(\mathbb {D}_\alpha (C^\epsilon ,u^*)\) when \(\epsilon = \Delta / m\). By Lemma B.4, it holds that
Since the graph \(\mathbb {G}\) does not contain any independent set with \(m+1\) nodes, Turán's theorem [3] implies that the graph \(\mathbb {G}\) has at least \(n^2 / (2m)\) edges, namely,
We note that the above bound is asymptotically tight and is attained by the Turán graph. Hence, we obtain that
By substituting into the estimate of \(\mathbb {D}_\alpha (C^\epsilon ,u^*)\), it follows that
Case II. Next, we consider the case when \(\epsilon m\) is small. Similar to Case I, it suffices to show that
is an upper bound for \(\mathbb {D}_\alpha (C^\epsilon ,u^*)\) when \(\epsilon = \delta / m\). Since \(\delta < 1/2\), we have
By Lemma B.4, it holds that
where the last inequality is from \(m \ge 36\). Since \(\epsilon \le 1\), the definition of \(Z_\epsilon \) implies that \(Z_\epsilon \le n^2\). By substituting into the estimate of \(\mathbb {D}_\alpha (C^\epsilon ,u^*)\), it follows that
By combining Cases I and II, we complete the proof. \(\square \)
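The Turán step in Case I can be verified exhaustively on a small example. In complement form, a graph on n vertices with no independent set of size \(m+1\) has at least \(n^2/(2m) - n/2\) edges (the \(n^2/(2m)\) rate quoted above, up to a lower-order term), with the minimum attained by a disjoint union of m balanced cliques (the complement of the Turán graph). A brute-force search over all graphs on \(n = 5\) vertices:

```python
from itertools import combinations

n = 5
edges = list(combinations(range(n), 2))  # all 10 candidate edges

def independence_number(adj):
    # brute force over all vertex subsets; adj[v] is a bitmask of neighbors
    best = 0
    for S in range(1 << n):
        verts = [v for v in range(n) if (S >> v) & 1]
        if all(not ((adj[u] >> v) & 1) for u, v in combinations(verts, 2)):
            best = max(best, len(verts))
    return best

def min_edges_with_alpha_at_most(m):
    # minimum edge count over all graphs with independence number <= m
    best = len(edges) + 1
    for code in range(1 << len(edges)):
        adj = [0] * n
        e = 0
        for idx, (u, v) in enumerate(edges):
            if (code >> idx) & 1:
                adj[u] |= 1 << v
                adj[v] |= 1 << u
                e += 1
        if e < best and independence_number(adj) <= m:
            best = e
    return best

for m in (2, 3):
    e_min = min_edges_with_alpha_at_most(m)
    assert e_min >= n * n / (2 * m) - n / 2  # Turán bound in complement form
```

For \(m = 2\) the minimum is 4 edges (two disjoint cliques of sizes 3 and 2), matching \(n^2/(2m) - n/2 = 3.75\) rounded up to the extremal value.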
C Proofs in Sect. 4
C.1 Proof of Theorem 4.3
The proof of Theorem 4.3 directly follows from the next two lemmas.
Lemma C.1
Suppose that \(\left( C,u^*\right) \in \mathcal{S}\mathcal{D}\) and that \(u^0\) is a global solution to \(\mathcal{M}\mathcal{C}\left( C,u^*\right) \). Then, for all \(k\in [n_1]\), it holds that \(u_i^0u_j^0 = u_i^*u_j^*\) for all \(i,j\in \mathcal {I}_{1k}\). In addition, \(u^0_i = 0\) for all \(i\in \mathcal {I}_0\left( C,u^*\right) \).
Proof
Denote \(M^*:= u^*(u^*)^T\). We first consider nodes in \(\mathcal {G}_{1k}\) for some \(k\in [n_1]\). Since the subgraph is not bipartite, there exists a cycle with an odd length \(2\ell +1\), which we denote as
Then, we have
which implies that the conclusion holds for \(i=j=i_1\). Using the connectivity of \(\mathbb {G}_{1k}\left( C,u^*\right) \), we know
Then, we consider nodes in \(\mathcal {I}_0\left( C,u^*\right) \). Since \(\mathcal {I}_{00}\left( C,u^*\right) \) is empty, for every node \(i\in \mathcal {I}_0\left( C,u^*\right) \), there exists another node \(j\in \mathcal {I}_1\left( C,u^*\right) \) such that \(C_{ij}>0\). Hence, we have
This completes the proof. \(\square \)
The following lemma provides a necessary and sufficient condition for instances with a positive definite Hessian matrix at global solutions, which is stronger than what Theorem 4.3 requires.
Lemma C.2
Suppose that \(u^0\in \mathbb {R}^n\) is a global minimizer of the instance \(\mathcal{M}\mathcal{C}\left( C,u^*\right) \) such that the conditions in Lemma C.1 hold. Then, the Hessian matrix is positive definite at \(u^0\) if and only if
1. \(\mathbb {G}_{1i}\left( C,u^*\right) \) is not bipartite for all \(i\in [n_1]\);
2. \(\mathcal {I}_{00}\left( C,u^*\right) = \emptyset \).
Proof
We prove the sufficiency part here, since it is the direction used in this manuscript; the necessity part is proved in the online version [60].
Sufficiency. We show that the Hessian matrix is positive definite under the two conditions stated in the lemma. For the sake of contradiction, suppose that there exists a nonzero vector \(q\in \mathbb {R}^n\) such that
Then, after straightforward calculations, we arrive at
The two conditions can be written compactly as
Consider the index set \(\mathcal {I}_{1k}\left( C,u^*\right) \) for some \(k\in [n_1]\). The equality (C.1) implies that
Since the graph \(\mathbb {G}_{1k}\left( C,u^*\right) \) is not bipartite, there exists a cycle with an odd length \(2\ell +1\), which we denote as
Denoting \(i_{2\ell +2}:= i_1\), we can calculate that
which leads to \(q_{i_1} = 0\). Using the connectivity of \(\mathcal {G}_{1k}\) and the relation (C.2), it follows that
Moreover, the same conclusion holds for all \(k\in [n_1]\) and, thus, we conclude that
Since \(\mathcal {I}_{00}\left( C,u^*\right) = \emptyset \), for every node \(i\in \mathcal {I}_0\left( C,u^*\right) \), there exists another node \(j\in \mathcal {I}_1\left( C,u^*\right) \) such that \(C_{ij} > 0\). Considering the relation (C.2), we obtain that
In summary, we have proved that \(q_i = 0\) for all \(i\in [n]\), which contradicts the assumption that \(q\ne 0\). Hence, the Hessian matrix at \(u^0\) is positive definite. \(\square \)
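The dichotomy in Lemma C.2 can be observed numerically by comparing an odd-cycle support graph (connected and non-bipartite) with an even-cycle support graph (bipartite). This sketch assumes the weighted objective \(f(u)=\sum _{i,j}C_{ij}(u_iu_j-u_i^*u_j^*)^2\), under which the Hessian at the global solution \(u^*\) reduces to \(4\,[\mathrm {diag}(C(u^*)^{\circ 2}) + C\circ u^*(u^*)^T]\); the cycle construction is illustrative and not from the paper:

```python
import numpy as np

def hessian_at_opt(C, ustar):
    # Hessian of f(u) = sum_{ij} C_ij (u_i u_j - u*_i u*_j)^2 at u = u*
    # (the residual term vanishes at the global solution)
    return 4.0 * (np.diag(C @ ustar**2) + C * np.outer(ustar, ustar))

def cycle_weights(n):
    # weight matrix supported on the n-cycle (off-diagonal entries only)
    C = np.zeros((n, n))
    for i in range(n):
        C[i, (i + 1) % n] = C[(i + 1) % n, i] = 1.0
    return C
```

An odd cycle contains an odd cycle trivially, so the Hessian is positive definite, while for an even cycle the alternating-sign direction \(q_i = (-1)^i u_i^*\) lies in the null space: for each edge \((i,j)\) across the bipartition, the contributions \(q_iu_j^*+q_ju_i^*\) cancel.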
C.2 Application of the implicit function theorem
Using the positive-definiteness of the Hessian matrix, we are able to apply the implicit function theorem to certify the existence of spurious local minima.
Lemma C.3
Suppose that \(\alpha \in [0,1]\) and consider a pair \(\left( C,u^*\right) \in \mathcal{S}\mathcal{D}\). Then, there exists a small constant \(\delta (C,u^*) > 0\) such that for every instance \(\mathcal{M}\mathcal{C}({\tilde{C}},{\tilde{u}}^*)\) satisfying
the instance \(\mathcal{M}\mathcal{C}({\tilde{C}},{\tilde{u}}^*)\) has spurious local minima.
Proof
By Theorem 4.3, there exists a global solution \(u^0\) to the instance \(\mathcal{M}\mathcal{C}\left( C,u^*\right) \) such that
Consider the system of equations:
Since the Jacobian matrix of \(\nabla g(u;C,u^*)\) with respect to u is the Hessian matrix \(\nabla ^2\,g(u;C,u^*)\) and \((u^0, C, u^*)\) is a solution, the implicit function theorem guarantees that there exists a small constant \(\delta \left( C,u^*\right) >0\) such that in the neighborhood
there exists a function \(u(\cdot ,\cdot ):\mathcal {N}\rightarrow \mathbb {R}^n\) such that
1. \(u\left( C,u^*\right) = u^0\);
2. \(u(\cdot ,\cdot )\) is a continuous function in \(\mathcal {N}\);
3. \(\nabla g[u({\tilde{C}}, {\tilde{u}}^*); {\tilde{C}}, {\tilde{u}}^*] = 0\).
Using the continuity of the Hessian matrix and \(u(\cdot ,\cdot )\), we can choose \(\delta \left( C,u^*\right) \) to be small enough such that
Therefore, the point \(u({\tilde{C}}, {\tilde{u}}^*)\) is a spurious local minimum of the instance \(\mathcal{M}\mathcal{C}({\tilde{C}}, {\tilde{u}}^*)\). \(\square \)
Cite this article
Zhang, H., Yalcin, B., Lavaei, J. et al. A new complexity metric for nonconvex rank-one generalized matrix completion. Math. Program. 207, 227–268 (2024). https://doi.org/10.1007/s10107-023-02008-5