Efficient random graph matching via degree profiles

Probability Theory and Related Fields

Abstract

Random graph matching refers to recovering the underlying vertex correspondence between two random graphs with correlated edges; a prominent example is when the two random graphs are given by Erdős-Rényi graphs \(G(n,\frac{d}{n})\). This can be viewed as an average-case and noisy version of the graph isomorphism problem. Under this model, the maximum likelihood estimator is equivalent to solving the intractable quadratic assignment problem. This work develops an \({\widetilde{O}}(n d^2+n^2)\)-time algorithm which perfectly recovers the true vertex correspondence with high probability, provided that the average degree is at least \(d = \varOmega (\log ^2 n)\) and the two graphs differ by at most \(\delta = O( \log ^{-2}(n) )\) fraction of edges. For dense graphs and sparse graphs, this can be improved to \(\delta = O( \log ^{-2/3}(n) )\) and \(\delta = O( \log ^{-2}(d) )\) respectively, both in polynomial time. The methodology is based on appropriately chosen distance statistics of the degree profiles (empirical distribution of the degrees of neighbors). Before this work, the best known result achieves \(\delta =O(1)\) and \(n^{o(1)} \le d \le n^c\) for some constant c with an \(n^{O(\log n)}\)-time algorithm and \(\delta ={{\widetilde{O}}}((d/n)^4)\) and \(d = {\widetilde{\varOmega }}(n^{4/5})\) with a polynomial-time algorithm.

Notes

  1. To ensure the Bernoulli parameter in (2) is well-defined, we need to assume \(q(1-s) \le 1-q\), or equivalently \(s \ge 2-1/q\). Similarly, to ensure the edge probability in the parent graph \(p=q/s \le 1\), we need to assume \( s \ge q\).

  2. Throughout the paper, we use standard big O notation, e.g., for any sequences \(\{a_n\}\) and \(\{b_n\}\), \(a_n=\varTheta (b_n)\) (or \(a_n \asymp b_n\)) if \(1/c\le a_n/ b_n \le c\) holds for all n for some absolute constant \(c>0\); \(a_n =\varOmega (b_n)\) and \(b_n = O(a_n)\) (or \(a_n \gtrsim b_n\) and \(b_n \lesssim a_n\)) if \(a_n/b_n \ge c\) holds for all n for some absolute constant \(c>0\). We use big \({\widetilde{O}}\) notation to hide logarithmic factors.

  3. Achievability and converse bounds for more general correlated Erdős-Rényi random graph models are also available in [13, 14].

  4. To be precise, all but two elements (namely, \(A_{ik}\) and \(B_{ki}\)) are independent. This can be easily dealt with by excluding those two from the empirical distribution, which, by the triangle inequality, changes the distance statistic by at most \(\frac{1}{n}\).

  5. Alternatively, outdegrees can be computed via the number of common neighbors by squaring the adjacency matrix using fast matrix multiplication.
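The fact behind Note 5 is that entry (i, j) of the squared adjacency matrix counts the common neighbors of i and j. A minimal sketch follows, assuming a 0/1 adjacency matrix stored as a list of lists; the helper name `common_neighbor_counts` and the toy graph are illustrative and not from the paper. The plain triple loop stands in for the fast matrix multiplication mentioned in the note, and how outdegrees are then assembled from these counts is defined in the main text and not reproduced here.

```python
def common_neighbor_counts(adj):
    """Return the square of a 0/1 adjacency matrix given as a list of lists.

    Entry (i, j) counts the vertices k adjacent to both i and j; the
    diagonal entry (i, i) is the degree of vertex i.
    """
    n = len(adj)
    return [
        [sum(adj[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy input: the 4-cycle 0-1-2-3-0.
cycle4 = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
counts = common_neighbor_counts(cycle4)
print(counts[0][2])  # vertices 0 and 2 share the neighbors 1 and 3
```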

References

  1. Aflalo, Y., Bronstein, A., Kimmel, R.: On convex relaxation of graph isomorphism. Proc. Nat. Acad. Sci. 112(10), 2942–2947 (2015)

  2. Alon, N., Spencer, J.H.: The probabilistic method, 3rd edn. Wiley, New Jersey (2008)

  3. Babai, L., Erdős, P., Selkow, S.M.: Random graph isomorphism. SIAM J. Comput. 9(3), 628–635 (1980)

  4. Barak, B., Chou, C.N., Lei, Z., Schramm, T., Sheng, Y.: (Nearly) efficient algorithms for the graph matching problem on correlated random graphs. arXiv preprint arXiv:1805.02349 (2018)

  5. del Barrio, E., Giné, E., Matrán, C.: Central limit theorems for the Wasserstein distance between the empirical and the true distributions. Ann. Prob. 27, 1009–1071 (1999)

  6. Berend, D., Kontorovich, A.: A sharp estimate of the binomial mean absolute deviation with applications. Stat. Prob. Lett. 83(4), 1254–1259 (2013)

  7. Bollobás, B.: Distinguishing vertices of random graphs. In: North-Holland Mathematics Studies vol. 62, pp. 33–49 (1982)

  8. Bollobás, B.: Random Graphs, 2nd edn. Cambridge Studies in Advanced Mathematics. Cambridge University Press, New York (2001)

  9. Bordenave, C., Lelarge, M., Massoulié, L.: Non-backtracking spectrum of random graphs: community detection and non-regular Ramanujan graphs. In: 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), pp. 1347–1357 (2015). arXiv:1501.06087

  10. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 3(1), 1–122 (2011)

  11. Burkard, R.E., Cela, E., Pardalos, P.M., Pitsoulis, L.S.: The quadratic assignment problem. In: Handbook of Combinatorial Optimization, pp. 1713–1809. Springer, Berlin (1998)

  12. Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty years of graph matching in pattern recognition. Int. J. Pattern Recognit. Artif. Intell. 18(03), 265–298 (2004)

  13. Cullina, D., Kiyavash, N.: Improved achievability and converse bounds for Erdős-Rényi graph matching. In: Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, pp. 63–72. ACM (2016)

  14. Cullina, D., Kiyavash, N.: Exact alignment recovery for correlated Erdős-Rényi graphs. arXiv preprint arXiv:1711.06783 (2017)

  15. Cullina, D., Kiyavash, N., Mittal, P., Poor, H.V.: Partial recovery of Erdős-Rényi graph alignment via \( k \)-core alignment. arXiv preprint arXiv:1809.03553 (2018)

  16. Czajka, T., Pandurangan, G.: Improved random graph isomorphism. J. Discrete Algorithms 6(1), 85–92 (2008)

  17. Dai, O.E., Cullina, D., Kiyavash, N., Grossglauser, M.: On the performance of a canonical labeling for matching correlated Erdős-Rényi graphs. arXiv preprint arXiv:1804.09758 (2018)

  18. David, H., Nagaraja, H.: Order Statistics, 3rd edn. Wiley, New Jersey (2003)

  19. Dym, N., Maron, H., Lipman, Y.: DS++: a flexible, scalable and provably tight relaxation for matching problems. ACM Trans. Graphics (TOG) 36(6), 184 (2017)

  20. Feizi, S., Quon, G., Recamonde-Mendoza, M., Medard, M., Kellis, M., Jadbabaie, A.: Spectral alignment of graphs. arXiv preprint arXiv:1602.04181 (2016)

  21. Fiori, M., Sapiro, G.: On spectral properties for graph matching and graph isomorphism problems. Inf. Inference J. IMA 4(1), 63–76 (2015)

  22. Fishkind, D.E., Adali, S., Patsolic, G.H., Meng, L., Singh, D., Lyzinski, V., Priebe, C.E.: Seeded graph matching. Pattern Recogn. 87, 203–215 (2019)

  23. Ford, L.R., Fulkerson, D.R.: Maximal flow through a network. Can. J. Math. 8(3), 399–404 (1956)

  24. Haghighi, A.D., Ng, A.Y., Manning, C.D.: Robust textual inference via graph matching. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 387–394. Association for Computational Linguistics (2005)

  25. Hopcroft, J.E., Karp, R.M.: An \(n^{5/2}\) algorithm for maximum matchings in bipartite graphs. SIAM J. Comput. 2(4), 225–231 (1973)

  26. Kaas, R., Buhrman, J.M.: Mean, median and mode in binomial distributions. Stat. Neerl. 34(1), 13–18 (1980)

  27. Kazemi, E., Hassani, H., Grossglauser, M., Modarres, H.P.: Proper: global protein interaction network alignment through percolation matching. BMC Bioinform. 17(1), 527 (2016)

  28. Kazemi, E., Hassani, S.H., Grossglauser, M.: Growing a graph matching from a handful of seeds. Proc. VLDB Endow. 8(10), 1010–1021 (2015)

  29. Kezurer, I., Kovalsky, S.Z., Basri, R., Lipman, Y.: Tight relaxation of quadratic matching. In: Computer Graphics Forum, vol. 34, pp. 115–128. Wiley Online Library (2015)

  30. Korula, N., Lattanzi, S.: An efficient reconciliation algorithm for social networks. Proc. VLDB Endow. 7(5), 377–388 (2014)

  31. Li, W.V., Shao, Q.M.: Gaussian processes: inequalities, small ball probabilities and applications. Handbook of Statistics 19, 533–597 (2001)

  32. Livi, L., Rizzi, A.: The graph matching problem. Pattern Anal. Appl. 16(3), 253–283 (2013)

  33. Lubars, J., Srikant, R.: Correcting the output of approximate graph matching algorithms. In: IEEE INFOCOM 2018-IEEE Conference on Computer Communications, pp. 1745–1753. IEEE (2018)

  34. Lyzinski, V., Fishkind, D., Fiori, M., Vogelstein, J., Priebe, C., Sapiro, G.: Graph matching: relax at your own risk. IEEE Trans. Pattern Anal. Mach. Intell. 38(1), 60–73 (2016)

  35. Lyzinski, V., Fishkind, D.E., Priebe, C.E.: Seeded graph matching for correlated Erdős-Rényi graphs. J. Mach. Learn. Res. 15, 3513 (2013)

  36. Makarychev, K., Manokaran, R., Sviridenko, M.: Maximum quadratic assignment problem: reduction from maximum label cover and LP-based approximation algorithm. In: International Colloquium on Automata, Languages, and Programming, pp. 594–604 (2010)

  37. Mitzenmacher, M., Upfal, E.: Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, New York (2005)

  38. Mossel, E., Ross, N.: Shotgun assembly of labeled graphs. IEEE Trans. Netw. Sci. Eng. 6(2), 145–157 (2019)

  39. Mossel, E., Xu, J.: Seeded graph matching via large neighborhood statistics. To appear in 2019 ACM-SIAM Symposium on Discrete Algorithms (SODA), arXiv preprint arXiv:1807.10262 (2018)

  40. Nadarajah, S., Kotz, S.: Exact distribution of the max/min of two Gaussian random variables. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 16(2), 210–212 (2008)

  41. Narayanan, A., Shmatikov, V.: Robust de-anonymization of large sparse datasets. In: Security and Privacy, 2008. SP 2008. IEEE Symposium on, pp. 111–125. IEEE (2008)

  42. Narayanan, A., Shmatikov, V.: De-anonymizing social networks. In: Security and Privacy, 2009 30th IEEE Symposium on, pp. 173–187. IEEE (2009)

  43. Okamoto, M.: Some inequalities relating to the partial sum of binomial probabilities. Ann. Inst. Stat. Math. 10(1), 29–35 (1959). https://doi.org/10.1007/BF02883985

  44. Onaran, E., Villar, S.: Projected power iteration for network alignment. arXiv preprint arXiv:1707.04929 (2017)

  45. Pardalos, P.M., Rendl, F., Wolkowicz, H.: The quadratic assignment problem: a survey and recent developments. In: Proceedings of the DIMACS Workshop on Quadratic Assignment Problems, volume 16 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pp. 1–42. American Mathematical Society (1994)

  46. Pedarsani, P., Grossglauser, M.: On the privacy of anonymized networks. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235–1243 (2011)

  47. Petrov, V.V.: Limit Theorems of Probability Theory: Sequences of Independent Random Variables. Oxford Science Publications, Clarendon Press, Oxford, United Kingdom (1995)

  48. Slashdot social network (2009). https://snap.stanford.edu/data/soc-Slashdot0902.html

  49. Scheinerman, E.R., Ullman, D.H.: Fractional Graph Theory: a Rational Approach to the Theory of Graphs. Dover, Illinois (1997)

  50. Schellewald, C., Schnörr, C.: Probabilistic subgraph matching based on convex relaxation. In: EMMCVPR, vol. 5, pp. 171–186. Springer, Berlin (2005)

  51. Shirani, F., Garg, S., Erkip, E.: Seeded graph matching: Efficient algorithms and theoretical guarantees. arXiv preprint arXiv:1711.10360 (2017)

  52. Shorack, G.R., Wellner, J.A.: Empirical Processes with Applications to Statistics. Wiley, New Jersey (1986)

  53. Singh, R., Xu, J., Berger, B.: Global alignment of multiple protein interaction networks with application to functional orthology detection. Proc. Nat. Acad. Sci. 105(35), 12763–12768 (2008)

  54. Wright, E.M.: Graphs on unlabelled nodes with a given number of edges. Acta Mathematica 126(1), 1–9 (1971)

  55. Yartseva, L., Grossglauser, M.: On the performance of percolation graph matching. In: Proceedings of the First ACM Conference on Online Social Networks, pp. 119–130. ACM (2013)

  56. Zhao, Q., Karisch, S.E., Rendl, F., Wolkowicz, H.: Semidefinite programming relaxations for the quadratic assignment problem. J. Comb. Opt. 2(1), 71–109 (1998)

  57. Zubkov, A.M., Serov, A.A.: A complete proof of universal inequalities for the distribution function of the binomial law. Theory Prob. Its Appl. 57(3), 539–544 (2013)

Author information

Corresponding author

Correspondence to Yihong Wu.

Additional information

J. Ding is supported in part by the NSF Grant DMS-1757479 and an Alfred Sloan fellowship. Z. Ma is supported in part by an NSF CAREER award DMS-1352060 and an Alfred Sloan fellowship. Y. Wu is supported in part by the NSF Grant CCF-1527105, an NSF CAREER award CCF-1651588, and an Alfred Sloan fellowship. J. Xu is supported by the NSF Grants CCF-1850743, CCF-1856424, and IIS-1932630.

J. Ding and Y. Wu would like to thank the Centre de Recherches Mathématiques at the Université de Montréal, where some of the work was carried out during the Workshop on Combinatorial Statistics. Y. Wu is also grateful to David Pollard for helpful discussions on small ball probability. J. Xu would like to thank Nadav Dym and Shahar Kovalsky for pointing out the connections between fractional isomorphism and iterated degree sequences. The authors are grateful to the anonymous referees for helpful comments and corrections.

Appendices

Appendix A Auxiliary results

Recall the following tail bounds for a binomial random variable \(X\sim \mathrm{Binom}(n,p)\) [37, Theorems 4.4, 4.5]:

$$\begin{aligned} {\mathbb {P}}\left\{ X \ge (1+t) np \right\}&\le e^{-\frac{t^2}{3} np}, \quad 0 \le t \le 1 \nonumber \\ {\mathbb {P}}\left\{ X \le (1-t) np \right\}&\le e^{-\frac{t^2}{2} np}, \quad 0 \le t \le 1 \end{aligned}$$
(165)

and

$$\begin{aligned} {\mathbb {P}}\left\{ X \ge R \right\} \le 2^{-R}, \quad R \ge 6np. \end{aligned}$$
(166)

Theorem 5

([43]) Let \(X \sim \mathrm{Binom}(n,p)\). It holds that

$$\begin{aligned} {\mathbb {P}}\left\{ X \le n t \right\}&\le \exp \left( - n \left( \sqrt{p} - \sqrt{t} \right) ^2\right) , \quad \forall 0 \le t \le p \end{aligned}$$
(167)
$$\begin{aligned} {\mathbb {P}}\left\{ X \ge n t \right\}&\le \exp \left( - 2n \left( \sqrt{t} - \sqrt{p} \right) ^2\right) , \quad \forall p \le t \le 1. \end{aligned}$$
(168)
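As a sanity check, the bounds (165) to (168) can be compared with exact binomial tails. The sketch below is illustrative only: the parameters n = 200, p = 0.1 and the helper names are assumptions chosen for this example, and a numerical check of this kind does not replace the cited proofs.

```python
from math import ceil, comb, exp, floor, sqrt


def pmf(n, p, k):
    """P{X = k} for X ~ Binom(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(n, p, x):
    """Exact P{X >= x} for X ~ Binom(n, p)."""
    return sum(pmf(n, p, k) for k in range(max(0, ceil(x)), n + 1))


def lower_tail(n, p, x):
    """Exact P{X <= x} for X ~ Binom(n, p)."""
    return sum(pmf(n, p, k) for k in range(0, min(n, floor(x)) + 1))


# Illustrative parameters (assumptions, not taken from the paper).
n, p = 200, 0.1
mean = n * p
t_rel = 0.5                      # relative deviation t in (165)
frac_low, frac_high = 0.05, 0.15  # the fraction t in (167) and (168)

checks = {
    "(165) upper": upper_tail(n, p, (1 + t_rel) * mean)
    <= exp(-(t_rel**2) / 3 * mean),
    "(165) lower": lower_tail(n, p, (1 - t_rel) * mean)
    <= exp(-(t_rel**2) / 2 * mean),
    "(166)": upper_tail(n, p, 6 * mean) <= 2.0 ** (-6 * mean),
    "(167)": lower_tail(n, p, n * frac_low)
    <= exp(-n * (sqrt(p) - sqrt(frac_low)) ** 2),
    "(168)": upper_tail(n, p, n * frac_high)
    <= exp(-2 * n * (sqrt(frac_high) - sqrt(p)) ** 2),
}
for name, holds in checks.items():
    print(name, holds)
```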

Appendix B Analysis for seeded graph matching

In this section we analyze Algorithm 3 for seeded graph matching. Note that when Algorithm 3 is used as a subroutine in Algorithm 2, the seed set S is obtained from Algorithm 1 based on matching degree profiles, which can potentially depend on the edges between the non-seeded vertices. To deal with this dependency, the following lemma gives a sufficient condition for the seeded graph matching subroutine (Algorithm 3) to succeed, even if the seed set is chosen adversarially:

Lemma 18

(Seeded graph matching) Assume \(n\ge 4\), \(s \ge 30 q\), and

$$\begin{aligned} n (qs)^2 \ge 2^{11} \times 3 \log ^2 n. \end{aligned}$$
(169)

If the number of seeds satisfies \(m \ge \frac{96 \log n}{q s}\), then with probability at least \(1 - 5n^{-1}\), the following holds: for any \(\pi _0:S \rightarrow T\) with \(|S|=m\) that coincides with the true permutation \(\pi ^*\) on the seed set S (i.e., \(\pi _0 = \pi ^*|_S\)), Algorithm 3 with \(\pi _0\) as the seed set and threshold \(\kappa =\frac{1}{2} mqs\) outputs \({{\widehat{\pi }}} = \pi ^*\).

We start by analyzing the first stage of Algorithm 3, which upgrades a partial (but correct) permutation \(\pi _0: S \rightarrow T\) to a full permutation \(\pi _1:[n] \rightarrow [n]\) with at most \(O(\log n/(qs))\) errors, even if the seed set S is adversarially chosen.

Lemma 19

Assume \(n\ge 2\), \(m q s \ge 96 \log n\), and \(s \ge 12q\). Recall the threshold \( \kappa =\frac{1}{2} mqs \) in Algorithm 3. Then with probability at least \(1- 2n^{ -m }\), the following holds in Algorithm 3: for any partial permutation \(\pi _0: S \rightarrow T\) such that \(\pi _0 = \pi ^*|_S\) and \(|S|=m\), \(\pi _1\) is guaranteed to have at most \(\frac{192\log n}{qs}\) errors with respect to \(\pi ^*\), i.e., \(|\{i\in [n]: \pi _1(i) \ne \pi ^*(i) \}| \le \frac{192\log n}{qs}\).

Proof (Proof of Lemma 19)

Without loss of generality, we assume \(\pi ^*\) is the identity permutation.

Fix a seed set S of cardinality m. Since \(\pi _0 = \pi ^*|_S\), it follows that

$$\begin{aligned} n_{ik} = \sum _{j \in S} A_{ij} B_{k \pi _0(j)} =\sum _{j \in S} A_{ij} B_{k \pi ^*(j)}. \end{aligned}$$

Recall that according to the definition of the weights in (35), we have

$$\begin{aligned} w(\pi ^*) = \sum _{i \in S^c} {{\mathbf {1}}_{\left\{ {n_{ii} \ge \kappa }\right\} }}. \end{aligned}$$

First, we show that

$$\begin{aligned} {\mathbb {P}}\left\{ w(\pi ^*) \le n -m- \frac{ 32 \log n}{qs} \right\} \le \exp \left( - 2 m \log n \right) . \end{aligned}$$
(170)

Indeed, for \(i \in S^c\) we have \(n_{i i} {{\mathop {\sim }\limits ^{\text {i.i.d.}}}}\mathrm{Binom}(m, qs)\). It follows from the Chernoff bound (165) that

$$\begin{aligned} {\mathbb {P}}\left\{ n_{ii} \le \kappa \right\} = {\mathbb {P}}\left\{ n_{ii} \le \frac{1}{2} mqs \right\} \le \exp \left( - \frac{1}{8} m q s \right) . \end{aligned}$$

Therefore,

$$\begin{aligned} (n-m)-w(\pi ^*) = \sum _{i \in S^c} {{\mathbf {1}}_{\left\{ {n_{ii} < \kappa }\right\} }} \overset{s.t.}{\le } \mathrm{Binom}\left( n-m, \exp \left( - \frac{1}{8} m q s \right) \right) . \end{aligned}$$

Using the following fact (which follows from a simple union bound)

$$\begin{aligned} {\mathbb {P}}\left\{ \mathrm{Binom}\left( n, p \right) \ge t \right\} \le \left( {\begin{array}{c}n\\ t\end{array}}\right) p^t, \end{aligned}$$
(171)

we get that

$$\begin{aligned} {\mathbb {P}}\left\{ (n-m)-w(\pi ^*) \ge t \right\}\le & {} \left( {\begin{array}{c}n-m\\ t\end{array}}\right) \exp \left( - \frac{t}{8} m q s \right) \le n^t \exp \left( - \frac{t}{8} m q s \right) \\\le & {} \exp \left( - \frac{t}{16} m q s \right) , \end{aligned}$$

where the last inequality holds because \(mqs \ge 96 \log n \ge 16 \log n\) by assumption. Setting \(t=\frac{ 32 \log n}{qs}\), we arrive at the desired (170).
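Spelled out, the substitution is a routine check (treating \(t=\frac{32 \log n}{qs}\) as an integer, as the text does implicitly):

$$\begin{aligned} \exp \left( - \frac{t}{16} m q s \right) = \exp \left( - \frac{1}{16} \cdot \frac{32 \log n}{qs} \cdot m q s \right) = \exp \left( - 2 m \log n \right) , \end{aligned}$$

and the event \(\{(n-m)-w(\pi ^*) \ge t\}\) is the same as \(\{ w(\pi ^*) \le n-m-\frac{32 \log n}{qs} \}\), which is the event in (170).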

Next, fix any permutation \(\pi \) such that \(\pi |_S = \pi _0\) and it has \(\ell \) non-fixed points. Since by assumption \(\pi _0=\pi ^* |_S\) and \(\pi ^*\) is the identity permutation, it follows that \(\pi (i)=i\) for all \(i \in S\). Let \(F = \{i \in S^c: \pi (i) = i\}\) denote the set of fixed points in \(S^c\). Then \(|F|=n-m-\ell \) and \(|S^c\backslash F|=\ell \). Thus

$$\begin{aligned} w(\pi ) = \sum _{i \in F} {{\mathbf {1}}_{\left\{ {n_{ii} \ge \kappa }\right\} }} + \sum _{ i \in S^c \backslash F} {{\mathbf {1}}_{\left\{ {n_{i\pi (i)} \ge \kappa }\right\} }} \le n-m-\ell + \sum _{ i \in S^c\backslash F} {{\mathbf {1}}_{\left\{ {n_{i\pi (i)} \ge \kappa }\right\} }}. \end{aligned}$$

Note that for each \(i \in S^c \backslash F\), \(n_{i\pi (i)} \sim \mathrm{Binom}(m, q^2)\). Since by assumption \(s \ge 12q\), it follows that \(\kappa = mqs/2 \ge 6 m q^2\). Hence, the Chernoff bound (166) yields that for each \(i \in S^c \backslash F\),

$$\begin{aligned} {\mathbb {P}}\left\{ n_{i \pi (i)} \ge \kappa \right\} \le 2^{- m q s/2 } \le \exp \left( - \frac{1}{4} m q s \right) . \end{aligned}$$

Note that \(\{n_{i\pi (i)}: i \in S^c \backslash F\}\) are not mutually independent. For instance, \(n_{i \pi (i)}\) and \(n_{\pi (i), \pi (\pi (i))}\) are dependent. To deal with this dependency issue, we construct a subset \({{\mathcal {I}}}\subset S^c \backslash F\) with \(|{{\mathcal {I}}}| \ge \ell /3\) such that \(\{n_{i\pi (i)}: i \in {{\mathcal {I}}}\}\) are mutually independent. In particular, consider the canonical cycle decomposition of the permutation \(\pi |_{S^c \backslash F}\). Let \({{\mathcal {C}}}_1, \ldots , {{\mathcal {C}}}_{a}\) denote the cycles. Since \(\pi \) has no fixed point in \(S^c \backslash F\), each cycle \({{\mathcal {C}}}_i\) has length \(\ell _i \ge 2\). Let \(\varGamma \) denote the graph formed by the union of these cycles. Each cycle \({{\mathcal {C}}}_i\) has an independent set \({{\mathcal {I}}}_i\) of size \(\lfloor \ell _i /2 \rfloor \ge \ell _i/3\). Let \({{\mathcal {I}}}= \cup _{i=1}^a {{\mathcal {I}}}_i\). Then \({{\mathcal {I}}}\) is an independent set in \(\varGamma \) and \(|{{\mathcal {I}}}| \ge \sum _{i=1}^a \ell _i/3=\ell /3\). Since \({{\mathcal {I}}}\) is an independent set, it follows that \(\{i, \pi (i)\} \cap \{j, \pi (j)\} =\emptyset \) for all \(i \ne j \in {{\mathcal {I}}}\). Hence \(\{n_{i\pi (i)}: i \in {{\mathcal {I}}}\}\) are mutually independent, and therefore

$$\begin{aligned} \sum _{ i \in {{\mathcal {I}}}} {{\mathbf {1}}_{\left\{ {n_{i\pi (i)} \ge \kappa }\right\} }} \overset{s.t.}{\le } \mathrm{Binom}\left( |{{\mathcal {I}}}| , \exp \left( - \frac{1}{4} m q s \right) \right) . \end{aligned}$$
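The construction of the index set used in this bound can be sketched in a few lines: decompose the non-fixed points into cycles and keep every other vertex of each cycle. This is a minimal illustration, assuming the permutation is given as a Python list with `perm[i]` the image of i; the function names and the example permutation are hypothetical.

```python
def cycles_of(perm, support):
    """Cycle decomposition of perm restricted to support (closed under perm)."""
    seen, cycles = set(), []
    for start in sorted(support):
        if start in seen:
            continue
        cycle, v = [], start
        while v not in seen:
            seen.add(v)
            cycle.append(v)
            v = perm[v]
        cycles.append(cycle)
    return cycles


def independent_indices(perm):
    """Pick floor(len/2) alternating vertices from each cycle of length >= 2.

    Returns the chosen indices and the number of non-fixed points.
    """
    moved = {i for i, image in enumerate(perm) if image != i}
    chosen = []
    for cycle in cycles_of(perm, moved):
        # Positions 0, 2, 4, ... stopping before the last position, so that
        # no chosen vertex is the image of another chosen vertex.
        chosen.extend(cycle[0 : 2 * (len(cycle) // 2) : 2])
    return chosen, len(moved)


# Hypothetical example: cycles (0 1 2), (3 4), (6 7 8 9); vertex 5 is fixed.
perm = [1, 2, 0, 4, 3, 5, 7, 8, 9, 6]
chosen, ell = independent_indices(perm)
pairs = [{i, perm[i]} for i in chosen]
print(chosen, ell)
```

Here the nine non-fixed points yield four chosen indices, and the pairs \(\{i, \pi (i)\}\) over the chosen indices are pairwise disjoint, which is what makes the corresponding counts independent.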

Note that

$$\begin{aligned} w(\pi ) \le n-m-\ell + \sum _{ i \in S^c\backslash F} {{\mathbf {1}}_{\left\{ {n_{i\pi (i)} \ge \kappa }\right\} }} \le n-m-|{{\mathcal {I}}}| + \sum _{i \in {{\mathcal {I}}}} {{\mathbf {1}}_{\left\{ {n_{i\pi (i)} \ge \kappa }\right\} }} \end{aligned}$$

Using (171) again, we have

$$\begin{aligned}&{\mathbb {P}}\left\{ w(\pi ) \ge n -m- \frac{ 32 \log n}{qs} \right\} \\&\quad \le {\mathbb {P}}\left\{ \sum _{ i \in {{\mathcal {I}}}} {{\mathbf {1}}_{\left\{ {n_{i\pi (i)} \ge \kappa }\right\} }} \ge |{{\mathcal {I}}}| -\frac{ 32 \log n}{qs} \right\} \\&\quad \le \left( {\begin{array}{c}|{{\mathcal {I}}}| \\ |{{\mathcal {I}}}| - \frac{ 32 \log n}{qs}\end{array}}\right) \exp \left( - \frac{1}{4} m q s \left( |{{\mathcal {I}}}| - \frac{ 32 \log n}{qs}\right) \right) \\&\quad \le 2^{ \ell } \exp \left( - \frac{1}{4} m q s \left( \frac{\ell }{3} - \frac{ 32 \log n}{qs}\right) \right) \le 2^{\ell } \exp \left( - \frac{1}{24} m q s \ell \right) , \end{aligned}$$

where the last inequality holds provided \(\ell qs\ge 192 \log n\). Let \(\varPi _\ell \) denote the set of permutations \(\pi \) that have \(\ell \) non-fixed points and satisfy \(\pi |_S = \pi _0\). Then \(|\varPi _\ell | \le \left( {\begin{array}{c}n-m\\ \ell \end{array}}\right) \ell ! \le n^\ell \). By the union bound, we have that for any \(\ell \ge \frac{ 192 \log n}{qs}\),

$$\begin{aligned} {\mathbb {P}}\left\{ \max _{\pi \in \varPi _\ell } w(\pi ) \ge n -m- \frac{ 32 \log n}{qs} \right\} \le (2n)^{\ell } \exp \left( - \frac{1}{24} m q s \ell \right) \le \exp \left( - \frac{1}{48} m q s \ell \right) , \end{aligned}$$

where the last inequality holds due to the assumption that \(mqs \ge 96 \log n\) and \(n \ge 2\). Applying the union bound again over \(\ell \), we get that

$$\begin{aligned} {\mathbb {P}}\left\{ \max _{\ell \ge \frac{ 192 \log n}{qs}} \max _{\pi \in \varPi _\ell } w(\pi ) \ge n -m- \frac{ 32 \log n}{qs} \right\}&\le \sum _{\ell \ge \frac{ 192 \log n}{qs}} \exp \left( - \frac{1}{48} m q s \ell \right) \\&\le \frac{ \exp \left( - 4 m \log n \right) }{ 1 - \exp \left( - 4 m \log n \right) } \\&\le \exp \left( - 2 m \log n \right) , \end{aligned}$$

where the last inequality holds due to \(m \log n \ge \log 2\).

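Combining the last displayed equation with (170), we get that with probability at least \(1- 2 n^{-2m}\), \(w(\pi ^*) > n-m-\frac{32 \log n}{qs}\) while every \(\pi \) with \(\pi |_S = \pi _0\) and at least \(\frac{192 \log n}{qs}\) non-fixed points satisfies \(w(\pi ) < n-m-\frac{32 \log n}{qs}\). Since \(\pi _1\) extends \(\pi _0\) and is chosen in Algorithm 3 to maximize the weight, so that \(w(\pi _1) \ge w(\pi ^*)\), it follows that \(\pi _1\) has at most \(192\log n/(qs)\) errors with respect to \(\pi ^*\).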

Finally, applying a simple union bound over all the \(\left( {\begin{array}{c}n\\ m\end{array}}\right) \le n^m\) possible choices of seed set S with \(|S|=m\), we complete the proof. \(\square \)
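The two binomial laws used in the proof, \(\mathrm{Binom}(m, qs)\) for \(n_{ii}\) and \(\mathrm{Binom}(m, q^2)\) for \(n_{ik}\) with \(i \ne k\), can be observed in a small simulation. The sketch below rests on assumptions: the graphs are sampled by subsampling a parent graph with edge probability \(p=q/s\) (the model suggested by Note 1), \(\pi ^*\) is the identity, and the parameters are toy values far below the regime \(mqs \ge 96 \log n\) required by Lemma 19. The function names are illustrative, and this is not Algorithm 3 itself.

```python
import random


def correlated_graphs(n, q, s, rng):
    """Sample a parent graph with edge probability p = q/s, then keep each
    parent edge independently with probability s in A and in B."""
    p = q / s
    A = [[0] * n for _ in range(n)]
    B = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:  # edge present in the parent graph
                if rng.random() < s:
                    A[i][j] = A[j][i] = 1
                if rng.random() < s:
                    B[i][j] = B[j][i] = 1
    return A, B


def seed_counts(A, B, seeds):
    """n_ik = sum over seeds j of A_ij * B_kj, for unseeded i and k
    (pi* is taken to be the identity)."""
    seed_set = set(seeds)
    rest = [i for i in range(len(A)) if i not in seed_set]
    return {
        (i, k): sum(A[i][j] * B[k][j] for j in seeds)
        for i in rest
        for k in rest
    }


rng = random.Random(0)
n, m, q, s = 260, 200, 0.05, 0.9  # toy values with s >= 12q
A, B = correlated_graphs(n, q, s, rng)
counts = seed_counts(A, B, seeds=range(m))
kappa = 0.5 * m * q * s           # threshold from Lemma 19
diag = [v for (i, k), v in counts.items() if i == k]
off = [v for (i, k), v in counts.items() if i != k]
mean_diag = sum(diag) / len(diag)
mean_off = sum(off) / len(off)
print(f"kappa={kappa:.2f} mean n_ii={mean_diag:.2f} mean n_ik={mean_off:.2f}")
```

With these values the expected counts are \(mqs = 9\) for true pairs and \(mq^2 = 0.5\) for wrong pairs, so the threshold \(\kappa = 4.5\) sits between the two.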

The second stage of Algorithm 3 upgrades an almost exact full permutation \(\pi _1:[n] \rightarrow [n]\) to an exact full permutation \({\widehat{\pi }}: [n] \rightarrow [n]\). The following lemma provides a worst-case guarantee even if \(\pi _1\) is adversarially chosen.

Lemma 20

Let \(0 \le \ell \le n\). Assume that \((\ell -1) qs \ge 12 nq^2 +2 \) and \( (\ell -1) q s \ge 16 \max \{ 1, n-\ell \} \log n\). Then with probability at least \(1-3n^{-1}\), the following holds for Algorithm 3: for any \(\pi _1\) with at most \(n-\ell \) errors with respect to the true permutation \(\pi ^*\), we have \({\widehat{\pi }}=\pi ^*\).

Proof

Without loss of generality, we assume \(\pi ^*\) is the identity permutation.

We first fix a permutation \(\pi _1\) which has at least \(\ell \) fixed points. Let \(F \subset [n]\) denote the set of fixed points of \(\pi _1\). Then \(|F| \ge \ell \). Recall that

$$\begin{aligned} w_{ik} = \sum _{j \in [n]} A_{ij} B_{k \pi _1(j)}. \end{aligned}$$

Then for \(i=k\),

$$\begin{aligned} w_{ii} \ge \sum _{j \in F \setminus \{i\} } A_{ij} B_{i j} \overset{s.t.}{\ge } \mathrm{Binom}( |F| -1, qs ). \end{aligned}$$

Similarly, for \(i \ne k\), note that \(A_{ij} B_{k \pi _1(j)} =0\) if \(j=i\) or \(j=\pi _1^{-1}(k)\). Thus, \(w_{ik} = \sum _{j \in [n]\backslash \{i,\pi _1^{-1}(k)\} }A_{ij} B_{k \pi _1(j)}. \) Moreover, \(A_{ij} B_{k \pi _1(j)} {{\mathop {\sim }\limits ^{\text {i.i.d.}}}}\mathrm{Bern}(q^2)\) for all \(j \in [n]\backslash \{i,\pi _1^{-1}(k) , k \}\). Therefore,

$$\begin{aligned} w_{ik} \le \sum _{j \in [n]\backslash \{i,\pi _1^{-1}(k), k \} } A_{ij} B_{k \pi _1(j)} +1 \overset{s.t.}{\le } \mathrm{Binom}( n-2 , q^2 ) + 1. \end{aligned}$$

It follows from the Chernoff bound (165) that

$$\begin{aligned} {\mathbb {P}}\left\{ w_{ii} \le \frac{1}{2} (\ell -1) qs \right\}\le & {} {\mathbb {P}}\left\{ \mathrm{Binom}\left( |F| -1 , qs \right) \le \frac{1}{2} (\ell -1) qs \right\} \\\le & {} \exp \left( - \frac{1}{8} (\ell -1) q s \right) . \end{aligned}$$

Thus, by the union bound,

$$\begin{aligned} {\mathbb {P}}\left\{ \min _{i \in [n] } w_{ii} \le \frac{1}{2} (\ell -1) qs \right\} \le n \exp \left( - \frac{1}{8} (\ell -1) q s \right) \le \exp \left( - \frac{1}{16} (\ell -1) q s \right) , \end{aligned}$$

where the last inequality holds due to the assumption that \( (\ell -1) q s \ge 16 \log n\). Moreover, since by assumption \( (\ell -1) qs /2 -1 \ge 6 n q^2\), it follows from the Chernoff bound (166) that for any \(i \ne k\),

$$\begin{aligned} {\mathbb {P}}\left\{ w_{ik} \ge \frac{1}{2} (\ell -1) qs \right\}\le & {} {\mathbb {P}}\left\{ \mathrm{Binom}(n-2, q^2) \ge \frac{1}{2} (\ell -1) qs -1 \right\} \\\le & {} 2^{ - (\ell -1) qs /2 +1 } \le 2\exp \left( -\frac{1}{4} (\ell -1) qs \right) . \end{aligned}$$

Thus, by the union bound again,

$$\begin{aligned} {\mathbb {P}}\left\{ \max _{i \ne k } w_{ik} \ge \frac{1}{2} (\ell -1) qs \right\} \le 2n^2 \exp \left( - \frac{1}{4} (\ell -1) q s \right) \le 2 \exp \left( - \frac{1}{8} (\ell -1) q s \right) . \end{aligned}$$

In conclusion, for a fixed permutation \(\pi _1\) with at least \(\ell \) fixed points, with probability at least \(1-3\exp \left( - \frac{1}{8} (\ell -1) q s \right) \),

$$\begin{aligned} \min _{i \in [n] } w_{ii} > \max _{i \ne k } w_{ik}, \end{aligned}$$

and hence \({\widehat{\pi }} = \pi ^*\).

Finally, applying a simple union bound over all the \(\left( {\begin{array}{c}n\\ n-\ell \end{array}}\right) (n-\ell )! \le n^{n-\ell }\) possible choices of permutation \(\pi _1\) with at least \(\ell \) fixed points, we get that even if \(\pi _1\) is adversarially chosen, \({\widehat{\pi }} = \pi ^*\) with probability at least

$$\begin{aligned} 1- 3 n^{n-\ell } \exp \left( - \frac{1}{8} (\ell -1) q s \right) \ge 1- 3 \exp \left( - \frac{1}{16} (\ell -1) q s \right) \ge 1-3n^{-1}, \end{aligned}$$

where the first inequality holds due to \((\ell -1) qs \ge 16(n-\ell ) \log n\) and the last inequality holds due to \((\ell -1) qs \ge 16 \log n\). \(\square \)

We now prove Lemma 18:

Proof (Proof of Lemma 18)

In view of Lemma 19, we get that with probability at least \(1- 2n^{ -m }\), \(\pi _1\) is guaranteed to have at most \(192 \log n/(qs)\) errors with respect to \(\pi ^*\), even if \(\pi _0\), or equivalently the seed set S, is adversarially chosen.

We next apply Lemma 20 with \(\ell = n- 192 \log n/(qs)\). In view of the assumption \( n (qs)^2 \ge 2^{11} \times 3 \log ^2 n\) and \(n \ge 4\), we have \((\ell -1) \ge n/2\). Thus \((\ell -1) qs \ge n qs /2 \ge 16 \log n\), and \((\ell -1) qs \ge nq s /2 \ge 12 nq^2+2\) in view of \(s \ge 30 q\) and \(nqs \ge 20\). Moreover, \((\ell -1) qs \ge n qs /2 \ge 2^{10} \times 3 \log ^2 n / (qs) = 16(n-\ell ) \log n\). Therefore, all assumptions of Lemma 20 are satisfied. It follows from Lemma 20 that with probability at least \(1-3n^{-1}\), \({\widehat{\pi }}=\pi ^*\), even if \(\pi _1\) is adversarially chosen.
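The chain of inequalities above can be checked numerically for a concrete parameter choice. The sketch below assumes the illustrative values \(n=10^{10}\), \(q=0.03\), \(s=1\), which satisfy the hypotheses of Lemma 18; the function name is hypothetical, and \(\ell \) is treated as a real number rather than rounded to an integer.

```python
from math import log


def lemma20_conditions(n, q, s):
    """Evaluate the hypotheses of Lemma 18 and the resulting conditions of
    Lemma 20 for ell = n - 192 log(n) / (q s)."""
    qs = q * s
    errors = 192 * log(n) / qs  # error bound supplied by Lemma 19
    ell = n - errors
    lhs = (ell - 1) * qs
    return {
        "s >= 30 q": s >= 30 * q,
        "(169)": n * qs**2 >= 2**11 * 3 * log(n) ** 2,
        "ell - 1 >= n/2": ell - 1 >= n / 2,
        "(ell-1) q s >= 12 n q^2 + 2": lhs >= 12 * n * q**2 + 2,
        "(ell-1) q s >= 16 max{1, n-ell} log n": lhs
        >= 16 * max(1, n - ell) * log(n),
    }


checks = lemma20_conditions(n=10**10, q=0.03, s=1.0)
for name, holds in checks.items():
    print(name, holds)
```

For much smaller n with the same q and s, condition (169) fails, which reflects the fact that the lemma is an asymptotic statement with large explicit constants.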

In conclusion, since \(2n^{-m} \le 2n^{-1}\), a union bound shows that with probability at least \(1-5n^{-1}\), Algorithm 3 with \(\pi _0\) as the seed set outputs \({{\widehat{\pi }}} = \pi ^*\). \(\square \)

Cite this article

Ding, J., Ma, Z., Wu, Y. et al. Efficient random graph matching via degree profiles. Probab. Theory Relat. Fields 179, 29–115 (2021). https://doi.org/10.1007/s00440-020-00997-4

  • DOI: https://doi.org/10.1007/s00440-020-00997-4
