Efficient random graph matching via degree profiles

Jian Ding¹,
Zongming Ma¹,
Yihong Wu² (ORCID: orcid.org/0000-0001-9239-7671) &
Jiaming Xu³


Abstract

Random graph matching refers to recovering the underlying vertex correspondence between two random graphs with correlated edges; a prominent example is when the two random graphs are given by Erdős-Rényi graphs $G(n,\frac{d}{n})$. This can be viewed as an average-case and noisy version of the graph isomorphism problem. Under this model, the maximum likelihood estimator is equivalent to solving the intractable quadratic assignment problem. This work develops an ${\widetilde{O}}(n d^2+n^2)$-time algorithm which perfectly recovers the true vertex correspondence with high probability, provided that the average degree is at least $d = \varOmega (\log ^2 n)$ and the two graphs differ by at most $\delta = O( \log ^{-2}(n) )$ fraction of edges. For dense graphs and sparse graphs, this can be improved to $\delta = O( \log ^{-2/3}(n) )$ and $\delta = O( \log ^{-2}(d) )$ respectively, both in polynomial time. The methodology is based on appropriately chosen distance statistics of the degree profiles (empirical distribution of the degrees of neighbors). Before this work, the best known result achieves $\delta =O(1)$ and $n^{o(1)} \le d \le n^c$ for some constant c with an $n^{O(\log n)}$-time algorithm and $\delta ={{\widetilde{O}}}((d/n)^4)$ and $d = {\widetilde{\varOmega }}(n^{4/5})$ with a polynomial-time algorithm.

Notes

To ensure the Bernoulli parameter in (2) is well-defined, we need to assume $q(1-s) \le 1-q$, or equivalently $s \ge 2-1/q$. Similarly, to ensure the edge probability in the parent graph $p=q/s \le 1$, we need to assume $ s \ge q$.
Throughout the paper, we use standard big O notation, e.g., for any sequences $\{a_n\}$ and $\{b_n\}$, $a_n=\varTheta (b_n)$ (or $a_n \asymp b_n$) if $1/c\le a_n/ b_n \le c$ holds for all n for some absolute constant $c>0$; $a_n =\varOmega (b_n)$ and $b_n = O(a_n)$ (or $a_n \gtrsim b_n$ and $b_n \lesssim a_n$) if $a_n/b_n \ge c$. We use big ${\widetilde{O}}$ notation to hide logarithmic factors.
Achievability and converse bounds for more general correlated Erdős-Rényi random graph models are also available in [13, 14].
To be precise, all but two elements (namely, $A_{ik}$ and $B_{ki}$) are independent. This can be easily dealt with by excluding those two from the empirical distribution, which, by the triangle inequality, changes the distance statistic by at most $\frac{1}{n}$.
Alternatively, outdegrees can be computed via the number of common neighbors by squaring the adjacency matrix using fast matrix multiplication.

References

Aflalo, Y., Bronstein, A., Kimmel, R.: On convex relaxation of graph isomorphism. Proc. Nat. Acad. Sci. 112(10), 2942–2947 (2015)
Alon, N., Spencer, J.H.: The probabilistic method, 3rd edn. Wiley, New Jersey (2008)
Babai, L., Erdős, P., Selkow, S.M.: Random graph isomorphism. SIAM J. Comput. 9(3), 628–635 (1980)
Barak, B., Chou, C.N., Lei, Z., Schramm, T., Sheng, Y.: (Nearly) efficient algorithms for the graph matching problem on correlated random graphs. arXiv preprint arXiv:1805.02349 (2018)
del Barrio, E., Giné, E., Matrán, C.: Central limit theorems for the Wasserstein distance between the empirical and the true distributions. Ann. Prob. 27, 1009–1071 (1999)
Berend, D., Kontorovich, A.: A sharp estimate of the binomial mean absolute deviation with applications. Stat. Prob. Lett. 83(4), 1254–1259 (2013)
Bollobás, B.: Distinguishing vertices of random graphs. In: North-Holland Mathematics Studies vol. 62, pp. 33–49 (1982)
Bollobás, B.: Cambridge studies in advanced mathematics. In: Random Graphs (2nd Edition). Cambridge university press, New York (2001)
Bordenave, C., Lelarge, M., Massoulié, L.: Non-backtracking spectrum of random graphs: community detection and non-regular Ramanujan graphs. In: 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), pp. 1347–1357 (2015). ArXiv arXiv:1501.06087
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 3(1), 1–122 (2011)
Burkard, R.E., Cela, E., Pardalos, P.M., Pitsoulis, L.S.: The quadratic assignment problem. In: Handbook of Combinatorial Optimization, pp. 1713–1809. Springer, Berlin (1998)
Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty years of graph matching in pattern recognition. Int. J. Pattern Recognit. Artif. Intell. 18(03), 265–298 (2004)
Cullina, D., Kiyavash, N.: Improved achievability and converse bounds for Erdős-Rényi graph matching. In: Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, pp. 63–72. ACM (2016)
Cullina, D., Kiyavash, N.: Exact alignment recovery for correlated Erdős-Rényi graphs. arXiv preprint arXiv:1711.06783 (2017)
Cullina, D., Kiyavash, N., Mittal, P., Poor, H.V.: Partial recovery of Erdős-Rényi graph alignment via $ k $-core alignment. arXiv preprint arXiv:1809.03553 (2018)
Czajka, T., Pandurangan, G.: Improved random graph isomorphism. J. Discrete Algorithms 6(1), 85–92 (2008)
Dai, O.E., Cullina, D., Kiyavash, N., Grossglauser, M.: On the performance of a canonical labeling for matching correlated Erdős-Rényi graphs. arXiv preprint arXiv:1804.09758 (2018)
David, H., Nagaraja, H.: Order Statistics, 3rd edn. Wiley, New Jersey (2003)
Dym, N., Maron, H., Lipman, Y.: DS++: a flexible, scalable and provably tight relaxation for matching problems. ACM Trans. Graphics (TOG) 36(6), 184 (2017)
Feizi, S., Quon, G., Recamonde-Mendoza, M., Medard, M., Kellis, M., Jadbabaie, A.: Spectral alignment of graphs. arXiv preprint arXiv:1602.04181 (2016)
Fiori, M., Sapiro, G.: On spectral properties for graph matching and graph isomorphism problems. Inf. Inference J. IMA 4(1), 63–76 (2015)
Fishkind, D.E., Adali, S., Patsolic, G.H., Meng, L., Singh, D., Lyzinski, V., Priebe, C.E.: Seeded graph matching. Pattern Recogn. 87, 203–215 (2019)
Ford, L.R., Fulkerson, D.R.: Maximal flow through a network. Can. J. Math. 8(3), 399–404 (1956)
Haghighi, A.D., Ng, A.Y., Manning, C.D.: Robust textual inference via graph matching. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 387–394. Association for Computational Linguistics (2005)
Hopcroft, J.E., Karp, R.M.: An $n^{5/2}$ algorithm for maximum matchings in bipartite graphs. SIAM J. Comput. 2(4), 225–231 (1973)
Kaas, R., Buhrman, J.M.: Mean, median and mode in binomial distributions. Stat. Neerl. 34(1), 13–18 (1980)
Kazemi, E., Hassani, H., Grossglauser, M., Modarres, H.P.: Proper: global protein interaction network alignment through percolation matching. BMC Bioinform. 17(1), 527 (2016)
Kazemi, E., Hassani, S.H., Grossglauser, M.: Growing a graph matching from a handful of seeds. Proc. VLDB Endow. 8(10), 1010–1021 (2015)
Kezurer, I., Kovalsky, S.Z., Basri, R., Lipman, Y.: Tight relaxation of quadratic matching. In: Computer Graphics Forum, vol. 34, pp. 115–128. Wiley Online Library (2015)
Korula, N., Lattanzi, S.: An efficient reconciliation algorithm for social networks. Proc. VLDB Endow. 7(5), 377–388 (2014)
Li, W.V., Shao, Q.M.: Gaussian processes: inequalities, small ball probabilities and applications. Handbook of Statistics 19, 533–597 (2001)
Livi, L., Rizzi, A.: The graph matching problem. Pattern Anal. Appl. 16(3), 253–283 (2013)
Lubars, J., Srikant, R.: Correcting the output of approximate graph matching algorithms. In: IEEE INFOCOM 2018-IEEE Conference on Computer Communications, pp. 1745–1753. IEEE (2018)
Lyzinski, V., Fishkind, D., Fiori, M., Vogelstein, J., Priebe, C., Sapiro, G.: Graph matching: relax at your own risk. IEEE Trans. Pattern Anal. Mach. Intell. 38(1), 60–73 (2016)
Lyzinski, V., Fishkind, D.E., Priebe, C.E.: Seeded graph matching for correlated Erdős-Rényi graphs. J. Mach. Learn. Res. 15, 3513 (2013)
Makarychev, K., Manokaran, R., Sviridenko, M.: Maximum quadratic assignment problem: Reduction from maximum label cover and lp-based approximation algorithm. In: International Colloquium on Automata, Languages, and Programming pp. 594–604 (2010)
Mitzenmacher, M., Upfal, E.: Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, New York (2005)
Mossel, E., Ross, N.: Shotgun assembly of labeled graphs. IEEE Trans. Netw. Sci. Eng. 6(2), 145–157 (2019)
Mossel, E., Xu, J.: Seeded graph matching via large neighborhood statistics. To appear in 2019 ACM-SIAM Symposium on Discrete Algorithms (SODA), arXiv preprint arXiv:1807.10262 (2018)
Nadarajah, S., Kotz, S.: Exact distribution of the max/min of two Gaussian random variables. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 16(2), 210–212 (2008)
Narayanan, A., Shmatikov, V.: Robust de-anonymization of large sparse datasets. In: Security and Privacy, 2008. SP 2008. IEEE Symposium on, pp. 111–125. IEEE (2008)
Narayanan, A., Shmatikov, V.: De-anonymizing social networks. In: Security and Privacy, 2009 30th IEEE Symposium on, pp. 173–187. IEEE (2009)
Okamoto, M.: Some inequalities relating to the partial sum of binomial probabilities. Ann. Inst. Stat. Math. 10(1), 29–35 (1959). https://doi.org/10.1007/BF02883985
Onaran, E., Villar, S.: Projected power iteration for network alignment. arXiv preprint arXiv:1707.04929 (2017)
Pardalos, P.M., Rendl, F., Wolkowicz, H.: The quadratic assignment problem: a survey and recent developments. In: Proceedings of the DIMACS Workshop on Quadratic Assignment Problems, volume 16 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pp. 1–42. American Mathematical Society (1994)
Pedarsani, P., Grossglauser, M.: On the privacy of anonymized networks. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235–1243 (2011)
Petrov, V.V.: Limit Theorems of Probability Theory: Sequences of Independent Random Variables. Oxford Science Publications, Clarendon Press, Oxford, United Kingdom (1995)
Slashdot social network (2009). https://snap.stanford.edu/data/soc-Slashdot0902.html
Scheinerman, E.R., Ullman, D.H.: Fractional Graph Theory: a Rational Approach to the Theory of Graphs. Dover, Illinois (1997)
Schellewald, C., Schnörr, C.: Probabilistic subgraph matching based on convex relaxation. In: EMMCVPR, vol. 5, pp. 171–186. Springer, Berlin (2005)
Shirani, F., Garg, S., Erkip, E.: Seeded graph matching: Efficient algorithms and theoretical guarantees. arXiv preprint arXiv:1711.10360 (2017)
Shorack, G.R., Wellner, J.A.: Empirical Processes with Applications to Statistics. Wiley, New Jersey (1986)
Singh, R., Xu, J., Berger, B.: Global alignment of multiple protein interaction networks with application to functional orthology detection. Proc. Nat. Acad. Sci. 105(35), 12763–12768 (2008)
Wright, E.M.: Graphs on unlabelled nodes with a given number of edges. Acta Mathematica 126(1), 1–9 (1971)
Yartseva, L., Grossglauser, M.: On the performance of percolation graph matching. In: Proceedings of the First ACM Conference on Online Social Networks, pp. 119–130. ACM (2013)
Zhao, Q., Karisch, S.E., Rendl, F., Wolkowicz, H.: Semidefinite programming relaxations for the quadratic assignment problem. J. Comb. Opt. 2(1), 71–109 (1998)
Zubkov, A.M., Serov, A.A.: A complete proof of universal inequalities for the distribution function of the binomial law. Theory Prob. Its Appl. 57(3), 539–544 (2013)


Author information

Authors and Affiliations

Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, USA
Jian Ding & Zongming Ma
Department of Statistics and Data Science, Yale University, New Haven, USA
Yihong Wu
The Fuqua School of Business, Duke University, Durham, USA
Jiaming Xu


Corresponding author

Correspondence to Yihong Wu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

J. Ding is supported in part by the NSF Grant DMS-1757479 and an Alfred Sloan fellowship. Z. Ma is supported in part by an NSF CAREER award DMS-1352060 and an Alfred Sloan fellowship. Y. Wu is supported in part by the NSF Grant CCF-1527105, an NSF CAREER award CCF-1651588, and an Alfred Sloan fellowship. J. Xu is supported by the NSF Grants CCF-1850743, CCF-1856424, and IIS-1932630.

J. Ding and Y. Wu would like to thank the Centre de Recherches Mathématiques at the Université de Montréal, where some of the work was carried out during the Workshop on Combinatorial Statistics. Y. Wu is also grateful to David Pollard for helpful discussions on small ball probability. J. Xu would like to thank Nadav Dym and Shahar Kovalsky for pointing out the connections between fractional isomorphism and iterated degree sequences. The authors are grateful to the anonymous referees for helpful comments and corrections.

Appendices

Appendix A Auxiliary results

Recall the following tail bounds for a binomial random variable $X\sim \mathrm{Binom}(n,p)$ [37, Theorems 4.4, 4.5]:

$$\begin{aligned} {\mathbb {P}}\left\{ X \ge (1+t) np \right\}&\le e^{-\frac{t^2}{3} np}, \quad 0 \le t \le 1 \nonumber \\ {\mathbb {P}}\left\{ X \le (1-t) np \right\}&\le e^{-\frac{t^2}{2} np}, \quad 0 \le t \le 1 \end{aligned}$$

(165)

and

$$\begin{aligned} {\mathbb {P}}\left\{ X \ge R \right\} \le 2^{-R}, \quad R \ge 6np. \end{aligned}$$

(166)
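As a quick numerical sanity check of (165) and (166), the following sketch compares exact binomial tails with the stated bounds. The helper names (`upper_tail`, `lower_tail`, `check_165`, `check_166`) and the parameter values are illustrative assumptions and do not come from the paper.

```python
from math import comb, exp, ceil, floor

def pmf(n, p, k):
    """P{X = k} for X ~ Binom(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(n, p, x):
    """Exact P{X >= x}."""
    return sum(pmf(n, p, k) for k in range(max(0, ceil(x)), n + 1))

def lower_tail(n, p, x):
    """Exact P{X <= x}."""
    return sum(pmf(n, p, k) for k in range(0, min(n, floor(x)) + 1))

def check_165(n, p, t):
    """Both inequalities in (165); they are stated for 0 <= t <= 1."""
    up = upper_tail(n, p, (1 + t) * n * p) <= exp(-t**2 * n * p / 3)
    low = lower_tail(n, p, (1 - t) * n * p) <= exp(-t**2 * n * p / 2)
    return up and low

def check_166(n, p, R):
    """Inequality (166); it is stated for R >= 6 n p."""
    return upper_tail(n, p, R) <= 2.0 ** (-R)

# Illustrative parameters (assumed): n = 200 and p = 0.1, so np = 20.
ok_165 = all(check_165(200, 0.1, t / 10) for t in range(0, 11))
ok_166 = check_166(200, 0.1, 125)  # 125 >= 6 * 20
```

Such a check only probes a few parameter values; it is no substitute for the proofs in [37].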

Theorem 5

([43]) Let $X \sim \mathrm{Binom}(n,p)$. It holds that

$$\begin{aligned} {\mathbb {P}}\left\{ X \le n t \right\}&\le \exp \left( - n \left( \sqrt{p} - \sqrt{t} \right) ^2\right) , \quad \forall 0 \le t \le p \end{aligned}$$

(167)

$$\begin{aligned} {\mathbb {P}}\left\{ X \ge n t \right\}&\le \exp \left( - 2n \left( \sqrt{t} - \sqrt{p} \right) ^2\right) , \quad \forall p \le t \le 1. \end{aligned}$$

(168)
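The two inequalities of Theorem 5 can be probed numerically in the same spirit. The sketch below evaluates the exact tails on a small grid of $t$ and compares them with the right-hand sides of (167) and (168); the function names, the grid, and the values $n=100$, $p=0.3$ are assumptions made for illustration only.

```python
from math import comb, exp, sqrt, ceil, floor

def binom_pmf(n, p, k):
    """P{X = k} for X ~ Binom(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def okamoto_lower_ok(n, p, t, tol=1e-12):
    """Check (167): P{X <= n t} <= exp(-n (sqrt(p) - sqrt(t))^2) for 0 <= t <= p."""
    tail = sum(binom_pmf(n, p, k) for k in range(0, floor(n * t) + 1))
    return tail <= exp(-n * (sqrt(p) - sqrt(t)) ** 2) + tol

def okamoto_upper_ok(n, p, t, tol=1e-12):
    """Check (168): P{X >= n t} <= exp(-2 n (sqrt(t) - sqrt(p))^2) for p <= t <= 1."""
    tail = sum(binom_pmf(n, p, k) for k in range(ceil(n * t), n + 1))
    return tail <= exp(-2 * n * (sqrt(t) - sqrt(p)) ** 2) + tol

# Illustrative grid (assumed): n = 100, p = 0.3, t in multiples of 0.05.
n, p = 100, 0.3
lower_ok = all(okamoto_lower_ok(n, p, k / 20) for k in range(0, 7))   # t = 0, ..., 0.30
upper_ok = all(okamoto_upper_ok(n, p, k / 20) for k in range(6, 21))  # t = 0.30, ..., 1
```

The small tolerance `tol` only guards against floating-point rounding at the boundary case $t=p$, where the bound equals 1.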

Appendix B Analysis for seeded graph matching

In this section we analyze Algorithm 3 for seeded graph matching. Note that when Algorithm 3 is used as a subroutine in Algorithm 2, the seed set S is obtained from Algorithm 1 based on matching degree profiles, which can potentially depend on the edges between the non-seeded vertices. To deal with this dependency, the following lemma gives a sufficient condition for the seeded graph matching subroutine (Algorithm 3) to succeed, even if the seed set is chosen adversarially:

Lemma 18

(Seeded graph matching) Assume $n\ge 4$, $s \ge 30 q$, and

$$\begin{aligned} n (qs)^2 \ge 2^{11} \times 3 \log ^2 n. \end{aligned}$$

(169)

If the number of seeds satisfies $m \ge \frac{96 \log n}{q s}$, then with probability at least $1 - 5n^{-1}$, the following holds: for any $\pi _0:S \rightarrow T$ with $|S|=m$ that coincides with the true permutation $\pi ^*$ on the seed set S (i.e., $\pi _0 = \pi ^*|_S$), Algorithm 3 with $\pi _0$ as the seed set and threshold $\kappa =\frac{1}{2} mqs$ outputs ${{\widehat{\pi }}} = \pi ^*$.

We start by analyzing the first stage of Algorithm 3, which upgrades a partial (but correct) permutation $\pi _0: S \rightarrow T$ to a full permutation $\pi _1:[n] \rightarrow [n]$ with at most $O(\log n/(qs))$ errors, even if the seed set S is adversarially chosen.

Lemma 19

Assume $n\ge 2$, $m q s \ge 96 \log n$, and $s \ge 12q$. Recall the threshold $ \kappa =\frac{1}{2} mqs $ in Algorithm 3. Then with probability at least $1- 2n^{ -m }$, the following holds in Algorithm 3: for any partial permutation $\pi _0: S \rightarrow T$ such that $\pi _0 = \pi ^*|_S$ and $|S|=m$, $\pi _1$ is guaranteed to have at most $\frac{192\log n}{qs}$ errors with respect to $\pi ^*$, i.e., $|\{i\in [n]: \pi _1(i) \ne \pi ^*(i) \}| \le \frac{192\log n}{qs}$.

Proof of Lemma 19

Without loss of generality, we assume $\pi ^*$ is the identity permutation.

Fix a seed set S of cardinality m. Since $\pi _0 = \pi ^*|_S$, it follows that

$$\begin{aligned} n_{ik} = \sum _{j \in S} A_{ij} B_{k \pi _0(j)} =\sum _{j \in S} A_{ij} B_{k \pi ^*(j)}. \end{aligned}$$

Recall that according to the definition of the weights in (35), we have

$$\begin{aligned} w(\pi ^*) = \sum _{i \in S^c} {{\mathbf {1}}_{\left\{ {n_{ii} \ge \kappa }\right\} }}. \end{aligned}$$
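To make the two quantities concrete, here is a minimal sketch that computes the seed-based counts $n_{ik}=\sum _{j\in S}A_{ij}B_{k\pi _0(j)}$ and the weight $w(\pi )=\sum _{i\in S^c}{\mathbf {1}}\{n_{i\pi (i)}\ge \kappa \}$ for a given permutation. The function names, the six-vertex toy graph, and the threshold $\kappa =2$ are assumptions chosen for illustration; in the lemma the threshold is $\kappa =\frac{1}{2}mqs$.

```python
def seed_counts(A, B, seeds, pi0):
    """counts[i][k] = sum over seeds j of A[i][j] * B[k][pi0[j]]."""
    n = len(A)
    return [[sum(A[i][j] * B[k][pi0[j]] for j in seeds) for k in range(n)]
            for i in range(n)]

def weight(counts, pi, seeds, kappa):
    """Number of non-seed vertices i with counts[i][pi[i]] >= kappa."""
    return sum(1 for i in range(len(counts))
               if i not in seeds and counts[i][pi[i]] >= kappa)

# Toy example (assumed): the two graphs are identical, so pi* is the identity.
# Seeds are {0, 1, 2}; each non-seed vertex has two seed neighbours.
A = [
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
]
B = A
seeds = {0, 1, 2}
pi0 = {0: 0, 1: 1, 2: 2}           # correct on the seed set
counts = seed_counts(A, B, seeds, pi0)

identity = [0, 1, 2, 3, 4, 5]
swapped = [0, 1, 2, 4, 3, 5]       # mismatches vertices 3 and 4
w_true = weight(counts, identity, seeds, kappa=2)
w_wrong = weight(counts, swapped, seeds, kappa=2)
```

In this toy instance the true permutation attains the full weight $n-m=3$, while the permutation with two mismatched vertices loses both of them, which mirrors the gap the proof quantifies.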

First, we show that

$$\begin{aligned} {\mathbb {P}}\left\{ w(\pi ^*) \le n -m- \frac{ 32 \log n}{qs} \right\} \le \exp \left( - 2 m \log n \right) . \end{aligned}$$

(170)

Indeed, for $i \in S^c$ we have $n_{i i} {{\mathop {\sim }\limits ^{\text {i.i.d.}}}}\mathrm{Binom}(m, qs)$. It follows from the Chernoff bound (165) that

$$\begin{aligned} {\mathbb {P}}\left\{ n_{ii} \le \kappa \right\} = {\mathbb {P}}\left\{ n_{ii} \le \frac{1}{2} mqs \right\} \le \exp \left( - \frac{1}{8} m q s \right) . \end{aligned}$$

Therefore,

$$\begin{aligned} (n-m)-w(\pi ^*) = \sum _{i \in S^c} {{\mathbf {1}}_{\left\{ {n_{ii} < \kappa }\right\} }} \overset{s.t.}{\le } \mathrm{Binom}\left( n-m, \exp \left( - \frac{1}{8} m q s \right) \right) . \end{aligned}$$

Using the following fact (which follows from a simple union bound)

$$\begin{aligned} {\mathbb {P}}\left\{ \mathrm{Binom}\left( n, p \right) \ge t \right\} \le \left( {\begin{array}{c}n\\ t\end{array}}\right) p^t, \end{aligned}$$

(171)

we get that

$$\begin{aligned} {\mathbb {P}}\left\{ (n-m)-w(\pi ^*) \ge t \right\}\le & {} \left( {\begin{array}{c}n-m\\ t\end{array}}\right) \exp \left( - \frac{t}{8} m q s \right) \le n^t \exp \left( - \frac{t}{8} m q s \right) \\\le & {} \exp \left( - \frac{t}{16} m q s \right) , \end{aligned}$$

where the last inequality holds because $mqs \ge 16 \log n$, which follows from the assumption $mqs \ge 96 \log n$. Setting $t=\frac{ 32 \log n}{qs}$, we arrive at the desired (170).

Next, fix any permutation $\pi $ such that $\pi |_S = \pi _0$ and it has $\ell $ non-fixed points. Since by assumption $\pi _0=\pi ^* |_S$ and $\pi ^*$ is the identity permutation, it follows that $\pi (i)=i$ for all $i \in S$. Let $F = \{i \in S^c: \pi (i) = i\}$ denote the set of fixed points in $S^c$. Then $|F|=n-m-\ell $ and $|S^c\backslash F|=\ell $. Thus

$$\begin{aligned} w(\pi ) = \sum _{i \in F} {{\mathbf {1}}_{\left\{ {n_{ii} \ge \kappa }\right\} }} + \sum _{ i \in S^c \backslash F} {{\mathbf {1}}_{\left\{ {n_{i\pi (i)} \ge \kappa }\right\} }} \le n-m-\ell + \sum _{ i \in S^c\backslash F} {{\mathbf {1}}_{\left\{ {n_{i\pi (i)} \ge \kappa }\right\} }}. \end{aligned}$$

Note that for each $i \in S^c \backslash F$, $n_{i\pi (i)} \sim \mathrm{Binom}(m, q^2)$. Since by assumption $s \ge 12q$, it follows that $\kappa = mqs/2 \ge 6 m q^2$. Hence, the Chernoff bound (166) yields that for each $i \in S^c \backslash F$,

$$\begin{aligned} {\mathbb {P}}\left\{ n_{i \pi (i)} \ge \kappa \right\} \le 2^{- m q s/2 } \le \exp \left( - \frac{1}{4} m q s \right) . \end{aligned}$$

Note that $\{n_{i\pi (i)}: i \in S^c \backslash F\}$ are not mutually independent. For instance, $n_{i \pi (i)}$ and $n_{\pi (i), \pi (\pi (i))}$ are dependent. To deal with this dependency issue, we construct a subset ${{\mathcal {I}}}\subset S^c \backslash F$ with $|{{\mathcal {I}}}| \ge \ell /3$ such that $\{n_{i\pi (i)}: i \in {{\mathcal {I}}}\}$ are mutually independent. In particular, consider the canonical cycle decomposition of the permutation $\pi |_{S^c \backslash F}$. Let ${{\mathcal {C}}}_1, \ldots , {{\mathcal {C}}}_{a}$ denote the cycles. Since $\pi $ has no fixed point in $S^c \backslash F$, each cycle ${{\mathcal {C}}}_i$ has length $\ell _i \ge 2$. Let $\varGamma $ denote the graph formed by the union of these cycles. Each cycle ${{\mathcal {C}}}_i$ has an independent set ${{\mathcal {I}}}_i$ of size $\lfloor \ell _i /2 \rfloor \ge \ell _i/3$. Let ${{\mathcal {I}}}= \cup _{i=1}^a {{\mathcal {I}}}_i$. Then ${{\mathcal {I}}}$ is an independent set in $\varGamma $ and $|{{\mathcal {I}}}| \ge \sum _{i=1}^a \ell _i/3=\ell /3$. Since ${{\mathcal {I}}}$ is an independent set, it follows that $\{i, \pi (i)\} \cap \{j, \pi (j)\} =\emptyset $ for all $i \ne j \in {{\mathcal {I}}}$, and hence $\{n_{i\pi (i)}: i \in {{\mathcal {I}}}\}$ are mutually independent. Consequently,

$$\begin{aligned} \sum _{ i \in {{\mathcal {I}}}} {{\mathbf {1}}_{\left\{ {n_{i\pi (i)} \ge \kappa }\right\} }} \overset{s.t.}{\le } \mathrm{Binom}\left( |{{\mathcal {I}}}| , \exp \left( - \frac{1}{4} m q s \right) \right) . \end{aligned}$$
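The construction of ${{\mathcal {I}}}$ from the cycle decomposition can be sketched as follows: walk along each cycle and keep every other vertex, stopping before the walk wraps around. The function names and the example permutation are illustrative assumptions; the code only demonstrates the combinatorial step, not the probabilistic argument.

```python
def cycles_of(pi, support):
    """Cycle decomposition of pi restricted to `support` (pi maps support to itself)."""
    seen, cycles = set(), []
    for start in sorted(support):
        if start in seen:
            continue
        cycle, v = [], start
        while v not in seen:
            seen.add(v)
            cycle.append(v)
            v = pi[v]
        cycles.append(cycle)
    return cycles

def independent_indices(pi, support):
    """Keep floor(len/2) alternate vertices from each cycle."""
    chosen = []
    for cycle in cycles_of(pi, support):
        # Positions 0, 2, ..., 2*floor(len/2) - 2: no two chosen vertices
        # are consecutive along the cycle, including across the wrap-around.
        chosen.extend(cycle[0:2 * (len(cycle) // 2):2])
    return chosen

# Example (assumed): cycles (0 1), (2 3 4), (5 6 7 8), so ell = 9 non-fixed points.
pi = {0: 1, 1: 0, 2: 3, 3: 4, 4: 2, 5: 6, 6: 7, 7: 8, 8: 5}
support = set(pi)
I = independent_indices(pi, support)

pairs = [{i, pi[i]} for i in I]
disjoint = all(pairs[a].isdisjoint(pairs[b])
               for a in range(len(pairs)) for b in range(a + 1, len(pairs)))
```

Here `I` has four elements, which meets the bound $|{{\mathcal {I}}}|\ge \ell /3=3$, and the pairs $\{i,\pi (i)\}$ for $i\in {{\mathcal {I}}}$ are pairwise disjoint.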

Note that

$$\begin{aligned} w(\pi ) \le n-m-\ell + \sum _{ i \in S^c\backslash F} {{\mathbf {1}}_{\left\{ {n_{i\pi (i)} \ge \kappa }\right\} }} \le n-m-|{{\mathcal {I}}}| + \sum _{i \in {{\mathcal {I}}}} {{\mathbf {1}}_{\left\{ {n_{i\pi (i)} \ge \kappa }\right\} }}. \end{aligned}$$

Using (171) again, we have

$$\begin{aligned}&{\mathbb {P}}\left\{ w(\pi ) \ge n -m- \frac{ 32 \log n}{qs} \right\} \\&\quad \le {\mathbb {P}}\left\{ \sum _{ i \in {{\mathcal {I}}}} {{\mathbf {1}}_{\left\{ {n_{i\pi (i)} \ge \kappa }\right\} }} \ge |{{\mathcal {I}}}| -\frac{ 32 \log n}{qs} \right\} \\&\quad \le \left( {\begin{array}{c}|{{\mathcal {I}}}| \\ |{{\mathcal {I}}}| - \frac{ 32 \log n}{qs}\end{array}}\right) \exp \left( - \frac{1}{4} m q s \left( |{{\mathcal {I}}}| - \frac{ 32 \log n}{qs}\right) \right) \\&\quad \le 2^{ \ell } \exp \left( - \frac{1}{4} m q s \left( \frac{\ell }{3} - \frac{ 32 \log n}{qs}\right) \right) \le 2^{\ell } \exp \left( - \frac{1}{24} m q s \ell \right) , \end{aligned}$$

where the last inequality holds provided $\ell qs\ge 192 \log n$. Let $\varPi _\ell $ denote the set of permutations $\pi $ that have $\ell $ non-fixed points and satisfy $\pi |_S = \pi _0$. Then $|\varPi _\ell | \le \left( {\begin{array}{c}n-m\\ \ell \end{array}}\right) \ell ! \le n^\ell $. By the union bound, we have that for any $\ell \ge \frac{ 192 \log n}{qs}$,

$$\begin{aligned} {\mathbb {P}}\left\{ \max _{\pi \in \varPi _\ell } w(\pi ) \ge n -m- \frac{ 32 \log n}{qs} \right\} \le (2n)^{\ell } \exp \left( - \frac{1}{24} m q s \ell \right) \le \exp \left( - \frac{1}{48} m q s \ell \right) , \end{aligned}$$

where the last inequality holds due to the assumption that $mqs \ge 96 \log n$ and $n \ge 2$. Applying the union bound again over $\ell $, we get that

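$$\begin{aligned} {\mathbb {P}}\left\{ \max _{\ell \ge \frac{ 192 \log n}{qs}} \max _{\pi \in \varPi _\ell } w(\pi ) \ge n -m- \frac{ 32 \log n}{qs} \right\}&\le \sum _{\ell \ge \frac{ 192 \log n}{qs}} \exp \left( - \frac{1}{48} m q s \ell \right) \\&\le \frac{ \exp \left( - 4 m \log n \right) }{ 1 - \exp \left( - \frac{1}{48} m q s \right) } \\&\le \exp \left( - 2 m \log n \right) , \end{aligned}$$

where the second inequality sums a geometric series whose first term is at most $\exp (-4m\log n)$ and whose ratio is $\exp (-\frac{1}{48}mqs)$, and the last inequality holds because $mqs \ge 96 \log n$ and $n \ge 2$ give $\exp (-\frac{1}{48}mqs) \le n^{-2} \le \frac{1}{4}$, while $\exp (-2m\log n) \le \frac{1}{4} \le \frac{3}{4}$.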

Combining the last displayed equation with (170) we get that with probability at least $1- 2 n^{-2m}$, $\pi _1$ has at most $192\log n/(qs)$ errors with respect to $\pi ^*$.

Finally, applying a simple union bound over all the $\left( {\begin{array}{c}n\\ m\end{array}}\right) \le n^m$ possible choices of seed set S with $|S|=m$, we complete the proof. $\square $

The second stage of Algorithm 3 upgrades an almost exact full permutation $\pi _1:[n] \rightarrow [n]$ to an exact full permutation ${\widehat{\pi }}: [n] \rightarrow [n]$. The following lemma provides a worst-case guarantee even if $\pi _1$ is adversarially chosen.

Lemma 20

Let $0 \le \ell \le n$. Assume that $(\ell -1) qs \ge 12 nq^2 +2 $ and $ (\ell -1) q s \ge 16 \max \{ 1, n-\ell \} \log n$. Then with probability at least $1-3n^{-1}$, the following holds for Algorithm 3: for any $\pi _1$ with at most $n-\ell $ errors with respect to the true permutation $\pi ^*$, we have ${\widehat{\pi }}=\pi ^*$.

Proof

Without loss of generality, we assume $\pi ^*$ is the identity permutation.

We first fix a permutation $\pi _1$ which has at least $\ell $ fixed points. Let $F \subset [n]$ denote the set of fixed points of $\pi _1$. Then $|F| \ge \ell $. Recall that

$$\begin{aligned} w_{ik} = \sum _{j \in [n]} A_{ij} B_{k \pi _1(j)}. \end{aligned}$$

Then for $i=k$,

$$\begin{aligned} w_{ii} \ge \sum _{j \in F \setminus \{i\} } A_{ij} B_{i j} \overset{s.t.}{\ge } \mathrm{Binom}( |F| -1, qs ). \end{aligned}$$

Similarly, for $i \ne k$, note that $A_{ij} B_{k \pi _1(j)} =0$ if $j=i$ or $j=\pi _1^{-1}(k)$. Thus, $w_{ik} = \sum _{j \in [n]\backslash \{i,\pi _1^{-1}(k)\} }A_{ij} B_{k \pi _1(j)}. $ Moreover, $A_{ij} B_{k \pi _1(j)} {{\mathop {\sim }\limits ^{\text {i.i.d.}}}}\mathrm{Bern}(q^2)$ for all $j \in [n]\backslash \{i,\pi _1^{-1}(k) , k \}$. Therefore,

$$\begin{aligned} w_{ik} \le \sum _{j \in [n]\backslash \{i,\pi _1^{-1}(k), k \} } A_{ij} B_{k \pi _1(j)} +1 \overset{s.t.}{\le } \mathrm{Binom}( n-2 , q^2 ) + 1. \end{aligned}$$

It follows from the Chernoff bound (165) that

$$\begin{aligned} {\mathbb {P}}\left\{ w_{ii} \le \frac{1}{2} (\ell -1) qs \right\}\le & {} {\mathbb {P}}\left\{ \mathrm{Binom}\left( |F| -1 , qs \right) \le \frac{1}{2} (\ell -1) qs \right\} \\\le & {} \exp \left( - \frac{1}{8} (\ell -1) q s \right) . \end{aligned}$$

Thus, by the union bound,

$$\begin{aligned} {\mathbb {P}}\left\{ \min _{i \in [n] } w_{ii} \le \frac{1}{2} (\ell -1) qs \right\} \le n \exp \left( - \frac{1}{8} (\ell -1) q s \right) \le \exp \left( - \frac{1}{16} (\ell -1) q s \right) , \end{aligned}$$

where the last inequality holds due to the assumption that $ (\ell -1) q s \ge 16 \log n$. Moreover, since by assumption $ (\ell -1) qs /2 -1 \ge 6 n q^2$, it follows from the Chernoff bound (166) that for any $i \ne k$,

$$\begin{aligned} {\mathbb {P}}\left\{ w_{ik} \ge \frac{1}{2} (\ell -1) qs \right\}\le & {} {\mathbb {P}}\left\{ \mathrm{Binom}(n-2, q^2) \ge \frac{1}{2} (\ell -1) qs -1 \right\} \\\le & {} 2^{ - (\ell -1) qs /2 +1 } \le 2\exp \left( -\frac{1}{4} (\ell -1) qs \right) . \end{aligned}$$

Thus, by the union bound again,

$$\begin{aligned} {\mathbb {P}}\left\{ \max _{i \ne k } w_{ik} \ge \frac{1}{2} (\ell -1) qs \right\} \le 2n^2 \exp \left( - \frac{1}{4} (\ell -1) q s \right) \le 2 \exp \left( - \frac{1}{8} (\ell -1) q s \right) . \end{aligned}$$

In conclusion, for a fixed permutation $\pi _1$ with at least $\ell $ fixed points, with probability at least $1-3\exp \left( - \frac{1}{8} (\ell -1) q s \right) $,

$$\begin{aligned} \min _{i \in [n] } w_{ii} > \max _{i \ne k } w_{ik}, \end{aligned}$$

and hence ${\widehat{\pi }} = \pi ^*$.
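The separation between diagonal and off-diagonal entries can be illustrated on a small deterministic instance. The sketch below computes $w_{ik}=\sum _{j}A_{ij}B_{k\pi _1(j)}$ for two identical copies of the Paley graph on 13 vertices (so $\pi ^*$ is the identity) and a $\pi _1$ with two errors. Algorithm 3 itself is not reproduced in this section, so the row-wise argmax used to read off a matching is an assumption made for illustration, as are the function names and the choice of graph.

```python
def paley13():
    """Adjacency matrix of the Paley graph on Z_13 (6-regular, symmetric)."""
    residues = {1, 3, 4, 9, 10, 12}  # nonzero squares modulo 13
    return [[1 if (i - j) % 13 in residues else 0 for j in range(13)]
            for i in range(13)]

def refine_weights(A, B, pi1):
    """W[i][k] = sum over j of A[i][j] * B[k][pi1[j]]."""
    n = len(A)
    return [[sum(A[i][j] * B[k][pi1[j]] for j in range(n)) for k in range(n)]
            for i in range(n)]

A = paley13()
B = A                                # identical graphs: pi* is the identity
n = len(A)

pi1 = list(range(n))
pi1[0], pi1[1] = 1, 0                # pi1 has two errors (a transposition)

W = refine_weights(A, B, pi1)
min_diag = min(W[i][i] for i in range(n))
max_off = max(W[i][k] for i in range(n) for k in range(n) if i != k)

# Illustrative read-out (assumed): match each i to the column with largest weight.
recovered = [max(range(n), key=lambda k: W[i][k]) for i in range(n)]
```

In this instance every vertex has degree 6 and any two distinct vertices share at most 3 neighbours, so a single transposition in $\pi _1$ cannot close the gap: the smallest diagonal entry still exceeds the largest off-diagonal entry and the identity is recovered.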

Finally, applying a simple union bound over all the $\left( {\begin{array}{c}n\\ n-\ell \end{array}}\right) (n-\ell )! \le n^{n-\ell }$ possible choices of permutation $\pi _1$ with at least $\ell $ fixed points, we get that even if $\pi _1$ is adversarially chosen, ${\widehat{\pi }} = \pi ^*$ with probability at least

$$\begin{aligned} 1- 3 n^{n-\ell } \exp \left( - \frac{1}{8} (\ell -1) q s \right) \ge 1- 3 \exp \left( - \frac{1}{16} (\ell -1) q s \right) \ge 1-3n^{-1}, \end{aligned}$$

where the first inequality holds due to $(\ell -1) qs \ge 16(n-\ell ) \log n$ and the last inequality holds due to $(\ell -1) qs \ge 16 \log n$. $\square $

We now prove Lemma 18:

Proof of Lemma 18

In view of Lemma 19, we get that with probability at least $1- 2n^{ -m }$, $\pi _1$ is guaranteed to have at most $192 \log n/(qs)$ errors with respect to $\pi ^*$, even if $\pi _0$, or equivalently the seed set S, is adversarially chosen.

We next apply Lemma 20 with $\ell = n- 192 \log n/(qs)$. In view of the assumption $ n (qs)^2 \ge 2^{11} \times 3 \log ^2 n$ and $n \ge 4$, we have $(\ell -1) \ge n/2$. Thus $(\ell -1) qs \ge n qs /2 \ge 16 \log n$, and $(\ell -1) qs \ge nq s /2 \ge 12 nq^2+2$ in view of $s \ge 30 q$ and $nqs \ge 20$. Moreover, $(\ell -1) qs \ge n qs /2 \ge 2^{10} \times 3 \log ^2 n / (qs) = 16(n-\ell ) \log n$. Therefore, all assumptions of Lemma 20 are satisfied. It follows from Lemma 20 that with probability at least $1-3n^{-1}$, ${\widehat{\pi }}=\pi ^*$, even if $\pi _1$ is adversarially chosen.

In conclusion, we get that with probability at least $1-5n^{-1}$, Algorithm 3 with $\pi _0$ as the seed set outputs ${{\widehat{\pi }}} = \pi ^*$. $\square $
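The bookkeeping of constants in this proof can be checked numerically for concrete parameters. The sketch below evaluates the hypotheses of Lemma 18 and then the two conditions of Lemma 20 at $\ell = n-192\log n/(qs)$; the function name and the sample values of $n$, $q$, $s$ are assumptions for illustration, and floating-point arithmetic is used throughout.

```python
from math import log

def lemma18_checks(n, q, s):
    """Return (hypotheses of Lemma 18, conditions of Lemma 20 at the chosen ell)."""
    qs = q * s
    hypotheses = {
        "n >= 4": n >= 4,
        "s >= 30 q": s >= 30 * q,
        "n (qs)^2 >= 2^11 * 3 * log^2 n": n * qs**2 >= 2**11 * 3 * log(n) ** 2,
    }
    ell = n - 192 * log(n) / qs       # number of guaranteed fixed points of pi_1
    lemma20 = {
        "(ell-1) qs >= 12 n q^2 + 2": (ell - 1) * qs >= 12 * n * q**2 + 2,
        "(ell-1) qs >= 16 max(1, n-ell) log n":
            (ell - 1) * qs >= 16 * max(1, n - ell) * log(n),
    }
    return hypotheses, lemma20

# Sample parameters (assumed): a large graph for which the hypotheses hold.
hyp_big, l20_big = lemma18_checks(n=10**10, q=0.03, s=1.0)

# A small graph for which hypothesis (169) fails.
hyp_small, _ = lemma18_checks(n=1000, q=0.03, s=1.0)
```

With the explicit constants as stated, the condition $s\ge 30q$ forces $qs\le 1/30$, so hypothesis (169) can only hold for fairly large $n$; this is why the first sample uses $n=10^{10}$.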

About this article

Cite this article

Ding, J., Ma, Z., Wu, Y. et al. Efficient random graph matching via degree profiles. Probab. Theory Relat. Fields 179, 29–115 (2021). https://doi.org/10.1007/s00440-020-00997-4


Received: 22 March 2019
Revised: 19 August 2020
Published: 25 September 2020
Issue Date: February 2021
DOI: https://doi.org/10.1007/s00440-020-00997-4
