Abstract
A network evolution with predicted tail and extremal indices of PageRank and the Max-Linear Model used as node influence indices in random graphs is considered. The tail index shows a heaviness of the distribution tail. The extremal index is a measure of clustering (or local dependence) of the stochastic process. The cluster implies a set of consecutive exceedances of the process over a sufficiently high threshold. Our recent results concerning sums and maxima of non-stationary random length sequences of regularly varying random variables are extended to random graphs. Starting with a set of connected stationary seed communities as a hot spot and ranking them with regard to their tail indices, the tail and extremal indices of new nodes that are appended to the network may be determined. This procedure allows us to predict a temporal network evolution in terms of tail and extremal indices. The extremal index determines limiting distributions of a maximum of the PageRank and the Max-Linear Model of newly attached nodes. The exposition is provided by algorithms and examples. To validate our theoretical results, our simulation and real data study concerning a linear preferential attachment as a tool for network growth are provided.
Similar content being viewed by others
References
Asmussen, S., & Foss, S. (2018). Regular variation in a fixed-point problem for single- and multi-class banching processes and queues. Branching Processes and Applied Probability. Papers in Honour of Peter Jagers. Advances in Applied Probability, 50A, 47–61. https://doi.org/10.1017/apr.2018.69.
Bagrow, J. P., & Brockmann, D. (2013). Natural emergence of clusters and bursts in network evolution. Physical Review X, 021016. https://doi.org/10.1103/PhysRevX.3.021016.
Banerjee, S., & Olvera-Cravioto, M. (2021). Pagerank asymptotics on directed preferential attachment networks. arXiv:2102.08894v1
Beirlant, J., Goegebeur, Y., Teugels, J., & Segers, J. (2004). Statistics of extremes: Theory and applications. Chichester: Wiley.
Bollobás, B., & Riordan, O. (2006). Percolation. Cambridge: Cambridge University Press.
Censor-Hillel K., & Shachnai H. (2010). Partial information spreading with application to distributed maximum coverage. In Proceedings of the 29th ACM SIGACT-SIGOPS symposium on principles of distributed computing (PODC ’10) (pp. 161–170) ACM. New York. https://doi.org/10.1145/1835698.1835739
Chen, N., Litvak, N., & Olvera-Cravioto, M. (2014a). Ranking algorithms on directed configuration networks. arXiv:1409.7443v2
Chen, N., Litvak, N., & Olvera-Cravioto, M. (2014b). PageRank in scale-free random graphs. In WAW 2014, LNCS 8882, ed. A. Bonato et al. (pp. 120–131). Switzerland: Springer. https://doi.org/10.1007/978-3-319-13123-8.
Clauset, A., Newman, M. E. J., & Moore, C. (2004). Finding community structure in very large networks. Physical Review E, 70(6), 066111. https://doi.org/10.1103/PhysRevE.70.066111
Clauset, A., Shalizi, K. R., & Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51(4), 661–703. https://doi.org/10.1137/070710111
Coscia, M., Giannotti, F., & Pedreschi, D. (2011). A classification for community discovery methods in complex networks. Statistical Analysis and Data Mining: The ASA Data Science Journal, 4(5), 512–546. https://doi.org/10.1002/sam.10133
da Cruz, J. P., & Lind, P. G. (2013). The bounds of heavy-tailed return distributions in evolving complex networks. Physics Letters A,377, 189–194.
Drees, H., Janssen, A., Resnick, S. I., & Wang, T. (2020). On a minimum distance procedure for threshold selection in tail analysis. SIAM J. Math. Data Sci., 2(1), 75–102. https://doi.org/10.1137/19M1260463
Dugué N., & Perez A. (2015). Directed Louvain: maximizing modularity in directed networks. [Research Report] Université d’Orléans. hal-01231784.
Ferreira, M. (2018). Heuristic tools for the estimation of the extremal index: A comparison of methods. REVSTAT - Statistical Journal, 16(1), 115–136.
Ferro, C. A. T., & Segers, J. (2003). Inference for clusters of extreme values. Journal of the Royal Statistical Society, 65, 545–556.
Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3), 75–174.
Fortunato, S., Boguna, M., Flammini, A., & Menczer, F. (2011). On local estimations of PageRank: A mean field approach. Internet Mathematics, 4(2–3), 245–266.
Fukutome, S., Liniger, M. A., & Süveges, M. (2015). Automatic threshold and run parameter selection: A climatology for extreme hourly precipitation in Switzerland. Theoretical and Applied Climatology, 120, 403–416. https://doi.org/10.1007/s00704-014-1180-5
Garavaglia, A., van der Hofstad, R., & Litvak, N. (2020). Local weak convergence for PageRank. The Annals of Applied Probability, 30(1), 40–79. https://doi.org/10.1214/19-AAP1494
Ghoshal, G., Chi, L., & Barabási, A. L. (2013). Uncovering the role of elementary processes in network evolution. Scientific Reports, 3, 2920.
Gissibl, N., & Klüppelberg, C. (2018). Max-linear models on directed acyclic graphs. Bernoulli, 24(4A), 2693–2720.
Goldaeva, A. A. (2013). Indices of multivariate recurrent stochastic sequences. In A. N. Shiryaev (Ed.), Modern problem of mathematics and mechanics VIII(3), Moscow State University, 42–51. (in Russian).
Holme P., & Litvak N. (2017). Cost-efficient vaccination protocols for network epidemiology. PLoS Computational Biology, 13(9) https://doi.org/10.1371/journal.pcbi.1005696.
Jelenkovic, P. R., & Olvera-Cravioto, M. (2010). Information ranking and power laws on trees. Advances in Applied Probability, 42(4), 1057–1093. https://doi.org/10.1239/aap/1293113151
Jelenkovic, P. R., & Olvera-Cravioto, M. (2015). Maximums on trees. Stochastic Processes and their Applications, 125, 217–232. https://doi.org/10.1016/j.spa.2014.09.004
Jessen, A. H., & Mikosch, T. (2006). Regularly varying functions. Publ. Inst. Math. (Beograd) (N.S.), 80, 171–192. https://doi.org/10.2298/PIM0694171H.
Krapivsky, P. L., & Redner, S. (2001). Organization of growing random networks. Physical Review E, 63, 066123. https://doi.org/10.1103/PhysRevE.63.066123
Langville, A. N., & Meyer, C. D. (2006). Google’s PageRank and beyond: The science of search engine rankings. Princeton: Princeton University Press.
Leadbetter, M. R., Lingren, G., & Rootzén, H. (1983). Extremes and related properties of random sequence and processes. ch.3, New York: Springer.
Lebedev, A. V. (2015). Activity maxima in some models of information networks with random weights and heavy tails. Problems of Information Transmission, 51(1), 66–74.
Leskovec, J., Lang, K., Dasgupta, A., & Mahoney, M. (2009). Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1), 29–123. https://doi.org/10.1080/15427951.2009.10129177
Litvak, N., Scheinhardt, W. R. W., & Volkovich, Y. (2007). In-degree and PageRank: Why do they follow similar power laws? Internet Mathematics, 4(2–3), 175–198. https://doi.org/10.1080/15427951.2007.10129293
Markovich, N. M. (2007). Nonparametric analysis of univariate heavy-tailed data: Research and practice. Chichester, West Sussex: Wiley.
Markovich, N. M. (2017). Clustering and hitting times of threshold exceedances and applications. International Journal of Data Analysis Techniques and Strategies, 9(4), 331–347. https://doi.org/10.1504/IJDATS.2017.088360
Markovich, N. M. (2021). Extremes of sums and maxima with application to random networks. In Proceedings 5th international conference on stochastic methods 2020 ICSM5 November 23–270, 2020 Moscow, Russia, pp. 107–112. arXiv:2110.04120
Markovich, N. M. (2022). Weighted maxima and sums of non-stationary random length sequences in heavy-tailed models. arXiv:2209.08485v [math.ST].
Markovich, N. M., & Rodionov, I. V. (2020a). Maxima and sums of non-stationary random length sequences. Extremes, 23 (3), 451–464. https://doi.org/10.1007/s10687-020-00372-5
Markovich, N. M., & Rodionov, I. V. (2020b). Threshold selection for extremal index estimation. arXiv:2009.02318
Markovich, N. M., Ryzhov, M., & Krieger, U. R. (2017). Nonparametric analysis of extremes on web graphs: pagerank versus max-linear model. Communications in Computer and Information Science, 700, 13–26.
Markovich, N. M., Ryzhov, M., & Vaičiulis, M. (2022). Tail index estimation of PageRanks in evolving random graphs. Mathematics, 10(16), 3026.
McCormick, D. A., & Contreras, D. (2001). On the cellular and network bases of epileptic seizures. Annual Review of Physiology, 63, 815. https://doi.org/10.1146/annurev.physiol.63.1.815
McElroy, T., & Politis, D. N. (2007). Moment-based tail index estimation. Journal of Statistical Planning and Inference, 137, 1389–1406. https://doi.org/10.1016/j.jspi.2006.04.002
Mester, A., Pop, A., Mursa, B.-E.-M., Grebla, H., Diosan, L., & Chira, C. (2021). Network analysis based on important node selection and community detection. Mathematics, 9, 2294.
Mosk-Aoyama D., & Shah D. (2006). Computing separable functions via gossip. In Proceedings of the twenty-fifth annual ACM symposium on principles of distributed computing (PODC ’06). (pp. 113–122). ACM. New York, USA.
Newman, M. E. J. (2018). Networks: An introduction (2nd ed.). Oxford: Oxford University Press.
Norros, I., & Reittu, H. (2006). On a conditionally poissonian graph process. Advances in Applied Probability (SGSA), 38, 59–75. https://doi.org/10.1239/aap/1143936140
Novak, S. Y. (2002). Inference of heavy tails from dependent data. Siberian Advances in Mathematics, 12(2), 73–96. https://doi.org/10.1016/C2015-0-01492-7
Olvera-Cravioto, M. (2012). Asymptotics for weighted random sums. Advances in Applied Probability, 44(4), 1142–1172. https://doi.org/10.1239/aap/1354716592
Pandurangan, G., Raghavan, P. & Upfal, E. (2002). Using PageRank to characterize web structure. In O. H. Ibarra, L. Zhang (Eds.) Computing and combinatorics. COCOON 2002. LNCS 2387. (pp. 330–339). Springer, Berlin.
Resnick, S. I., & Stǎricǎ, C. (1999). Smoothing the moment estimate of the extreme value parameter. Extremes, 1(3), 263–294. https://doi.org/10.1023/A:1009925716617
Robert, C. Y. (2009). Inference for the limiting cluster size distribution of extreme values. The Annals of Statistics, 37, 271–310.
Robert, C. Y., & Segers, J. (2008). Tails of random sums of a heavy-tailed number of light-tailed terms. Insurance: Mathematics and Economics, 43, 85–92. https://doi.org/10.1016/j.insmatheco.2007.10.001.
Rootzén, H. (1988). Maxima and exceedances of stationary Markov chains. Advances in Applied Probability, 20, 371–390. https://doi.org/10.2307/1427395
Samorodnitsky, G., Resnick, S., Towsley, D., Davis, R., Willis, A., & Wan, P. (2016). Nonstandard regular variation of in-degree and out-degree in the preferential attachment model. Journal of Applied Probability, 53(1), 146–161. https://doi.org/10.1017/jpr.2015.15
Schroeder, D. T., Langguth, J., Burchard, L., Pogorelov, K., & Lind, P. G. (2022). The connectivity network underlying the German’s Twittersphere: A testbed for investigating information spreading phenomena. Scientific Reports, 12, 4085. https://doi.org/10.1038/s41598-022-07961-3
Shen, C., Priebe, C. E., & Vogelstein, J. T. (2020). From distance correlation to multiscale graph correlation. Journal of the American Statistical Association, 115(529), 280–291. https://doi.org/10.1080/01621459.2018.1543125
Süveges, M., & Davison, A. C. (2010). Model misspecification in peaks over threshold analysis. The Annals of Applied Statistics, 4(1), 203–221. https://doi.org/10.1214/09-AOAS292
Tillier, C., & Wintenberger, O. (2018). Regular variation of a random length sequence of random variables and application to risk assessment. Extremes, 21, 27–56. https://doi.org/10.1007/s10687-017-0297-1
Volkovich Y., Litvak N., & Zwart B. (2008). Measuring extremal dependencies in Web graphs. In WWW ’08: Proceedings of the 17th international conference on World Wide WebApril. (pp. 1113–1114). https://doi.org/10.1145/1367497.1367682.
Volkovich, Y. V., & Litvak, N. (2010). Asymptotic analysis for personalized web search. Advances in Applied Probability, 42(2), 577–604. https://doi.org/10.1239/aap/1275055243
Wan, P., Wang, T., Davis, R. A., & Resnick, S. I. (2020). Are extreme value estimation methods useful for network data? Extremes, 23, 171–195. https://doi.org/10.1007/s10687-019-00359-x
Wang, T., & Resnick, S. I. (2019). Consistency of Hill estimators in a linear preferential attachment model. Extremes, 22, 1–28. https://doi.org/10.1007/s10687-018-0335-7
Wang, T., & Resnick, S. I. (2020). Degree growth rates and index estimation in a directed preferential attachment model. Stochastic Processes and their Applications, 130(2), 878–906. https://doi.org/10.1016/j.spa.2019.03.021
Xiong, J., Shen, C., Arroyo, J. & Vogelstein, J. (2020). Graph independence testing. arXiv:1906.03661.
Acknowledgements
The author was supported by the Russian Science Foundation (grant No. 22-21-00177). The author would like to thank anonymous reviewers for useful comments.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
The PageRank and max-linear model
PR (Langville & Meyer, 2006) and the MLM may be considered as node influence characteristics (Gissibl & Klüppelberg, 2018; Markovich et al., 2017). The PR R of a randomly chosen Web page (a node in the Web graph) is viewed as a r.v.. It was considered as the solution to the fixed-point problem
in Jelenkovic and Olvera-Cravioto (2010), Volkovich and Litvak (2010). \(=^D\) denotes equality in distribution. The r.v.s \(\{R_j\}\) are assumed to be iid copies of R and \(E(Q)<1\) holds. \((Q,N,\{A_j\})\) is a real-valued vector, \(\{A_j\}\) are independent non-negative r.v.s. distributed as some r.v. A with \(E(A)<1\). N denotes the in-degree of a node. Q is a personalization value of the vertex. Under the assumptions (we shall call them Assumptions A) that \(\{R_j\}\) are regularly varying iid and independent of \((Q, N, \{A_j\})\) with \(\{A_j\}\) independent of (N, Q), N is regularly varying r.v., and that N and Q are allowed to be dependent, it is stated in Jelenkovic and Olvera-Cravioto (2010), Volkovich and Litvak (2010) that the stationary distribution of R in (16) is regularly varying and its tail index is determined by the most heavy-tailed distributed term in the pair (N, Q). The approach implicitly assumes that the underlying graph is an infinite tree, an assumption that is not plausible in real-world networks. In Chen et al. (2014a, 2014b), the behavior of the PR is considered on a directed configuration model, which is a tree-like graph in a sense that the first loop is observed at a distance of order \(\log n\), where n is the size of the graph. It is derived that the PR in the latter model is well approximated by the PR of the root node of a suitably constructed tree as \(n\rightarrow \infty \).
In the same way, a MLM is considered as the ’minimal/endogeneous’ solution of the equation
(Jelenkovic & Olvera-Cravioto, 2015). Assuming that all r.v.s in the triple \((R_j,Q,N)\) are regularly varying and mutually independent and \(\{R_j\}\) are iid, PR R was proved to have a regularly varying tail in Jelenkovic and Olvera-Cravioto (2010), Volkovich and Litvak (2010) at the directed configuration model. A similar statement was proved in Jelenkovic and Olvera-Cravioto (2015) with regard to the MLM.
Important results from extreme value analysis
We recall the theorems derived in Markovich (2022) that are important for the prediction of the tail and extremal indices of evolving random graphs. These theorems generalize Theorems 3 and 4 in Markovich and Rodionov (2020a). The latter state the conditions when the sequences of sums \(Y_{n}(z, N_n)\) and maxima \(Y_{n}^*(z, N_n)\) (see (3)) have the same tail and extremal indices. There are the following constrains for these statements. The slowly varying functions \(\{\ell _i(x)\}\) in (2) are uniformly upper bounded in i by a polynomial function, i.e. for all constants \(A>1\), \(\delta >0\) there exists \(x_0(A, \delta )\) such that for all \(i\ge 1\)
holds. Despite \(N_n\) is integer-valued, one can accept a distribution with regularly varying tail with tail index \(\alpha >0\) as a relevant model for \(N_n\), i.e. it holds
\({\tilde{\ell }}_n(x)\) is a slowly varying function. This model is motivated in several papers, see Jessen and Mikosch (2006), Robert and Segers (2008) and Volkovich and Litvak (2010) among them.
In Markovich and Rodionov (2020a) it is assumed that there is a unique “column” sequence with a minimum tail index \(k_1<k\), \(k:= \lim _{n\rightarrow \infty } \inf _{2\le i\le l_n} k_i\), and \(N_n\) has a lighter tail than \(Y_{n,i}\), i.e. it holds
where the sequence of thresholds \(u_n\) is taken as \(u_n=yn^{1/k_1}\ell _1^{\sharp }(n)\), \(y>0\), \(\ell ^{\sharp }(x)\) is the de Brujin conjugate of \(\ell (x)\), the sequence \(l_n\) satisfies
and \(\chi \) satisfies
An arbitrary dependence between “column” sequences and between \(\{Y_{n,i}\}\) and \(\{N_n\}\) is allowed. In Theorem 4 in Markovich (2022) recalled here in Theorem 2 the number d of “column” sequences with a minimum tail index is allowed to be random. This is realistic for random graphs since a random number of communities considering as the “column” sequences may have a minimum tail index (or a tail index close to that).
Let us recall the following conditions for a fixed \(d>1\) proposed in Markovich (2022).
-
(A1)
The stationary sequences \(\{Y_{n,i}\}_{n\ge 1}\), \(i\in \{1,\ldots ,d\}\) are mutually independent, and independent of the sequences \(\{Y_{n,i}\}_{n\ge 1}\), \(i\in \{d+1,\ldots ,l_n\}\).
-
(A2)
Assume \(\{Y_{n,i}\}_{n\ge 1}\), \(i\in \{1,\ldots ,d\}\) satisfy the following conditions as \(x\rightarrow \infty \)
$$\begin{aligned}\frac{P\{Y_{n,i}>x\}}{x^{-k_1}\ell _1(x)}\rightarrow & {} c_i, ~~i\in \{1,\ldots ,d\}, \end{aligned}$$for some non-negative numbers \(c_i\),
$$\begin{aligned}\frac{P\{Y_{n,i}>x, Y_{n,j}>x\}}{x^{-k_1}\ell _1(x)}\rightarrow & {} 0, ~~ i\ne j, ~~i,j\in \{1,\ldots ,d\}. \end{aligned}$$ -
(A3)
Assume that for each \(n\ge 1\) there exists \(i\in \{1,\ldots ,d\}\) such that
$$\begin{aligned} \!\!\!\!\!\!\!\!{} & {} P\{\max _{1\le j\le d, j\ne i}(z_{j}Y_{n,j})>x, z_{i}Y_{n,i}\le x\} =o(P\{z_iY_{n,i}>x\}),~~x\rightarrow \infty \end{aligned}$$holds.
-
(A4)
Assume that there exists \(i\in \{1,\ldots ,d\}\) such that it holds
$$\begin{aligned} \!\!\!\!\!{} & {} P\{\max _{1\le j\le d, j\ne i}(z_{j}M_n^{(j)})>u_n, z_{i}M_n^{(i)}\le u_n\} =o(1),~~n\rightarrow \infty . \end{aligned}$$(23)
Let us denote \(M_n^{(i)}= \max \{Y_{1,i}, Y_{2,i},\ldots ,Y_{n,i}\}, ~i\in \{1,..,l_n\}\).
Theorem 2
(Markovich, 2022) Let the sets of slowly varying functions \(\{{\tilde{\ell }}_n(x)\}_{n\ge 1}\) in (19) and \(\{\ell _i(x)\}_{i\ge 1}\) in (2) satisfy the condition (18), and (20), (21), (22) hold. Assume that d and \(\{Y_{n,i}\}\) are independent.
-
(i)
Let d be a bounded discrete r.v. such that \(1<d<d_n = \min (C, l_n)\), \(C>1\) holds.
-
(a)
If (A1) or (A2) for any \(d\in \{2,3,\ldots ,\lfloor d_n-1\rfloor \}\) holds and \(N_n\) and \(\{Y_{n,i}\}\) are independent, then \(Y_{n}(z,N_n)\) and \(Y^*_{n}(z,N_n)\) have the tail index \(k_1\). If, instead of (A1) and (A2), (A3) holds, then \(Y_n^*(z,N_n)\) has the same tail index.
-
(b)
If (A4) where in (23) d is replaced by \(\lfloor d_n-1\rfloor \) holds, then \(Y_n^*(z,N_n)\) has the extremal index \(\theta _i\). If, in addition, (A1) (or (A2)) for any \(d\in \{2,3,\ldots ,\lfloor d_n-1\rfloor \}\) holds, then \(Y_n(z,N_n)\) has the same extremal index.
-
(a)
-
(ii)
Suppose that \(d>1\) is a bounded discrete r.v. equal to a positive integer a.s.. Then all statements of Item (i) are fulfilled.
It follows by Example 2 in Markovich (2022) that (23) is valid for all d “column” sequences such that
holds. Theorem 2 means that if there are a random number d of “column” sequences with a minimum tail index, then \(Y^*_{n}(z,N_n)\) has the extremal index \(\theta _i\) of the ith “column” satisfying (23), \(1\le i\le d\). If the latter “column” sequences are independent (see, (A1)) or weakly dependent (see, (A2)), then \(Y_{n}(z,N_n)\) has the same extremal index. For random networks this implies that the extremal index of the MLMs of the newly appended nodes is equal to the extremal index of the community with the minimum tail index that has a largest maximum PR among d dominating communities. The extremal index of the PRs of newly appended nodes is the same if the dominating communities satisfy (A1) or (A2). Since the communities are not enumerated, their maxima can be reordered as (24). The statements of Theorem 2 are asymptotic. Thus, the approximation can be applied for sufficiently large size communities.
In case of different pair-wise dependency among elements of the d “column” series with the minimum tail index, the extremal index of the maxima \(Y^*_{n}(z,N_n)\) and sums \(Y_{n}(z,N_n)\) may not exist due to a non-stationarity of these sequences.
Remark 2
The theorems in Markovich and Rodionov (2020a), Markovich (2022) are valid if there are non-zero elements in each row corresponding to the “column” sequences \(\{Y_{n,i}: n\ge 1\}\) with minimum tail index. If the most heavy-tailed column is unique and at least one element in the latter sequence is equal to zero, the sequences of the sums and maxima of the “row” elements are not stationary. This feature plays a role for graphs.
Corollary 1
The statements of Theorem 2 remain valid if the tail indices \(\{k_{n,i}\}\) of the elements in the “columns” \(\{Y_{n,i}: n\ge 1\}\) are different, apart of those columns with the minimum tail index.
Corollary 1 is very important for practice. It implies that the columns with non-minimum tail indices may be non-stationary distributed and hence, their extremal indices may not exist. Its proof follows from the proofs of Theorem 3 in Markovich and Rodionov (2020a) and Theorems 3 and 4 in Markovich (2022). The columns with the minimum tail index impact on the distribution and dependence structure of the sequence of sums and maxima over rows.
Dependence structures in graphs
We have to investigate the dependence of PRs of two communities. One of the approaches is to consider Pearson’s correlation of two r.v.s belonging to two graphs. Each r.v. shows whether there is an edge between two nodes in a graph or not. Each edge may be sampled iid from a Bernoulli distribution with some parameter p (Xiong et al., 2020). A distance correlation is an extension of Pearson’s correlation both to linear and nonlinear associations between two r.v.s or random vectors (Shen et al., 2020). It takes values in [0, 1]. The distance correlation equal to zero does imply independence. Since nodes can be enumerated arbitrarily, the distance correlation has to be combined with a permutation test to check the dependence hypothesis. The distance correlation is calculated first for an original pair of vectors. It is compared with those ones calculated by shuffles of these vectors.
In contrast to Shen et al. (2020), in our setting pairs of observations relating to two stationary distributed communities can be dependent and not necessarily identically distributed. One can use the distance correlation and the permutation test with regard to the row-column pairs. The \(p-\)value of the permutation test is the proportion of the number of the correlation measures from the samples with permutated pairs that are larger than the distance correlation that was calculated from the original data.
To measure dependencies in heavy tailed graph data using statistical inference for multivariate regular variation one can apply the polar coordinate transform to the examined random vectors \(\{X_i\}\) and \(\{Y_i\}\), \(i=1,\ldots ,n\) (Resnick & Stǎricǎ 1999; Volkovich et al., 2008; Wan et al. 2020). One can estimate the empirical distribution function (edf) of the angular coordinates for the k largest values of the radial coordinate. The total dependence (or total independence) corresponds to the concentration of the edf to \(\pi /4\) (or, to 0 or \(\pi /2\)). A Starica plot can be used to find a suitable value of k.
Preferential attachment
Let us consider a network growth where each node is attached to a small seed network at a unit time. The well-known tool is a linear PA. A node i can be attached randomly to existing nodes according to a probability \(P_{PA}(i)=d_i/\sum _{s=1}^N d_s\) proportional to its degree \(d_i\), or the number of its neighbors, where N is the number of nodes. Nodes i and j may be connected with probability \(d_id_j/\sum _{s=1}^N d_s\) (Norros & Reittu, 2006). A kind of PA with a Poisson random number of new edges to the new vertex is proposed in Norros and Reittu (2006). The PA provides the “rich-get-richer” mechanism since earlier appearing nodes may increase their numbers of edges longer. This property leads to a power-law degree distribution \(P(i)\sim i^{-(1+\alpha )}\) of node degrees (Newman, 2018; Wan et al., 2020). In Wan et al. (2020) it is derived that the linear and superstar linear PA models on directed graphs lead to networks with power-law distributed in- and out-degrees.
The \(\alpha -\), \(\beta -\) and \(\gamma -\)schemes of the linear PA provide proportions of new nodes with incoming (outgoing) links to (from) existing nodes (scheme \(\alpha -\) (\(\gamma -\))) or the directed edges between pairs of existing nodes (\(\beta -\)scheme) (Samorodnitsky et al., 2016; Wan et al., 2020). Let \(I_n(v)\) and \(O_n(v)\) be in- and out-degree of vertex \(v\in V_n\) in a graph \(G_n\), n and N(n) denote the numbers of edges and nodes in \(G_n\), respectively. Appending a new node v to an existing graph \(G_{n-1}\), one can select one of three scenarios by generating an iid sequence of trinomial r.v.s with cells marked 1, 2, 3 with probabilities \(\alpha , \beta , \gamma \). The probability to generate the edge \(v\rightarrow w\) from v to an existing node w is given by
by the \(\alpha -\)scheme; between the existing nodes v and w
by the \(\beta -\)scheme; from the existing node w to v
by the \(\gamma -\)scheme, where \(\delta _{in}\) and \(\delta _{out}\) are parameters of the PA method. The latter may be estimated by the semi-parametric extreme value method based on the maximum-likelihood method (Wan et al., 2020).
Proof of Theorem 1
Proof
Part (i) follows by Theorem 4 in Markovich and Rodionov (2020a). We start the induction with \(m=1\). The columns of matrix \(A^{(1)}\) of the first iteration are obtained using submatrices of \(A^{(0)}\) (7) for different j by recursions (5) and (6). By the latter theorem the jth columns \(\{Y^{(1)}_{i,j}\}\) and \(\{X^{(1)}_{i,j}\}\) have the same tail indices \(\{k^{(0)}_j\}\) and the same extremal indices \(\{\theta ^{(0)}_j\}\), \(j\ge 1\). Getting matrices \(A^{(2)}, A^{(3)},...\) for the next iterations both for sums and maxima similarly we obtain the same pairs of indices \((k^{(0)}_j,\theta ^{(0)}_j)\) for their jth columns.
Part (ii), (a). The independence condition (A1) for \(A^{(0)}\) is valid since communities (the “column” series of \(A^{(0)}\)) are nearly disconnected. The condition (A2) follows by (A1). The random number of communities \(N_n\) is evidently independent on the PRs of nodes within the communities. Then the first \(d_1^{(0)}\) “column” series of matrix \(A^{(1)}\) have the tail index \(k_1^{(0)}\) by Theorem 2. The next \(d_2^{(0)}\) columns have the tail index \(k_2^{(0)}>k_1^{(0)}\), etc. The columns of \(A^{(1)}\) have the same tail indices as \(A^{(0)}\).
Note that the “column” series \(\{Y^{(m)}_{i,j}\}\) and \(\{X^{(m)}_{i,j}\}\) of \(A^{(m)}\), \(m\ge 1\) are dependent due to their definition as partial sums and maxima of row elements of \(A^{(m-1)}\) by the “domino” principle. Hence, we have \(Y^{(m)}_{i,1}\ge Y^{(m)}_{i,2}\ge ...\ge Y^{(m)}_{i,N_n}\) and \(X^{(m)}_{i,1}\ge X^{(m)}_{i,2}\ge ...\ge X^{(m)}_{i,N_n}\). Elements of matrices \(A^{(m)}\), \(m\ge 1\) may be represented by elements of \(A^{(0)}\). Really, for \(m=2\) we have
and similarly for \(X^{(2)}_{n,i}\). Considering weights \(z_j=j\) in (28) as in (3), we obtain by Theorem 2 that the first \(d_1^{(0)}\) sequences \(\{Y^{(2)}_{n,i}\}\) and \(\{X^{(2)}_{n,i}\}\) have the tail index \(k_1^{(0)}\), the next \(d_2^{(0)}\) ones - \(k_2^{(0)}\), etc. in the same way as “column” series of matrix \(A^{(1)}\). The same is valid for \(A^{(m)}\) with \(m>2\) by induction.
Part (ii), (b). In order to find the extremal indices of the “column” series of \(A^{(1)}\), let us enumerate the \(d_j^{(0)}\) columns (the communities) in an descending order of the PR maxima of \(A^{(0)}\) over columns, i.e. \(M_n^{(1)}\ge M_n^{(2)}\ge ...\ge M_n^{(d_j^{(0)})}\) for each fixed value of r.v. \(d_j^{(0)}\in \{2,3,\ldots ,\lfloor d_n-1\rfloor \) and \(j\in \{1,2,...\}\). Then (A4) is fulfilled consistently for \(i\in \{1,2,\ldots ,d_j^{(0)}-1\}\). By Theorem 2 the first \(d_j^{(0)}\), \(j\ge 1\) “column” series \(\{X_{i,j}^{(1)}\}_{i\ge 1}\) of \(A^{(1)}\) have the extremal indices \(\theta _{d_{j-1}^{(0)}+1}^{(0)},\ldots , \theta ^{(0)}_{d_{j-1}^{(0)}+d_{j}^{(0)}}\) for any values of \(d_j^{(0)}\in \{2,3,\ldots ,\lfloor d_n-1\rfloor \}\), \(d_0^{(0)}=0\). Since, in addition, (A1) (or (A2)) for \(A^{(0)}\) and any \(d_j^{(0)}\in \{2,3,\ldots ,\lfloor d_n-1\rfloor \}\) holds, then \(\{Y_{i,j}^{(1)}\}_{i\ge 1}\) have the same extremal indices. Since elements of \(A^{(m)}\) may be represented as weighted sums or maxima of elements of \(A^{(0)}\), the extremal indices of the “column” series of \(A^{(m)}\), \(m\ge 1\) are the same as ones of \(A^{(0)}\). \(\square \)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Markovich, N. Extremal properties of evolving networks: local dependence and heavy tails. Ann Oper Res 339, 1839–1870 (2024). https://doi.org/10.1007/s10479-023-05175-y
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10479-023-05175-y