[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ Skip to main content
Log in

An efficient heuristic for betweenness estimation and ordering

  • Original Article
  • Published:
Social Network Analysis and Mining Aims and scope Submit manuscript

Abstract

Centrality measures, erstwhile popular amongst the sociologists and psychologists, have seen broad and increasing applications across several disciplines of late. Amongst a plethora of application-specific definitions available in the literature to rank the vertices, closeness centrality, betweenness centrality and eigenvector centrality (page-rank) have been the widely applied ones. We are surrounded by networks where information, signal or commodities are flowing through the edges. Betweenness centrality comes as a handy tool to analyze such systems, but computation of these scores is a daunting task in large-size networks. Since computing the betweenness centrality of one node is conjectured to be same as time taken for computing the betweenness centrality of all the nodes, a fast algorithm was required that can efficiently estimate a node’s betweenness score. In this paper, we propose a heuristic that efficiently estimates the betweenness score of a given node. The algorithm incorporates a non-uniform node-sampling model which is developed based on the analysis of random Erdős-Rényi graphs. We apply the heuristic to estimate the ranking of an arbitrary k vertices, called betweenness-ordering problem, where k is much less than the total number of vertices. The proposed heuristic produces very efficient results even when runs for a linear time in the number of edges. An extensive experimental evidence is presented to demonstrate the performance of the proposed heuristic for betweenness estimation and ordering on several synthetic and real-world graphs.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Similar content being viewed by others

References

  • Agarwal M, Singh RR, Chaudhary S, Iyengar S (2015) An efficient estimation of a node’s betweenness. Complex networks VI. Springer, Berlin, pp 111–121

    Google Scholar 

  • Anthonisse J.M (1971) The rush in a directed graph. Stichting Mathematisch Centrum. Mathematische Besliskunde (BN 9/71), 1–10

  • Bader DA, Kintali S, Madduri K, Mihail M (2007) Approximating betweenness centrality. In: Proceedings of the 5th International Conference on Algorithms and Models for the Web-graph, WAW’07, Springer, Berlin, pp 124–137

  • Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512

    Article  MathSciNet  MATH  Google Scholar 

  • Batagelj V, Mrvar A (2006) Pajek datasets. http://vlado.fmf.uni-lj.si/pub/networks/data

  • Bauckhage C, Kersting K, Rastegarpanah B (2013) The Weibull as a model of shortest path distributions in random networks. In: Proc. Int. Workshop on Mining and Learning with Graphs, Chicago, IL, USA

  • Bergamini E, Meyerhenke H (2015) Fully-dynamic approximation of betweenness centrality. arXiv preprint arXiv:1504.07091

  • Bergamini E, Meyerhenke H, Staudt CL (2014) Approximating betweenness centrality in large evolving networks. arXiv preprint arXiv:1409.6241

  • Borgatti SP, Li X (2009) On social network analysis in a supply chain context*. J Supply Chain Manag 45(2):5–22

    Article  Google Scholar 

  • Brandes U (2001) A faster algorithm for betweenness centrality. J Math Sociol 25(2):163–177. https://doi.org/10.1080/0022250X.2001.9990249

    Article  MATH  Google Scholar 

  • Brandes U, Erlebach T (2005) Network analysis: methodological foundations, vol 3418. Springer, Berlin

    MATH  Google Scholar 

  • Brandes U, Pich C (2007) Centrality estimation in large networks. Int J Bifurcation Chaos 17(07):2303–2318. https://doi.org/10.1142/S0218127407018403

    Article  MathSciNet  MATH  Google Scholar 

  • Carvalho R, Buzna L, Bono F, Gutiérrez E, Just W, Arrowsmith D (2009) Robustness of trans-european gas networks. Phys Rev E 80(1):016,106

    Article  Google Scholar 

  • Chehreghani MH (2014) An efficient algorithm for approximate betweenness centrality computation. Comput J

  • Davis TA, Hu Y (2011) The university of florida sparse matrix collection. ACM Trans Math Softw (TOMS) 38(1):1

    MathSciNet  MATH  Google Scholar 

  • Derrible S (2012) Network centrality of metro systems. PloS One 7(7):e40,575

    Article  Google Scholar 

  • Dijkstra EW (1959) A note on two problems in connexion with graphs. Numerische Mathematik 1(1):269–271

    Article  MathSciNet  MATH  Google Scholar 

  • Eppstein D, Wang J (2004) Fast approximation of centrality. J Graph Algorithms Appl 8:39–45

    Article  MathSciNet  MATH  Google Scholar 

  • Ercsey-Ravasz M, Toroczkai Z (2010) Centrality scaling in large networks. Phys Rev Lett 105(3):038701

    Article  Google Scholar 

  • Ercsey-Ravasz M, Lichtenwalter RN, Chawla NV, Toroczkai Z (2012) Range-limited centrality measures in complex networks. Phys Rev E 85(6):066103

    Article  Google Scholar 

  • Erdos D, Ishakian V, Bestavros A, Terzi E (2014) A divide-and-conquer algorithm for betweenness centrality. arXiv preprint arXiv:1406.4173

  • Erdos P, Renyi A (1959) On random graphs I. Publ Math Debrecen 6:290–297

    MathSciNet  MATH  Google Scholar 

  • Floyd RW (1962) Algorithm 97: shortest path. Commun ACM 5(6):345

    Article  Google Scholar 

  • Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry 40(1):35–41

    Article  Google Scholar 

  • Geisberger R, Sanders P, Schultes D (2008) Better approximation of betweenness centrality, chap. 8, pp 90–100. https://doi.org/10.1137/1.9781611972887.9

  • Gkorou D, Pouwelse J, Epema D, Kielmann T, van Kreveld M, Niessen W (2010) Efficient approximate computation of betweenness centrality. In: 16th annual conf. of the Advanced School for Computing and Imaging (ASCI 2010)

  • Goel K, Singh RR, Iyengar S (2013) A faster algorithm to update betweenness centrality after node alteration. Algorithms and models for the web graph. Springer, Berlin, pp 170–184

    Chapter  Google Scholar 

  • Green O, McColl R, Bader D (2012) A fast algorithm for streaming betweenness centrality. In: Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Confernece on Social Computing (SocialCom), pp 11–20. https://doi.org/10.1109/SocialCom-PASSAT.2012.37

  • Hage P, Harary F (1995) Eccentricity and centrality in networks. Soc Netw 17(1):57–63. https://doi.org/10.1016/0378-8733(94)00248-9

    Article  Google Scholar 

  • Jackson MO (2008) Social and economic networks. Princeton University Press, Princeton

    MATH  Google Scholar 

  • Joy MP, Brock A, Ingber DE, Huang S (2005) High-betweenness proteins in the yeast protein interaction network. BioMed Res Int 2005(2):96–103

    Google Scholar 

  • Kas M, Wachs M, Carley KM, Carley LR (2013) Incremental algorithm for updating betweenness centrality in dynamically growing networks. In: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM ’13, ACM, New York, pp 33–40. https://doi.org/10.1145/2492517.2492533

  • Kintali S (2008) Betweenness centrality: algorithms and lower bounds. arXiv preprint arXiv:0809.1906

  • Knuth DE (1998) The art of computer programming: sorting and searching, vol 3. Pearson Education, London

    MATH  Google Scholar 

  • Lee C-Y (2006) Correlations among centrality measures in complex networks. arXiv preprint arXiv:physics/0605220

  • Le Merrer E, Le Scouarnec N, Trédan G (2014) Heuristical top-k: fast estimation of centralities in complex networks. Inf Process Lett 114(8):432–436

    Article  MathSciNet  MATH  Google Scholar 

  • Lee MJ, Choi S, Chung CW (2016) Efficient algorithms for updating betweenness centrality in fully dynamic graphs. Inf Sci 326:278–296

    Article  MathSciNet  MATH  Google Scholar 

  • Lee M.J, Lee J, Park JY, Choi RH, Chung CW (2012) Qube: a quick algorithm for updating betweenness centrality. In: Proceedings of the 21st International Conference on World Wide Web, WWW ’12, ACM, New York, pp 351–360. https://doi.org/10.1145/2187836.2187884

  • Leskovec J, Krevl A (2014) SNAP datasets: Stanford large network dataset collection. http://snap.stanford.edu/data

  • Lim YS, Menasche D.S, Ribeiro B, Towsley D, Basu P (2011) Online estimating the k central nodes of a network. IEEE Network Science Workshop, pp 118–122. http://doi.ieeecomputersociety.org/10.1109/NSW.2011.6004633

  • Lu TC, Zhang Y, Allen DL, Salman MA (2011) Design for fault analysis using multi-partite, multi-attribute betweenness centrality measures

  • Mizgier KJ, Jüttner MP, Wagner SM (2013) Bottleneck identification in supply chain networks. Int J Prod Res 51(5):1477–1490

    Article  Google Scholar 

  • Narayanan S (2005) The betweenness centrality of biological networks. Ph.D. thesis, Virginia Polytechnic Institute and State University

  • Nasre M, Pontecorvi M, Ramachandran V (2013) Betweenness centality–incremental and faster. arXiv preprint arXiv:1311.2147

  • Newman M (2010) Networks: an introduction. Oxford University Press Inc, New York

    Book  MATH  Google Scholar 

  • Riondato M, Kornaropoulos EM (2014) Fast approximation of betweenness centrality through sampling. In: Proceedings of the 7th ACM international conference on Web search and data mining, pp 413–422

  • Sariyüce AE, Saule E, Kaya K, Çatalyürek Ü.V (2013) Shattering and compressing networks for betweenness centrality. In: SIAM Data Mining Conference (SDM)

  • Tang J, Zhang J, Yao L, Li J, Zhang L, Su Z (2008) Arnetminer: extraction and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 990–998

  • Tizghadam A, Leon-Garcia A (2010) Betweenness centrality and resistance distance in communication networks. Network 24(6):10–16

    Google Scholar 

  • Valente TW, Coronges K, Lakon C, Costenbader E (2008) How correlated are network centrality measures? Connections (Toronto, Ont.) 28(1):16

    Google Scholar 

  • Van Der Hofstad R (2009) Random graphs and complex networks. http://www.win.tue.nl/rhofstad/NotesRGCN.pdf

  • Wang X (2011) Deciding on the type of the degree distribution of a graph (network) from traceroute-like measurements

  • Warshall S (1962) A theorem on boolean matrices. J ACM (JACM) 9(1):11–12

    Article  MathSciNet  MATH  Google Scholar 

  • Welzl E (1991) Smallest enclosing disks (balls and ellipsoids). Springer, Berlin

    Book  Google Scholar 

Download references

Acknowledgements

The authors would like to thank the IIT Ropar HPC committee for providing the resources for performing experiments. They also would like to thank the Malgudi team at IIT Ropar for their comments to improve the presentation of the paper. The authors would like to thank the referees for helpful comments.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Rishi Ranjan Singh.

Additional information

This is an expanded and extended version of the results appeared in CompleNet 2015 Agarwal et al. (2015).

Appendices

Appendix 1: Average error and efficiency vs number of nodes in graphs (n)

In this section, we present the experimental results exhibiting the scalability of the proposed heuristic. The plots in this section compare the average error in the estimation of betweenness score and the average efficiency in ordering the nodes based on the betweenness scores with respect to the number of nodes in graphs. We generated graphs with \(n=100\) to \(n=1000\) with a step of 100 while keeping the average degree constant. For each n, we generated 5 graphs and took mean of the average error and efficiency over all the 5 graphs. Fig. 5 contains the plots. The plot in Fig. 5a depicts the change in the average error and plot in Fig. 5b represents the change in the average efficiency by changing the number of nodes in graphs while keeping the average degree of the graphs as a constant. We plotted the results for the Erdős-Rényi random (G(np)) graphs with \(\hbox {average degree} = \{3, 5, 10\}\) and for Barabasi-Albert random scale-free (H(nk)) graphs with \(\hbox {average degree} = \{4, 6, 10\}\). These graphs have been discussed in Sect. 6.1.1. From these plots, we infer that the average error in the computation of betweenness score decreases and the average efficiency in ordering the nodes based on betweenness score increases with an increase in the number of nodes in graphs.

Fig. 5
figure 5

Iterative performance of the proposed heuristic when the number of nodes in graphs vary. The x axis represents the number on node (n). The y axis in plot (a) represents the average error in the estimation of betweenness centrality. The y axis in plot b represents the average efficiency in ordering the nodes based on their betweenness scores. Here, G(n; p) represent the Erdös-Rényi random graph and H(n; k) represents Barabasi-Albert random scale-free graph as discussed in Section 6.1.1

Fig. 6
figure 6

Performance of the proposed heuristic for k-betweenness-ordering where \(k=2,3,5,10\). The x axis in plot a represents the synthetic network instances and in plot b represents the real-world network instances. The y axis in both the plots a, b represents the average efficiency in ordering k nodes based on their betweenness scores. The different color bars represent the performance for various considered values of k

Appendix 2: Average efficiency in k-betweenness-ordering vs k

In this section, we plot the average efficiency of the proposed heuristic for ordering k nodes in respect of the considered synthetic and real-world graphs. We calculate the average efficiency for k-betweenness-ordering as following. In a given graph, we randomly pick a set of k nodes and calculate the efficiency in ordering these k nodes based on the formula given in Sect. 6.2.2. We do this for 1000 iterations and take the mean of the efficiency in these 1000 iterations.

The plots in Fig. 6 show the average efficiency of the extension of Algorithm 2 for k-betweenness-ordering on various synthetic networks (Fig. 6a) and real-world networks (Fig. 6b). Label BO denotes the efficiency of betweenness-ordering. Labels 3-BO, 5-BO, and 10-BO denotes the average efficiency of the proposed heuristic for k-betweenness-ordering when \(k=3,5,10\) respectively. The plots show that the performance of the extended Algorithm 2 for k-betweenness-ordering is nearly same as the performance of Algorithm 2 for the betweenness-ordering of two nodes.

Fig. 7
figure 7

Performance of the proposed heuristic on considered synthetic networks (with 1000 nodes) in terms of the average relaxed efficiency (\(\xi ^t\)) when the threshold parameter \(t=0,2,3,5,10\). The x axis in plot represents the synthetic network instances. The y axis in the plot represents the average relaxed efficiency (as defined in Sect. 6.2.2) in ordering nodes based on their betweenness scores. The different color bars represent the performance for various considered values of t

Appendix 3: Average relaxed efficiency (\(\xi ^t\)) vs t

In this section, we inspect the plot between the relaxed efficiency \(\xi ^t\) and the threshold parameter t (as discussed in Sect. 6.2.2) on some synthetic networks with 1000 nodes. Similar results are achieved on all considered networks. The relaxed efficiency is computed by the formula given in Sect. 6.2.2. We vary t for \(t=2, 3, 5, 10\). The plots are compiled in Fig. 7. For each t, we generate 5 synthetic networks and then calculate the relaxed efficiency on each graph and average them to get average relaxed efficiency. We plotted average relaxed efficiency for both type of considered synthetic graphs (ER and BA). BO, 2-E, 3-E, 5-E, 10-E are the labels assumed for average relaxed efficiency when \(t=0, 2, 3, 5, 10\) respectively. The plots demonstrate that the efficiency increases with an increase in t, i.e., in a real-world situation where relaxation is allowed in ordering nodes, the proposed heuristic will perform much better than its usual performance. The relaxation in ordering means that the error in the betweenness-ordering of the nodes with approximately same betweenness rank is ignored. It considers an error only if the nodes with the difference in their betweenness ranks greater than a threshold value (t) are ordered incorrectly.

Fig. 8
figure 8

Spearman rank correlation between the exact ranking of all the nodes based on their betweenness scores produced by Brandes’s algorithm (Brandes 2001) and rankings due to the proposed heuristic and considered competitive estimation algorithms. The x axis in plot a represents the synthetic network instances and in plot b represents the real-world network instances. The y axis in both the plots a, b represents the average Spearman rank correlation between the exact ranking and ranking due to an estimation algorithm. The different color bars represent the performance in ranking of various estimation algorithms

3.1 Correlation in ordering/average efficiency

In this section, we compare the performance of ranking all the nodes based on their estimated betweenness scores using the proposed heuristic and considered competitive estimation algorithms. The performance is evaluated in terms of the average Spearman rank correlation (\(\rho\)) with the exact ranking produced by Brandes’s algorithm (Brandes 2001). Spearman rank correlation \(\rho\) is a standard ranking correlation that measures the similarity between the ranking generated by two approaches. We plot the average Spearman rank correlation between the results produced by Algorithm 2 and the other competitive approaches on the considered synthetic and real-world networks. The plots are in Fig. 8.

Correlation of the proposed heuristic is very high (very close to 1) for almost all the considered networks. It outperforms all node-sampling based algorithms. In very dense networks, 2-BC sometimes produces a better result than our approach, but it should be noted that the difference in correlations are minute even when 2-BC takes a lot more time than our approach. In sparse networks, the proposed heuristic produces much better results than 2-BC in relatively smaller amount of time.

Appendix 4: Corroborating experimental evidence of linear time running for efficient ordering

In this section, we show experimentally that the expected time for betweenness-ordering heuristic is linear with the number of edges (m). If the average degree is constant, then the average ordering time is also shown linear in the number of nodes (n). We have picked Gnutella-family of networks from Leskovec and Krevl (2014). This family comprises of 9 different snapshots of Gnutella peer-to-peer file sharing network from August 2002. The details of the networks and betweenness-ordering results on the networks are summarized in table 8. The columns in the table contain name of the network, number of nodes (n), average degree of nodes in the network(AVG.D.), average ordering time over 500 random pairs of nodes (Avg. time) in seconds, and average efficiency in ordering the 500 randomly picked pairs of nodes based on their betweenness score (Avg. Efficiency) respectively. The average betweenness-ordering for all the networks in this family of networks is above 97% which is surely a high accuracy in very less ordering time.

Table 8 Average betweenness-ordering time and efficiency in Gnutella-family of networks from Leskovec and Krevl (2014)

The plots of the results are shown in Fig. 9. The x axis represents either the number of nodes or the number of edges and y axis denotes the average time taken for ordering two nodes based on their betweenness score. It is clear from the plots that the average betweenness-ordering time is linear with the number of edges. Here ordering time is also linear with the number of edges. It is due to the constant and small average degree of nodes in all of the networks in the considered network-family which made the number of edges as linear to the number of nodes. These plots are the corroborating experimental evidence of the proposed heuristic’s linear running time on real-world graphs. This result also concludes that scaling of graphs in the terms of the number of edges will linearly affect the ordering time and thus our ordering algorithm will be very quick and accurate than any other possible algorithms.

Fig. 9
figure 9

Average betweenness-ordering time of the proposed heuristic on Gnutella-family of networks. The x axis in plot a represents the number of nodes and in plot b represents the number of edges. The y axis represents the average time taken for ordering two nodes based on their betweenness scores

4.1 Eccentricity ordering

We illustrate the centrality-ordering problem in the context of a centrality measure called the eccentricity measure and give a simple approximation approach for eccentricity ordering (Fig. 10).

Fig. 10
figure 10

Eccentricity ordering in 2-D Euclidean plane

The eccentricity of a node v (Hage and Harary 1995) in a connected graph G is defined as the shortest distance to the farthest node from v in G. Center of a graph which is a solution to the facility location problems, is calculated by picking the nodes with least eccentricity. Finding eccentricity of all nodes is as expensive as finding closeness, betweenness or stress centrality for all nodes in respect of time. Given a graph in the two-dimensional Euclidean space, if we were to solve the eccentricity ordering problem of two nodes in that graph without computing the eccentricities, we would go about the following way: drawing a minimum circle (Disk) covering all the nodes is a very well known problem called smallest-circle problem or minimum covering circle problem. A linear (O(n)) time randomized algorithm by Welzl (1991) can find the smallest circle covering n points on a 2-D Euclidean plane. Once we find the smallest circle, an approximate solution to the eccentricity comparison problem is to compare the distance from the center of the smallest circle to the nodes. If the nodes are evenly distributed in the smallest circle, then the node closer to the center of that smallest circle is likely to have smaller eccentricity and vice versa. Therefore, the eccentricity ordering problem can be estimated in linear time as opposed to finding it the conventional way by considering all possible distances from the given vertex to all other vertices. A theoretical bound on the ordering efficiency of the above scheme is still open.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Singh, R.R., Iyengar, S.R.S., Chaudhary, S. et al. An efficient heuristic for betweenness estimation and ordering. Soc. Netw. Anal. Min. 8, 66 (2018). https://doi.org/10.1007/s13278-018-0542-x

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s13278-018-0542-x

Keywords

Navigation