Abstract
Online social networks (OSNs) have become increasingly popular on the web in recent years. There are millions of users on these networks, and they generate a great deal of interaction among them. Network sampling is often used to study and measure OSNs, which produce a small, accessible network with a limited number of edges and nodes. In this paper, two algorithms are proposed to sample OSNs. The first algorithm finds several spanning trees with the highest number of nodes that satisfy a given degree constraint from randomly chosen root nodes; edges and nodes of computed spanning trees are ranked based on how often they appear in them. Finally, a fraction of the highly ranked nodes and edges are considered in the sampled network. Second, a partial spanning tree algorithm is proposed in place of a full spanning tree algorithm to improve the performance of the first algorithm. We conduct several experiments on well-known real networks to determine the performance of the proposed sampling algorithms. As a result of the experiments, the proposed algorithms outperformed some of the baseline and recently presented algorithms in terms of the Kolmogorov-Smirnov (KS) test, Skew Divergence (SD) distance, and Normalized Distance (ND).
Similar content being viewed by others
Data availability
Data is provided within the supplementary information files.
Notes
A subgraph \(H\) is an induced subgraph of \(G\) if for any two vertices \(u v\) in \(H\), \(u\) and \(v\) are adjacent in \(H\) if and only if they are adjacent in G.
References
Ahmed NK, Neville J, Kompella R (2014b) Network sampling: from static to streaming graphs. ACM Trans Knowl Discov Data (TKDD) 8:7
Ahmed NK, Berchmans F, Neville J, Kompella R (2010) Time-based sampling of social network activity graphs. In: Proceedings of the Eighth workshop on mining and learning with graphs. ACM, pp 1–9
Ahmed NK, Duffield N, Neville J, Kompella R (2014a) Graph sample and hold: A framework for big-graph analytics. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 1446–1455
Ansari N, Cheng G, Krishnan RN (2004) Efficient and reliable link state information dissemination. IEEE Commun Lett 8:317–319
Bearman PS, Moody J, Stovel K (2004) Chains of affection: the structure of adolescent romantic and sexual networks1. Am J Sociol 110:44–91
Bellur B, Ogier RG (1999) A reliable, efficient topology broadcast protocol for dynamic networks. In: INFOCOM’99. Eighteenth annual joint conference of the IEEE computer and communications societies. Proceedings. IEEE. IEEE, pp 178–186
Blagus N, Šubelj L, Weiss G, Bajec M (2015) Sampling promotes community structure in social and information networks. Physica A 432:206–215
Blomsma N, de Rooy B, Gerritse F et al (2022) Minimum spanning tree analysis of brain networks: a systematic review of network size effects, sensitivity for neuropsychiatric pathology, and disorder specificity. Netw Neurosci 6:301–319
Ebadi Jokandan SM, Bayat P, Farrokhbakht Foumani M (2021) CS- and GA-based hybrid evolutionary sampling algorithm for large-scale social networks. Soc Netw Anal Min 11:120. https://doi.org/10.1007/s13278-021-00836-x
Erdos P, Rényi A (1960) On the evolution of random graphs. Publ Math Inst Hung Acad Sci 5:17–61
Gao Q, Ding X, Pan F, Li W (2014) An improved sampling method of complex network. Int J Mod Phys C 25:1440007
Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness, 1st edn. W. H Freeman, San Francisco
Gile KJ, Handcock MS (2010) Respondent-driven sampling: an assessment of current methodology. Sociol Methodol 40:285–327
Gjoka M, Kurant M, Butts CT, Markopoulou A (2010) Walking in Facebook: a case study of unbiased sampling of OSNs. In: Proceedings IEEE INFOCOM 2010. San Diego, CA, pp 1–9
Hill RJ (1999) International comparisons using spanning trees. In: International and interarea comparisons of income, Output, and Prices. University of Chicago Press, pp 109–120
Jalali ZS, Rezvanian A, Meybodi MR (2016) Social network sampling using spanning trees. Int J Mod Phys C 27:1650052
James F (2006) Statistical methods in experimental physics. World Scientific
Jaouadi M, Romdhane LB (2021) A distributed model for sampling large scale social networks. Expert Syst Appl 186:115773
Kurant M, Markopoulou A, Thiran P (2011b) Towards unbiased BFS sampling. IEEE J Sel Areas Commun 29:1799–1809
Kurant M, Gjoka M, Butts CT, Markopoulou A (2011a) Walking on a graph with a magnifying glass: stratified sampling via weighted random walks. In: Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems. ACM, pp 281–292
Kurant M, Gjoka M, Wang Y, et al (2012) Coarse-grained topology estimation via graph sampling. In: Proceedings of the 2012 ACM workshop on Workshop on online social networks. ACM, pp 25–30
Lee L (2001) On the effectiveness of the skew divergence for statistical language analysis. In: AISTATS. Citeseer
Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. ACM Trans Knowl Discov Data (TKDD) 1:1–41
Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2009) Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math 6:29–123
Leskovec J, Faloutsos C (2006) Sampling from large graphs. In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, Philadelphia, pp 631–636
Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pp 177–187
Liu X, Zhang M, Fiumara G, De Meo P (2022) Complex network hierarchical sampling method combining node neighborhood clustering coefficient with random walk. New Gener Comput 40:765–807. https://doi.org/10.1007/s00354-022-00179-x
Lovász L (1993) Random walks on graphs: a survey. Comb Paul Erdos Eighty 2:1–46
Luo Q, Xie Z, Liu Y et al (2024) Sampling hypergraphs via joint unbiased random walk. World Wide Web 27:15. https://doi.org/10.1007/s11280-024-01253-8
Maiya AS, Berger-Wolf TY (2011) Benefits of bias: towards better characterization of network sampling. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 105–113
Murai F, Ribeiro B, Towsley D, Wang P (2013) On set size distribution estimation and the characterization of large networks via sampling. IEEE J Sel Areas Commun 31:1017–1025
Papagelis M, Das G, Koudas N (2013) Sampling online social networks. IEEE Trans Knowl Data Eng 25:662–676
Peng L, Yongli L, Chong W (2014) Towards cost-efficient sampling methods. http://arxiv.org/abs/arXiv:14055756
Piña-García CA, Gu D (2013) Spiraling facebook: an alternative Metropolis–Hastings random walk using a spiral proposal distribution. Soc Netw Anal Min 3:1403–1415
Rezvanian A, Meybodi MR (2015) Sampling social networks using shortest paths. Physica A 424:254–268
Rezvanian A, Rahmati M, Meybodi MR (2014) Sampling from complex networks using distributed learning automata. Physica A 396:224–234
Rezvanian A, Moradabadi B, Ghavipour M, et al (2019) Social network sampling. In: Learning automata approach for social networks. Springer, pp 91–149
Ribeiro B, Figueiredo D, de Souza e Silva E, Towsley D (2011) Characterizing continuous-time random walks on dynamic networks. In: Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems. ACM, pp 151–152
Ribeiro B, Wang P, Murai F, Towsley D (2012) Sampling directed graphs with random walks. In: Proceedings IEEE INFOCOM. Orlando, FL, pp 1692–1700
Roohollahi S, Khatibi Bardsiri A, Keynia F (2022) Sampling in weighted social networks using a levy flight-based learning automata. J Supercomput 78:1458–1478. https://doi.org/10.1007/s11227-021-03905-2
Santos FC, Pacheco JM (2005) Scale-free networks provide a unifying framework for the emergence of cooperation. Phys Rev Lett 95:098104. https://doi.org/10.1103/PhysRevLett.95.098104
Siciliano MD, Yenigun D, Ertan G (2012) Estimating network structure via random sampling: cognitive social structures and the adaptive threshold method. Soc Netw 34:585–600
Son S-W, Christensen C, Bizhani G et al (2012) Sampling properties of directed networks. Phys Rev E 86:046104
Sundar S, Singh A, Rossi A (2012) New heuristics for two bounded-degree spanning tree problems. Inf Sci 195:226–240. https://doi.org/10.1016/j.ins.2012.01.037
Tewarie P, Van Dellen E, Hillebrand A, Stam CJ (2015) The minimum spanning tree: an unbiased method for brain network analysis. Neuroimage 104:177–188
Wang H, Lu J (2013) Detect inflated follower numbers in OSN using star sampling. In: Proceedings of the 2013 IEEE/ACM international conference on advances in social networks analysis and mining. ACM, pp 127–133
Wejnert C, Heckathorn DD (2008) Web-based network sampling: efficiency and efficacy of respondent-driven sampling for online research. Sociological Methods and Research
White DR, Newman M (2001) Fast approximation algorithms for finding node-independent paths in networks
Woolhouse ME, Dye C, Etard JF et al (1997) Heterogeneities in the transmission of infectious agents: implications for the design of control programs. Proc Natl Acad Sci USA 94:338–342. https://doi.org/10.1073/pnas.94.1.338
Yoon S-H, Kim K-N, Hong J et al (2015) A community-based sampling method using DPL for online social networks. Inf Sci 306:53–69
Funding
No funding.
Author information
Authors and Affiliations
Contributions
AR and ZSJ proposed the original idea, developed the code, designed, performed the simulation experiments, wrote the main manuscript text, and prepared Figs. 1, 2, 3, 4, 5, 6, 7, 8, 9 and S.M.V. participated in the coordination of the study. All authors planned the work, analyzed the results, and reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Rezvanian, A., Vahidipour, S.M. & Jalali, Z.S. A spanning tree approach to social network sampling with degree constraints. Soc. Netw. Anal. Min. 14, 101 (2024). https://doi.org/10.1007/s13278-024-01247-4
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s13278-024-01247-4