Abstract
Social network analysis has drawn the attention of many researchers recently. As the advance of communication technologies, the scale of social networks grows rapidly. To capture the characteristics of very large social networks, graph sampling is an important approach that does not require visiting the entire network. Prior studies on graph sampling focused on preserving the properties such as degree distribution and clustering coefficient of a homogeneous graph, where each node and edge is treated equally. However, a node in a social network usually has its own attribute indicating a specific group membership or type. For example, people are of different races or nationalities. The link between individuals from the same or different types can thus be classified to intra- and inter-connections. Therefore, it is important whether a sampling method can preserve the node and link type distribution of the heterogeneous social networks. In this paper, we formally address this issue. Moreover, we apply five algorithms to the real Twitter data sets to evaluate their performance. The results show that respondent-driven sampling works well even if the sample sizes are small while random node sampling works best only under large sample sizes.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Navlakha, S., Rastogi, R., Shrivastava, N.: Graph summarization with bounded error. In: Proc. of ACM SIGMOD Int. Conf. on Management of Data, pp. 419–432 (2008)
Gibson, D., Kumar, R., Tomkins, A.: Discovering large dense subgraphs in massive graphs. In: Proc. of Int. Conf. on Very Large Data Bases, p. 732 (2005)
Raghavan, S., Garcia-Molina, H.: Representing web graphs. In: Proc. of IEEE Int. Conf. on Data Engineering, pp. 405–416 (2003)
Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Extracting large-scale knowledge bases from the web. In: Proc. of Int. Conf. on Very Large Data Bases, pp. 639–650 (1999)
Li, C.T., Lin, S.D.: Egocentric Information Abstraction for Heterogeneous Social Networks. In: Proc. of Int. Conf. on Advances in Social Network Analysis and Mining, pp. 255–260 (2009)
Tian, Y., Hankins, R., Patel, J.: Efficient aggregation for graph summarization. In: Proc. of ACM SIGMOD Int. Conf. on Management of Data, pp. 567–580 (2008)
Leskovec, J., Faloutsos, C.: Sampling from large graphs. In: Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, p. 636 (2006)
Hübler, C., Kriegel, H., Borgwardt, K., Ghahramani, Z.: Metropolis algorithms for representative subgraph sampling. In: Proc. of IEEE Int. Conf. on Data Mining, pp. 283–292 (2008)
Ma, H., Gustafson, S., Moitra, A., Bracewell, D.: Ego-centric Network Sampling in Viral Marketing Applications. In: Int. Conf. on Computational Science and Engineering, pp. 777–781 (2009)
Heckathorn, D.: Respondent-driven sampling: a new approach to the study of hidden populations. Social problems 44, 174–199 (1997)
Choudhury, M.D.: Social datasets by munmun de choudhury (2010), http://www.public.asu.edu/~mdechoud/datasets.html
Krishnamurthy, V., Faloutsos, M., Chrobak, M., Lao, L., Cui, J.-H., Percus, A.G.: Reducing large internet topologies for faster simulations. In: Boutaba, R., Almeroth, K.C., Puigjaner, R., Shen, S., Black, J.P. (eds.) NETWORKING 2005. LNCS, vol. 3462, pp. 328–341. Springer, Heidelberg (2005)
Heckathorn, D.: Respondent-driven sampling II: deriving valid population estimates from chain-referral samples of hidden populations. Social Problems 49, 11–34 (2002)
Lovász, L.: Random walks on graphs: A survey. Combinatorics, Paul Erdos is Eighty 2, 1–46 (1993)
Kemeny, J.G., Snell, J.L.: Finite Markov Chains, pp. 69–72. Springer, Heidelberg (1960)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, JY., Yeh, MY. (2011). On Sampling Type Distribution from Heterogeneous Social Networks. In: Huang, J.Z., Cao, L., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science(), vol 6635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20847-8_10
Download citation
DOI: https://doi.org/10.1007/978-3-642-20847-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20846-1
Online ISBN: 978-3-642-20847-8
eBook Packages: Computer ScienceComputer Science (R0)