On Sampling Type Distribution from Heterogeneous Social Networks

Jhao-Yin Li²² &
Mi-Yen Yeh²²

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 6635))

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

2540 Accesses
5 Citations

Abstract

Social network analysis has drawn the attention of many researchers recently. As the advance of communication technologies, the scale of social networks grows rapidly. To capture the characteristics of very large social networks, graph sampling is an important approach that does not require visiting the entire network. Prior studies on graph sampling focused on preserving the properties such as degree distribution and clustering coefficient of a homogeneous graph, where each node and edge is treated equally. However, a node in a social network usually has its own attribute indicating a specific group membership or type. For example, people are of different races or nationalities. The link between individuals from the same or different types can thus be classified to intra- and inter-connections. Therefore, it is important whether a sampling method can preserve the node and link type distribution of the heterogeneous social networks. In this paper, we formally address this issue. Moreover, we apply five algorithms to the real Twitter data sets to evaluate their performance. The results show that respondent-driven sampling works well even if the sample sizes are small while random node sampling works best only under large sample sizes.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: GBP 19.95; Price includes VAT (United Kingdom)

eBook: GBP 35.99; Price includes VAT (United Kingdom)

Softcover Book: GBP 44.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Triangle-induced and degree-wise sampling over large graphs in social networks

Article 08 November 2024

Extension Sampling Designs for Big Networks: Application to Twitter

Sampling Online Social Networks for Analysis Purpose

References

Navlakha, S., Rastogi, R., Shrivastava, N.: Graph summarization with bounded error. In: Proc. of ACM SIGMOD Int. Conf. on Management of Data, pp. 419–432 (2008)
Google Scholar
Gibson, D., Kumar, R., Tomkins, A.: Discovering large dense subgraphs in massive graphs. In: Proc. of Int. Conf. on Very Large Data Bases, p. 732 (2005)
Google Scholar
Raghavan, S., Garcia-Molina, H.: Representing web graphs. In: Proc. of IEEE Int. Conf. on Data Engineering, pp. 405–416 (2003)
Google Scholar
Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Extracting large-scale knowledge bases from the web. In: Proc. of Int. Conf. on Very Large Data Bases, pp. 639–650 (1999)
Google Scholar
Li, C.T., Lin, S.D.: Egocentric Information Abstraction for Heterogeneous Social Networks. In: Proc. of Int. Conf. on Advances in Social Network Analysis and Mining, pp. 255–260 (2009)
Google Scholar
Tian, Y., Hankins, R., Patel, J.: Efficient aggregation for graph summarization. In: Proc. of ACM SIGMOD Int. Conf. on Management of Data, pp. 567–580 (2008)
Google Scholar
Leskovec, J., Faloutsos, C.: Sampling from large graphs. In: Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, p. 636 (2006)
Google Scholar
Hübler, C., Kriegel, H., Borgwardt, K., Ghahramani, Z.: Metropolis algorithms for representative subgraph sampling. In: Proc. of IEEE Int. Conf. on Data Mining, pp. 283–292 (2008)
Google Scholar
Ma, H., Gustafson, S., Moitra, A., Bracewell, D.: Ego-centric Network Sampling in Viral Marketing Applications. In: Int. Conf. on Computational Science and Engineering, pp. 777–781 (2009)
Google Scholar
Heckathorn, D.: Respondent-driven sampling: a new approach to the study of hidden populations. Social problems 44, 174–199 (1997)
Article Google Scholar
Choudhury, M.D.: Social datasets by munmun de choudhury (2010), http://www.public.asu.edu/~mdechoud/datasets.html
Krishnamurthy, V., Faloutsos, M., Chrobak, M., Lao, L., Cui, J.-H., Percus, A.G.: Reducing large internet topologies for faster simulations. In: Boutaba, R., Almeroth, K.C., Puigjaner, R., Shen, S., Black, J.P. (eds.) NETWORKING 2005. LNCS, vol. 3462, pp. 328–341. Springer, Heidelberg (2005)
Google Scholar
Heckathorn, D.: Respondent-driven sampling II: deriving valid population estimates from chain-referral samples of hidden populations. Social Problems 49, 11–34 (2002)
Article Google Scholar
Lovász, L.: Random walks on graphs: A survey. Combinatorics, Paul Erdos is Eighty 2, 1–46 (1993)
Google Scholar
Kemeny, J.G., Snell, J.L.: Finite Markov Chains, pp. 69–72. Springer, Heidelberg (1960)
MATH Google Scholar

Download references

Author information

Authors and Affiliations

Institute of Information Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei, 115, Taiwan
Jhao-Yin Li & Mi-Yen Yeh

Authors

Jhao-Yin Li
View author publications
You can also search for this author in PubMed Google Scholar
Mi-Yen Yeh
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences, 518055, Shenzhen, China
Joshua Zhexue Huang
Faculty of Engineering and Information Technology, Center for Quantum Computation and Intelligent Systems, Data Sciences and Knowledge Discovery Lab, University of Technology Sydney, 2007, Sydney, NSW, Australia
Longbing Cao
Department of Computer Science and Engineering, University of Minnesota, 55455, Minneapolis, MN, USA
Jaideep Srivastava

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Li, JY., Yeh, MY. (2011). On Sampling Type Distribution from Heterogeneous Social Networks. In: Huang, J.Z., Cao, L., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science(), vol 6635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20847-8_10

Download citation

DOI: https://doi.org/10.1007/978-3-642-20847-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20846-1
Online ISBN: 978-3-642-20847-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics