Towards Quantifying Sampling Bias in Network Inference

Published: 23 April 2018

Abstract

Relational inference leverages relationships between entities and links in a network to infer information about the network from a small sample. This method is often used when global information about the network is unavailable or difficult to obtain. However, how reliable is inference from a small labeled sample? How should the network be sampled, and what effect does the sampling have on inference error? How does the structure of the network impact the sampling strategy? We address these questions by systematically examining how network sampling strategy and sample size affect the accuracy of relational inference in networks. To this end, we generate a family of synthetic networks where nodes have a binary attribute and a tunable level of homophily. As expected, we find that in heterophilic networks, we can obtain good accuracy when only small samples of the network are initially labeled, regardless of the sampling strategy. Surprisingly, this is not the case for homophilic networks, and sampling strategies that work well in heterophilic networks lead to large inference errors. This finding suggests that the impact of network structure on relational classification is more complex than previously thought.
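
The experimental setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' generator or classifier: the edge-acceptance rule, the random node sampling, the neighbor-vote classifier, and all function names (generate_network, edge_homophily, sample_random_nodes, infer_labels, accuracy) are assumptions chosen only to show the idea of a binary attribute, a tunable homophily level, a small labeled sample, and an accuracy measurement on the remaining nodes.

```python
import random


def generate_network(n=200, homophily=0.8, n_edges=1000, seed=0):
    """Random graph with a binary node attribute (hypothetical generator).

    A candidate edge is accepted with probability `homophily` when both
    endpoints share a label, and with probability 1 - homophily otherwise.
    """
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n)]
    adj = {v: set() for v in range(n)}
    edges = 0
    while edges < n_edges:
        u, v = rng.sample(range(n), 2)
        if v in adj[u]:
            continue
        p = homophily if labels[u] == labels[v] else 1.0 - homophily
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
            edges += 1
    return adj, labels


def edge_homophily(adj, labels):
    """Fraction of edges whose endpoints share the same label."""
    same = total = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:
                total += 1
                same += labels[u] == labels[v]
    return same / total


def sample_random_nodes(adj, fraction, rng):
    """Uniform random node sample; one of many possible sampling strategies."""
    k = max(1, int(fraction * len(adj)))
    return set(rng.sample(sorted(adj), k))


def infer_labels(adj, labels, seeds):
    """Neighbor-vote classifier using only the labels of sampled nodes.

    The vote is inverted when edges inside the sample suggest heterophily.
    Nodes with no labeled neighbors, or a tied vote, get the sample majority.
    """
    same = diff = 0
    for u in seeds:
        for v in adj[u]:
            if v in seeds and u < v:
                if labels[u] == labels[v]:
                    same += 1
                else:
                    diff += 1
    flip = diff > same
    prior = 1 if 2 * sum(labels[s] for s in seeds) >= len(seeds) else 0
    pred = {}
    for v in adj:
        if v in seeds:
            continue
        votes = [labels[u] for u in adj[v] if u in seeds]
        ones = sum(votes)
        if not votes or 2 * ones == len(votes):
            pred[v] = prior
        else:
            majority = 1 if 2 * ones > len(votes) else 0
            pred[v] = 1 - majority if flip else majority
    return pred


def accuracy(pred, labels):
    """Share of unlabeled nodes whose predicted label matches the truth."""
    return sum(pred[v] == labels[v] for v in pred) / len(pred)


if __name__ == "__main__":
    for h in (0.1, 0.5, 0.9):
        adj, labels = generate_network(homophily=h, seed=1)
        seeds = sample_random_nodes(adj, 0.1, random.Random(0))
        pred = infer_labels(adj, labels, seeds)
        print(h, round(edge_homophily(adj, labels), 2), round(accuracy(pred, labels), 2))
```

Swapping sample_random_nodes for another strategy (for example, a degree-biased or snowball sample) while holding the homophily level fixed is one way to explore how sampling choice and network structure interact in a toy setting like this one.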

Cited By

  • (2021) An adaptive node embedding framework for multiplex networks. Intelligent Data Analysis 25:2 (483-503). DOI: 10.3233/IDA-195065. Online publication date: 1-Jan-2021
  • (2021) Explaining classification performance and bias via network structure and sampling technique. Applied Network Science 6:1. DOI: 10.1007/s41109-021-00394-3. Online publication date: 21-Oct-2021

Published In

WWW '18: Companion Proceedings of the The Web Conference 2018
April 2018
2023 pages
ISBN:9781450356404
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Author Tags

  1. bias
  2. relational classification
  3. sampling networks

Qualifiers

  • Research-article

Funding Sources

  • ARO
  • AFOSR

Conference

WWW '18
Sponsor:
  • IW3C2
WWW '18: The Web Conference 2018
April 23 - 27, 2018
Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,068 of 6,946 submissions, 15%
