More Web Proxy on the site http://driver.im/

research-article

Defining and evaluating network communities based on ground-truth

Authors:

Jure LeskovecAuthors Info & Claims

MDS '12: Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics

Article No.: 3, Pages 1 - 8

https://doi.org/10.1145/2350190.2350193

Published: 12 August 2012 Publication History

Abstract

Nodes in real-world networks, such as social, information or technological networks, organize into communities where edges appear with high concentration among the members of the community. Identifying communities in networks has proven to be a challenging task mainly due to a plethora of definitions of a community, intractability of algorithms, issues with evaluation and the lack of a reliable gold-standard ground-truth.

We study a set of 230 large social, collaboration and information networks where nodes explicitly define group memberships. We use these groups to define the notion of ground-truth communities. We then propose a methodology which allows us to compare and quantitatively evaluate different definitions of network communities on a large scale. We choose 13 commonly used definitions of network communities and examine their quality, sensitivity and robustness. We show that the 13 definitions naturally group into four classes. We find that two of these definitions, Conductance and Triad-participation-ratio, consistently give the best performance in identifying ground-truth communities.

References

[1]

R. Andersen, F. Chung, and K. Lang. Local graph partitioning using PageRank vectors. In FOCS '06, pages 475--486, 2006.

Digital Library

[2]

L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD '06, pages 44--54, 2006.

Digital Library

[3]

L. Danon, J. Duch, A. Diaz-Guilera, and A. Arenas. Comparing community structure identification. Journal of Stat. Mech.: Theory and Experiment, 29(09): P09008, 2005.

[4]

I. Dhillon, Y. Guan, and B. Kulis. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Transactions on PAMI, 29(11): 1944--1957, 2007.

Digital Library

[5]

S. L. Feld. The focused organization of social ties. American Journal of Sociology, 86(5): 1015--1035, 1981.

[6]

G. Flake, S. Lawrence, and C. Giles. Efficient identification of web communities. In KDD '00, pages 150--160, 2000.

Digital Library

[7]

S. Fortunato. Community detection in graphs. Physics Reports, 486(3--5): 75--174, 2010.

[8]

S. Fortunato and M. Barthélemy. Resolution limit in community detection. PNAS, 104(1): 36--41, 2007.

[9]

M. Girvan and M. Newman. Community structure in social and biological networks. PNAS, 99(12): 7821--7826, 2002.

[10]

M. S. Granovetter. The strength of weak ties. American Journal of Sociology, 78: 1360--1380, 1973.

[11]

R. Guimerà, M. Sales-Pardo, and L. Amaral. Modularity from fluctuations in random graphs and complex networks. Physical Review E, 70: 025101, 2004.

[12]

S. Kairam, D. Wang, and J. Leskovec. The life and death of online groups: Predicting group growth and longevity. In WSDM '12, 2012.

Digital Library

[13]

G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20: 359--392, 1998.

Digital Library

[14]

H. W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistic Quarterly, 2: 83--97, 1955.

[15]

J. Leskovec, L. Adamic, and B. Huberman. The dynamics of viral marketing. ACM TWEB, 1(1), 2007.

Digital Library

[16]

J. Leskovec, K. Lang, and M. Mahoney. Empirical comparison of algorithms for network community detection. In WWW '10, 2010.

Digital Library

[17]

J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6, 2009.

[18]

A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In IMC '07, pages 29--42, 2007.

Digital Library

[19]

M. Newman. Modularity and community structure in networks. PNAS, 103(23): 8577--8582, 2006.

[20]

M. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 2004.

[21]

G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043): 814--818, 2005.

[22]

F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi. Defining and identifying communities in networks. PNAS, 101(9): 2658--2663, 2004.

[23]

S. Schaeffer. Graph clustering. Computer Science Review, 1(1): 27--64, 2007.

Digital Library

[24]

J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transcations on PAMI, 2000.

Digital Library

[25]

D. Spielman and S.-H. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In STOC '04, pages 81--90, 2004.

Digital Library

[26]

D. Watts and S. Strogatz. Collective dynamics of small-world networks. Nature, 393: 440--442, 1998.

[27]

W. Zachary. An information flow model for conflict and fission in small groups. J. of Anthropological Research, 1977.

Cited By

Fang PLi ZKhan ALuo SWang FShi ZFeng D(2025)Information-Oriented Random Walks and Pipeline Optimization for Distributed Graph EmbeddingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.342433337:1(408-422)Online publication date: Jan-2025
https://doi.org/10.1109/TKDE.2024.3424333
Ahmad WWang B(2025)A learning-based influence maximization framework for complex networks via K-core hierarchies and reinforcement learningExpert Systems with Applications10.1016/j.eswa.2024.125393259(125393)Online publication date: Jan-2025
https://doi.org/10.1016/j.eswa.2024.125393
Kumar CDar SBatchu CSingamaneni KPanguluri V(2025)Peer Collaboration in DBLP Using Graph Convolutional NetworkSN Computer Science10.1007/s42979-024-03615-56:1Online publication date: 11-Jan-2025
https://doi.org/10.1007/s42979-024-03615-5
Show More Cited By

Index Terms

Defining and evaluating network communities based on ground-truth
1. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

Overlapping community detection at scale: a nonnegative matrix factorization approach
WSDM '13: Proceedings of the sixth ACM international conference on Web search and data mining

Network communities represent basic structures for understanding the organization of real-world networks. A community (also referred to as a module or a cluster) is typically thought of as a group of nodes with more connections amongst its members than ...
Structure and Overlaps of Ground-Truth Communities in Networks
Special Issue on Linking Social Granularity and Functions

One of the main organizing principles in real-world networks is that of network communities, where sets of nodes organize into densely linked clusters. Even though detection of such communities is of great interest, understanding the structure ...
Defining and evaluating network communities based on ground-truth

Nodes in real-world networks organize into densely linked communities where edges appear with high concentration among the members of the community. Identifying such communities of nodes has proven to be a challenging task due to a plethora of ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

MDS '12: Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics

August 2012

103 pages

ISBN:9781450315463

DOI:10.1145/2350190

Conference Chairs:
Ying Ding
Indiana University, Indiana
,
Jiawei Han
University of Illinois at Urbana-Champaign
,
Jie Tang
Tsinghua University, China
,
Philip S. Yu
University of Illinois at Chicago

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 August 2012

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Conference

KDD '12

Sponsor:

KDD '12: The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 12 - 16, 2012

Beijing, China

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

320
Total Citations
View Citations
1,415
Total Downloads

Downloads (Last 12 months)226
Downloads (Last 6 weeks)32

Reflects downloads up to 15 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Fang PLi ZKhan ALuo SWang FShi ZFeng D(2025)Information-Oriented Random Walks and Pipeline Optimization for Distributed Graph EmbeddingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.342433337:1(408-422)Online publication date: Jan-2025
https://doi.org/10.1109/TKDE.2024.3424333
Ahmad WWang B(2025)A learning-based influence maximization framework for complex networks via K-core hierarchies and reinforcement learningExpert Systems with Applications10.1016/j.eswa.2024.125393259(125393)Online publication date: Jan-2025
https://doi.org/10.1016/j.eswa.2024.125393
Kumar CDar SBatchu CSingamaneni KPanguluri V(2025)Peer Collaboration in DBLP Using Graph Convolutional NetworkSN Computer Science10.1007/s42979-024-03615-56:1Online publication date: 11-Jan-2025
https://doi.org/10.1007/s42979-024-03615-5
Ono HMinami KHino H(2025)When should we use top coding in locally private estimation?International Journal of Information Security10.1007/s10207-024-00942-924:1Online publication date: 1-Feb-2025
https://dl.acm.org/doi/10.1007/s10207-024-00942-9
Uludağlı MOğuz K(2024)From attributes to communities: a novel approach in social network generationPeerJ Computer Science10.7717/peerj-cs.248310(e2483)Online publication date: 22-Nov-2024
https://doi.org/10.7717/peerj-cs.2483
Wei JYang ZLuo QZhang YQin LZhang W(2024)High-Order Local Clustering on HypergraphsICST Transactions on Scalable Information Systems10.4108/eetsis.743111:6Online publication date: 15-Nov-2024
https://doi.org/10.4108/eetsis.7431
Wang JWang KLin XZhang WZhang Y(2024)Efficient Unsupervised Community Search with Pre-Trained Graph TransformerProceedings of the VLDB Endowment10.14778/3665844.366585317:9(2227-2240)Online publication date: 1-May-2024
https://dl.acm.org/doi/10.14778/3665844.3665853
Tang DWang JChen RWang LYu WZhou JLi K(2024)XGNN: Boosting Multi-GPU GNN Training via Global GNN Memory StoreProceedings of the VLDB Endowment10.14778/3641204.364121917:5(1105-1118)Online publication date: 1-Jan-2024
https://dl.acm.org/doi/10.14778/3641204.3641219
Prokop PDráždilová PPlatoš J(2024)Overlapping community detection in weighted networks via hierarchical clusteringPLOS ONE10.1371/journal.pone.031259619:10(e0312596)Online publication date: 28-Oct-2024
https://doi.org/10.1371/journal.pone.0312596
Cau EMorini VRossetti G(2024)Trends and topics: Characterizing echo chambers’ topological stability and in-group attitudesPLOS Complex Systems10.1371/journal.pcsy.00000081:2(e0000008)Online publication date: 3-Oct-2024
https://doi.org/10.1371/journal.pcsy.0000008
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents