[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2350190.2350193acmconferencesArticle/Chapter ViewAbstractPublication PagesmiddlewareConference Proceedingsconference-collections
research-article

Defining and evaluating network communities based on ground-truth

Published: 12 August 2012 Publication History

Abstract

Nodes in real-world networks, such as social, information or technological networks, organize into communities where edges appear with high concentration among the members of the community. Identifying communities in networks has proven to be a challenging task mainly due to a plethora of definitions of a community, intractability of algorithms, issues with evaluation and the lack of a reliable gold-standard ground-truth.
We study a set of 230 large social, collaboration and information networks where nodes explicitly define group memberships. We use these groups to define the notion of ground-truth communities. We then propose a methodology which allows us to compare and quantitatively evaluate different definitions of network communities on a large scale. We choose 13 commonly used definitions of network communities and examine their quality, sensitivity and robustness. We show that the 13 definitions naturally group into four classes. We find that two of these definitions, Conductance and Triad-participation-ratio, consistently give the best performance in identifying ground-truth communities.

References

[1]
R. Andersen, F. Chung, and K. Lang. Local graph partitioning using PageRank vectors. In FOCS '06, pages 475--486, 2006.
[2]
L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD '06, pages 44--54, 2006.
[3]
L. Danon, J. Duch, A. Diaz-Guilera, and A. Arenas. Comparing community structure identification. Journal of Stat. Mech.: Theory and Experiment, 29(09): P09008, 2005.
[4]
I. Dhillon, Y. Guan, and B. Kulis. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Transactions on PAMI, 29(11): 1944--1957, 2007.
[5]
S. L. Feld. The focused organization of social ties. American Journal of Sociology, 86(5): 1015--1035, 1981.
[6]
G. Flake, S. Lawrence, and C. Giles. Efficient identification of web communities. In KDD '00, pages 150--160, 2000.
[7]
S. Fortunato. Community detection in graphs. Physics Reports, 486(3--5): 75--174, 2010.
[8]
S. Fortunato and M. Barthélemy. Resolution limit in community detection. PNAS, 104(1): 36--41, 2007.
[9]
M. Girvan and M. Newman. Community structure in social and biological networks. PNAS, 99(12): 7821--7826, 2002.
[10]
M. S. Granovetter. The strength of weak ties. American Journal of Sociology, 78: 1360--1380, 1973.
[11]
R. Guimerà, M. Sales-Pardo, and L. Amaral. Modularity from fluctuations in random graphs and complex networks. Physical Review E, 70: 025101, 2004.
[12]
S. Kairam, D. Wang, and J. Leskovec. The life and death of online groups: Predicting group growth and longevity. In WSDM '12, 2012.
[13]
G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20: 359--392, 1998.
[14]
H. W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistic Quarterly, 2: 83--97, 1955.
[15]
J. Leskovec, L. Adamic, and B. Huberman. The dynamics of viral marketing. ACM TWEB, 1(1), 2007.
[16]
J. Leskovec, K. Lang, and M. Mahoney. Empirical comparison of algorithms for network community detection. In WWW '10, 2010.
[17]
J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6, 2009.
[18]
A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In IMC '07, pages 29--42, 2007.
[19]
M. Newman. Modularity and community structure in networks. PNAS, 103(23): 8577--8582, 2006.
[20]
M. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 2004.
[21]
G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043): 814--818, 2005.
[22]
F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi. Defining and identifying communities in networks. PNAS, 101(9): 2658--2663, 2004.
[23]
S. Schaeffer. Graph clustering. Computer Science Review, 1(1): 27--64, 2007.
[24]
J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transcations on PAMI, 2000.
[25]
D. Spielman and S.-H. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In STOC '04, pages 81--90, 2004.
[26]
D. Watts and S. Strogatz. Collective dynamics of small-world networks. Nature, 393: 440--442, 1998.
[27]
W. Zachary. An information flow model for conflict and fission in small groups. J. of Anthropological Research, 1977.

Cited By

View all
  • (2025)Information-Oriented Random Walks and Pipeline Optimization for Distributed Graph EmbeddingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.342433337:1(408-422)Online publication date: Jan-2025
  • (2025)A learning-based influence maximization framework for complex networks via K-core hierarchies and reinforcement learningExpert Systems with Applications10.1016/j.eswa.2024.125393259(125393)Online publication date: Jan-2025
  • (2025)Peer Collaboration in DBLP Using Graph Convolutional NetworkSN Computer Science10.1007/s42979-024-03615-56:1Online publication date: 11-Jan-2025
  • Show More Cited By

Index Terms

  1. Defining and evaluating network communities based on ground-truth

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    MDS '12: Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics
    August 2012
    103 pages
    ISBN:9781450315463
    DOI:10.1145/2350190
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 August 2012

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. community scores
    2. network communities
    3. social and information networks

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    KDD '12
    Sponsor:

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)226
    • Downloads (Last 6 weeks)32
    Reflects downloads up to 15 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2025)Information-Oriented Random Walks and Pipeline Optimization for Distributed Graph EmbeddingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.342433337:1(408-422)Online publication date: Jan-2025
    • (2025)A learning-based influence maximization framework for complex networks via K-core hierarchies and reinforcement learningExpert Systems with Applications10.1016/j.eswa.2024.125393259(125393)Online publication date: Jan-2025
    • (2025)Peer Collaboration in DBLP Using Graph Convolutional NetworkSN Computer Science10.1007/s42979-024-03615-56:1Online publication date: 11-Jan-2025
    • (2025)When should we use top coding in locally private estimation?International Journal of Information Security10.1007/s10207-024-00942-924:1Online publication date: 1-Feb-2025
    • (2024)From attributes to communities: a novel approach in social network generationPeerJ Computer Science10.7717/peerj-cs.248310(e2483)Online publication date: 22-Nov-2024
    • (2024)High-Order Local Clustering on HypergraphsICST Transactions on Scalable Information Systems10.4108/eetsis.743111:6Online publication date: 15-Nov-2024
    • (2024)Efficient Unsupervised Community Search with Pre-Trained Graph TransformerProceedings of the VLDB Endowment10.14778/3665844.366585317:9(2227-2240)Online publication date: 1-May-2024
    • (2024)XGNN: Boosting Multi-GPU GNN Training via Global GNN Memory StoreProceedings of the VLDB Endowment10.14778/3641204.364121917:5(1105-1118)Online publication date: 1-Jan-2024
    • (2024)Overlapping community detection in weighted networks via hierarchical clusteringPLOS ONE10.1371/journal.pone.031259619:10(e0312596)Online publication date: 28-Oct-2024
    • (2024)Trends and topics: Characterizing echo chambers’ topological stability and in-group attitudesPLOS Complex Systems10.1371/journal.pcsy.00000081:2(e0000008)Online publication date: 3-Oct-2024
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media