research-article

Centrality-Based Group Profiling: A Comparative Study in Co-authorship Networks

Published: 01 January 2018

Abstract

Group profiling methods aim to construct a descriptive profile for communities in social networks. This task is similar to the traditional cluster labeling task, commonly adopted in document clustering to identify tags that characterize each derived cluster. This similarity encourages the direct application of cluster labeling methods to group profiling problems. However, group profiling can leverage an important additional source of information: the links among the clustered individuals. This work extends our previous work by incorporating relational information to better describe communities. The proposed approach, called Centrality-based Group Profiling, uses network centrality measures to select the nodes used for the characterization, i.e., nodes that generalize the content of the observed communities. Using relational information to select relevant nodes in a community significantly reduces the complexity of the profiling task while retaining enough representative content to produce a good characterization. Experiments were conducted on a co-authorship network to evaluate different profiling strategies. The results demonstrate that the proposed approach produces good profiles for the observed groups with both group profiling and standard cluster labeling methods, at a considerably lower computational cost.
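
The idea of selecting central nodes before profiling can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which centrality measures or labeling methods were used, so degree centrality and a plain term-frequency ranking are assumptions made here for illustration, as are the function names, parameters, and toy data.

```python
from collections import Counter


def degree_centrality(edges, members):
    """Within-community degree centrality (assumed measure, for illustration).

    Returns, for each member, the fraction of other members it is linked to.
    """
    members = set(members)
    neighbors = {m: set() for m in members}
    for u, v in edges:
        # Only count links whose endpoints both belong to the community.
        if u in members and v in members and u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    denom = max(len(members) - 1, 1)
    return {m: len(neighbors[m]) / denom for m in members}


def profile_community(edges, members, terms_by_node, top_nodes=2, top_terms=3):
    """Profile a community from the content of its most central nodes only."""
    centrality = degree_centrality(edges, members)
    # Rank by centrality (descending); break ties by node name for determinism.
    ranked = sorted(centrality, key=lambda n: (-centrality[n], n))
    selected = ranked[:top_nodes]

    # Hypothetical labeling step: rank terms by frequency among selected nodes.
    counts = Counter()
    for node in selected:
        counts.update(terms_by_node.get(node, []))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    profile = [term for term, _ in ordered[:top_terms]]
    return selected, profile


# Toy co-authorship community (made-up data).
edges = [("ana", "bob"), ("ana", "cid"), ("ana", "dee"),
         ("bob", "cid"), ("dee", "eve")]
members = ["ana", "bob", "cid", "dee", "eve"]
terms = {
    "ana": ["graphs", "clustering", "centrality"],
    "bob": ["graphs", "clustering"],
    "cid": ["graphs", "labeling"],
    "dee": ["databases"],
    "eve": ["compilers"],
}

selected, profile = profile_community(edges, members, terms)
print(selected)  # ['ana', 'bob']
print(profile)   # ['clustering', 'graphs', 'centrality']
```

Only the two most central authors contribute terms here, so the labeling step processes a fraction of the community's content. That is the source of the lower cost the abstract describes.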


Published In

New Generation Computing, Volume 36, Issue 1
Jan 2018
87 pages

Publisher

Ohmsha, Japan

Author Tags

  1. Social network analysis
  2. Communities
  3. Group profiling
  4. Relational information
  5. Centrality
