[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Friendship prediction and homophily in social media

Published: 04 June 2012 Publication History

Abstract

Social media have attracted considerable attention because their open-ended nature allows users to create lightweight semantic scaffolding to organize and share content. To date, the interplay of the social and topical components of social media has been only partially explored. Here, we study the presence of homophily in three systems that combine tagging social media with online social networks. We find a substantial level of topical similarity among users who are close to each other in the social network. We introduce a null model that preserves user activity while removing local correlations, allowing us to disentangle the actual local similarity between users from statistical effects due to the assortative mixing of user activity and centrality in the social network. This analysis suggests that users with similar interests are more likely to be friends, and therefore topical similarity measures among users based solely on their annotation metadata should be predictive of social links. We test this hypothesis on several datasets, confirming that social networks constructed from topical similarity capture actual friendship accurately. When combined with topological features, topical similarity achieves a link prediction accuracy of about 92%.

References

[1]
Aiello, L. M., Barrat, A., Cattuto, C., Ruffo, G., and Schifanella, R. 2010. Link creation and profile alignment in the aNobii social network. In Proceedings of the 2nd IEEE International Conference on Social Computing (SocialCom'10). IEEE, Los Alamitos, CA, 249--256.
[2]
Aral, S., Muchnik, L., and Sundararajan, A. 2009. Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proc. Nat. Acad. Sci. 106, 51, 21544--21549.
[3]
Backstrom, L. and Leskovec, J. 2011. Supervised random walks: Predicting and recommending links in social networks. In Proceedings of the 4th ACM International Conference on Web Search and Data Mining (WSDM'11), ACM, New York.
[4]
Benchettara, N., Kanawati, R., and Rouveirol, C. 2010. Supervised machine learning applied to link prediction in bipartite social networks. In Proceedings of the International Conference on Advances in Social Network Analysis and Mining. IEEE, Los Alamitos, CA, 326--330.
[5]
Caragea, D., Bahirwani, V., Aljandal, W., and H. Hsu, W. 2009. Ontology-based link prediction in the live journal social network. In Proceedings of the 8th Symposium on Abstraction, Reformulation and Approximation (SARA'09).
[6]
Catanzaro, M., Boguñá, M., and Pastor-Satorras, R. 2005. Generation of uncorrelated random scale-free networks. Phys. Rev. E 71, 027103.
[7]
Cattuto, C., Benz, D., Hotho, A., and Stumme, G. 2008. Semantic grounding of tag relatedness in social bookmarking systems. In Proceedings of the 7th International Semantic Web Conference (ISWC'08). Lecture Notes in Computer Science, vol. 5318, Springer, Berlin, 615--631.
[8]
Clauset, A., Moore, C., and Newman, M. 2008. Hierarchical structure and the prediction of missing links in networks. Nature 453, 98--101.
[9]
Crandall, D., Cosley, D., Huttenlocher, D., Kleinberg, J., and Suri, S. 2008. Feedback effects between similarity and social influence in online communities. In Proceedings of the 14th ACM International Conference on Knowledge Discovery and Data Mining (KDD '08), ACM, New York,160--168.
[10]
Dunlavy, D. M., Kolda, T. G., and Acar, E. 2010. Temporal link prediction using matrix and tensor factorizations. arXiv:1005.4006, Cornell University Library.
[11]
Fawcett, T. 2006. An introduction to roc analysis. Pattern Recogn. Lett. 27, 861--874.
[12]
Feldman, R. and Sanger, J. 2006. Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press.
[13]
Getoor, L. and Diehl, C. P. 2005. Link mining: a survey. ACM SIGKDD Explorations Newslett. 7, 2, 3--12.
[14]
Getoor, L., Friedman, N., Koller, D., and Taskar, B. 2003. Learning probabilistic models of link structure. J. Mach. Learn. Res. 3, 679--707.
[15]
Golder, S. and Huberman, B. A. 2006. The structure of collaborative tagging systems. J. Inf. Sci. 32, 2, 198--208.
[16]
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H. 2009. The weka data mining software: An update. SIGKDD Explorations Newslett. 11.
[17]
Hasan, M. A., Chaoji, V., Salem, S., and Zaki, M. 2006. Link prediction using supervised learning. In Proceedings of the SDM Workshop on Link Analysis, Counterterrorism and Security.
[18]
Haveliwala, T. H. 2003. Topic-sensitive pagerank: A context-sensitive ranking algorithm for web search. IEEE Trans. Knowl. Data Eng. 15, 784--796.
[19]
Huan, Z. 2006. Link prediction based on graph topology: The predictive value of the generalized clustering coefficient. In Proceedings of 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (LinkKDD'06). ACM, New York.
[20]
Kashima, H. and Abe, N. 2006. A parameterized probabilistic model of network evolution for supervised link prediction. In Proceedings of the 6th International Conference on Data Mining (ICDM '06), IEEE, Los Alamitos, CA, 340--349.
[21]
Kumar, R., Novak, J., and Tomkins, A. 2006. Structure and evolution of online social networks. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '06). ACM, New York, 611--617.
[22]
Kunegis, J., De Luca, E., and Albayrak, S. 2010. The link prediction problem in bipartite networks. In Computational Intelligence for Knowledge-Based Systems Design, E. Hllermeier et al., Eds., Lecture Notes in Computer Science, vol. 6178, Springer, Berlin, 380--389.
[23]
Leenders, R. 1997. Longitudinal behavior of network structure and actor attributes: Modeling interdependence of contagion and selection. In Evolution of Social Networks, Vol. 1, P. Doreian and F. Stokman, Eds.
[24]
Lerman, K. and Jones, L. 2007. Social browsing on flickr. In Proceedings of International Conference on Weblogs and Social Media (ICWSM'07). http://arxiv.org/abs/cs.HC/0612047.
[25]
Leroy, V., Cambazoglu, B. B., and Bonchi, F. 2010. Cold start link prediction. In Proceedings of the 16th ACM Conference on Knowledge Discovery and Data Mining (SIGKDD'10). ACM, New York.
[26]
Leskovec, J., Backstrom, L., Kumar, R., and Tomkins, A. 2008. Microscopic evolution of social networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'08). ACM, New York, 462--470.
[27]
Leskovec, J. and Horvitz, E. 2008. Planetary-scale views on a large instant-messaging network. In Proceedings of the 17th International Conference on World Wide Web (WWW'08). ACM, New York, 915--924.
[28]
Leskovec, J., Huttenlocher, D., and Kleinberg, J. 2010. Predicting positive and negative links in online social networks. In Proceedings of the 19th International Conference on World Wide Web (WWW '10). ACM, New York.
[29]
Li, X., Guo, L., and Zhao, Y. E. 2008. Tag-based social interest discovery. In Proceedings of the 17th International Conference on World Wide Web (WWW'08). ACM, New York, 675--684.
[30]
Liben-Nowell, D. and Kleinberg, J. 2003. The link prediction problem for social networks. In Proceedings of the 12th International Conference on Information and Knowledge Management (CIKM'03). ACM, New York, 556--559.
[31]
Lin, D. 1998. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning (ICML). J. W. Shavlik, Ed., Morgan Kaufmann, 296--304.
[32]
Lü, L. and Zhou, T. 2009. Role of weak ties in link prediction of complex networks. In Proceedings of the 1st ACM International Workshop on Complex Networks Meet Information and Knowledge Management (CNIKM '09). ACM, New York, 55--58.
[33]
Lü, L. and Zhou, T. 2010. Link prediction in complex networks: A survey. Preprint. http://arxiv.org/abs/1010.0725.
[34]
Markines, B., Cattuto, C., Menczer, F., Benz, D., Hotho, A., and Stumme, G. 2009. Evaluating similarity measures for emergent semantics of social tagging. In Proceedings of the 18th International Conference on World Wide Web (WWW'09). ACM, New York.
[35]
Markines, B. and Menczer, F. 2009. A scalable, collaborative similarity measure for social annotation systems. In Proceedings of the 20th ACM Conference on Hypertext and Hypermedia (HT'09). ACM, New York.
[36]
Markines, B., Roinestad, H., and Menczer, F. 2008. Efficient assembly of social semantic networks. In Proceedings of the 19th ACM Conference on Hypertext and Hypermedia (HT'08). ACM, New York, 149--156.
[37]
Marlow, C., Naaman, M., Boyd, D., and Davis, M. 2006. Ht06, tagging paper, taxonomy, flickr, academic article to read. In Proceedings of the 17th Conference on Hypertext and Hypermedia (HYPERTEXT '06). ACM, New York, 31--40.
[38]
Maslov, S., Sneppen, K., and Zaliznyak, A. 2004. Detection of topological patterns in complex networks: Correlation profile of the Internet. Physica A 333, 529--540.
[39]
McPherson, M., Smith-Lovin, L., and Cook, J. 2001. Birds of a feather: Homophily in social networks. Ann. Rev. Sociology 27, 415--444.
[40]
Mislove, A., Koppula, H. S., Gummadi, K. P., Druschel, P., and Bhattacharjee, B. 2008. Growth of the Flickr social network. In Proceedings of the 1st Workshop on Online Social Networks (WOSP '08). ACM, New York, 25--30.
[41]
Mislove, A., Marcon, M., Gummadi, K. P., Druschel, P., and Bhattacharjee, B. 2007. Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement (IMC '07). ACM, New York, 29--42.
[42]
Mislove, A., Viswanath, B., Gummadi, K. P., and Druschel, P. 2010. You are who you know: Inferring user profiles in online social networks. In Proceedings of the 3rd ACM International Conference on Web Search and Data Mining. ACM, New York, 251--260.
[43]
Molloy, M. and Reed, B. 1995. A critical point for random graphs with a given degree sequence. Random Struct. Alg. 6, 161--179.
[44]
Murata, T. and Moriyasu, S. 2007. Link prediction of social networks based on weighted proximity measures. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (WI '07). IEEE, Los Alamitos, CA, 85--88.
[45]
Newman, M. E. J. 2001. Clustering and preferential attachment in growing networks. Phys. Rev. E 64, 025102.
[46]
Newman, M. E. J. 2002. Assortative mixing in networks. Phys. Rev. Lett. 89, 208701.
[47]
Newman, M. E. J. 2003. Mixing patterns in networks. Phys. Rev. E 67, 026126.
[48]
Newman, M. E. J. and Park, J. 2003. Why social networks are different from other types of networks. Phys. Rev. E 68, 3, 036122.
[49]
Pastor-Satorras, R., Vázquez, A., and Vespignani, A. 2001. Dynamical and correlation properties of the Internet. Phys. Rev. Lett. 87, 258701.
[50]
Popescul, A., Popescul, R., and Ungar, L. H. 2003. Structural logistic regression for link analysis. In Proceedings of the 2nd International Workshop on Multirelational Data Mining.
[51]
Prieur, C., Cardon, D., Beuscart, J.-S., Pissard, N., and Pons, P. 2008. The strength of weak cooperation: A case study on flickr. Tech. rep. arXiv:0802.2317v1, CoRR.
[52]
Salton, G. 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Boston, MA.
[53]
Santos-Neto, E., Condon, D., Andrade, N., Iamnitchi, A., and Ripeanu, M. 2009. Individual and social behavior in tagging systems. In Proceedings of the 20th ACM Conference on Hypertext and Hypermedia (HT'09). C. Cattuto et al. Eds., ACM, New York,183--192.
[54]
Schifanella, R., Barrat, A., Cattuto, C., Markines, B., and Menczer, F. 2010. Folks in folksonomies: social link prediction from shared metadata. In Proceedings of the 3rd ACM International Conference on Web Search and Data Mining (WSDM '10). ACM, New York, 271--280.
[55]
Serrano, M. A. and Boguñá, M. 2005. Tuning clustering in random networks with arbitrary degree distributions. Phys. Rev. E 72, 036133.
[56]
Shalizi, C. and Thomas, A. 2010. Homophily and contagion are generically confounded in observational social network studies. Preprint, arxiv:1004.4704.
[57]
Stäger, M., Lukowicz, P., and Troster, G. 2006. Dealing with class skew in context recognition. In Proceedings of the 26th IEEE International Conference on Distributed Computing Systems Workshops. IEEE, Los Alamitos, CA, 58.
[58]
Szell, M., Lambiotte, R., and Thurner, S. 2010. Multirelational organization of large-scale social networks in an online world. Proc. Nat. Acad. Sci. 107, 31, 13636--13641.
[59]
Taskar, B.,Wong, M. F., Abbeel, P., and Koller, D. 2003. Link prediction in relational data. In Proceedings of the Neural Information Processing Systems Conference (NIPS'03).
[60]
van Zwol, R. 2007. Flickr: Who is looking? In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (WI '07). IEEE, Los Alamitos, CA, 184--190.
[61]
Vázquez, A., Pastor-Satorras, R., and Vespignani, A. 2002. Large-scale topological and dynamical properties of the Internet. Phys. Rev. E 65, 066130.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on the Web
ACM Transactions on the Web  Volume 6, Issue 2
May 2012
137 pages
ISSN:1559-1131
EISSN:1559-114X
DOI:10.1145/2180861
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 June 2012
Accepted: 01 October 2011
Revised: 01 January 2011
Received: 01 July 2010
Published in TWEB Volume 6, Issue 2

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Social media
  2. collaborative tagging
  3. folksonomies
  4. homophily
  5. link prediction
  6. maximum Information path
  7. social network
  8. topical similarity

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)300
  • Downloads (Last 6 weeks)40
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Link prediction based on spectral analysisPLOS ONE10.1371/journal.pone.028738519:1(e0287385)Online publication date: 2-Jan-2024
  • (2024)Link Prediction by Combining Local Structure Similarity With Node Behavior SynchronizationIEEE Transactions on Computational Social Systems10.1109/TCSS.2023.333529511:3(3816-3825)Online publication date: Jun-2024
  • (2024)Predicting Higher Order Links in Social Interaction NetworksIEEE Transactions on Computational Social Systems10.1109/TCSS.2023.329307511:2(2796-2806)Online publication date: Apr-2024
  • (2024)Link Prediction in Dynamic Real-Life Friendship Networks: A Case Study2024 4th International Conference on Computer Communication and Artificial Intelligence (CCAI)10.1109/CCAI61966.2024.10603341(137-141)Online publication date: 24-May-2024
  • (2024)Line Graph is the Key: An Exploration of Line Graph on Link Prediction with Social Networks2024 11th International Conference on Behavioural and Social Computing (BESC)10.1109/BESC64747.2024.10780611(1-7)Online publication date: 16-Aug-2024
  • (2024)Food choice mimicry on a large university campusPNAS Nexus10.1093/pnasnexus/pgae517Online publication date: 16-Nov-2024
  • (2024)Inconsistency among evaluation metrics in link predictionPNAS Nexus10.1093/pnasnexus/pgae4983:11Online publication date: 6-Nov-2024
  • (2024)A similarity-based assortativity measure for complex networksJournal of Complex Networks10.1093/comnet/cnae01012:2Online publication date: 13-Mar-2024
  • (2024)Comparing discriminating abilities of evaluation metrics in link predictionJournal of Physics: Complexity10.1088/2632-072X/ad46be5:2(025014)Online publication date: 16-May-2024
  • (2024)Social media and affective polarization: does Facebook news use fuel political in- and out-group affect in a multi-party context?Acta Politica10.1057/s41269-023-00324-4Online publication date: 10-Jan-2024
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media