A hybrid clustering approach for link prediction in heterogeneous information networks

  • Regular Paper
Knowledge and Information Systems

Abstract

In recent years, researchers in academia and industry have become increasingly interested in extracting meaningful information from social network data. This information is used in applications such as link prediction between groups of people, community detection, and protein module identification. Clustering has therefore emerged as a way to find similarities between social network members. Most recent graph clustering solutions combine the structural similarity of nodes with their attribute similarity. The results of these solutions indicate that the graph's topological structure is more important. Since most social networks are sparse, these solutions often suffer from insufficient use of node features. This paper proposes a hybrid clustering approach as an application for link prediction in heterogeneous information networks (HINs). In our approach, an adjacency vector is determined for each node; each entry of this vector holds the weight of the direct edge, or the weight of the shortest communication path, between that node and another node, so that every pair of nodes is covered. A similarity metric is presented that calculates similarity using the direct edge weight between two nodes and the correlation between their adjacency vectors. Finally, we evaluate the effectiveness of the proposed method on the DBLP, Political blogs, and Citeseer datasets in terms of entropy, density, purity, and execution time. The simulation results demonstrate that the proposed method, while maintaining cluster density, significantly reduces entropy and execution time compared with the other methods.
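The adjacency vector and similarity metric described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so several choices here are assumptions rather than the authors' method: the weight of a path is taken as the sum of its edge weights, unreachable nodes get an entry of 0, the correlation is Pearson's, and the two terms are mixed with a hypothetical parameter `alpha`. The function names and the toy graph are likewise illustrative only.

```python
import heapq
import math


def shortest_path_weights(graph, source):
    """Dijkstra: minimum total edge weight from source to each reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def adjacency_vector(graph, node, order):
    """Direct edge weight where an edge exists, else shortest-path weight.

    Assumption: unreachable nodes and the node itself get 0.
    """
    dist = shortest_path_weights(graph, node)
    vec = []
    for other in order:
        if other == node:
            vec.append(0.0)
        elif other in graph[node]:
            vec.append(graph[node][other])
        else:
            vec.append(dist.get(other, 0.0))
    return vec


def pearson(x, y):
    """Pearson correlation; returns 0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def similarity(graph, u, v, order, alpha=0.5):
    """Assumed mix of direct edge weight and adjacency-vector correlation."""
    direct = graph[u].get(v, 0.0)
    corr = pearson(adjacency_vector(graph, u, order),
                   adjacency_vector(graph, v, order))
    return alpha * direct + (1 - alpha) * corr


# Toy undirected weighted graph (hypothetical data, not from the paper).
graph = {
    "a": {"b": 1.0, "c": 2.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 2.0, "b": 1.0, "d": 3.0},
    "d": {"c": 3.0},
}
order = ["a", "b", "c", "d"]
print(adjacency_vector(graph, "a", order))
print(similarity(graph, "a", "b", order))
```

In this sketch, node pairs without a direct edge still receive a nonzero similarity whenever their adjacency vectors are correlated, which is one plausible way to make use of indirect structure in a sparse network.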

Notes

  1. http://www-personal.umich.edu/~mejn/netdata/.

  2. https://www.aminer.org/aminernetwork.

  3. http://konect.cc/files/download.tsv.citeseer.tar.bz2.

Author information

Contributions

ZSS was involved in conceptualization, data curation, formal analysis, methodology, software, validation, writing—original draft. ME helped in conceptualization, data curation, supervision. MG-A contributed to conceptualization, data curation, supervision. BM helped in conceptualization, writing—review & editing.

Corresponding author

Correspondence to Mostafa Ghobaei-Arani.

Ethics declarations

Conflict of interests

The authors declare no conflict of interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sajjadi, Z.S., Esmaeili, M., Ghobaei-Arani, M. et al. A hybrid clustering approach for link prediction in heterogeneous information networks. Knowl Inf Syst 65, 4905–4937 (2023). https://doi.org/10.1007/s10115-023-01914-6

  • DOI: https://doi.org/10.1007/s10115-023-01914-6
