Abstract
In recent years, researchers in academia and industry have become increasingly interested in extracting meaningful information from social network data. This information is used in applications such as link prediction between groups of people, community detection, and protein module identification. Clustering has therefore emerged as a way to find similarities between social network members. Most recent graph clustering solutions combine the structural similarity of nodes with their attribute similarity. The results of these solutions indicate that the graph's topological structure is the more important of the two. Since most social networks are sparse, these solutions often suffer from insufficient use of node features. This paper proposes a hybrid clustering approach as an application for link prediction in heterogeneous information networks (HINs). In our approach, an adjacency vector is determined for each node; each entry of this vector holds the weight of the direct edge, or the weight of the shortest communication path, between that node and another node, so that every pair of nodes is covered. A similarity metric is presented that calculates similarity from the direct edge weight between two nodes and the correlation between their adjacency vectors. Finally, we evaluated the effectiveness of the proposed method on the DBLP, Political blogs, and Citeseer datasets using entropy, density, purity, and execution time as metrics. The simulation results demonstrate that, while maintaining cluster density, the proposed method significantly reduces entropy and execution time compared with the other methods.
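The adjacency vector and similarity metric described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact formulas, so several choices here are assumptions made only for illustration. Edge weights are treated as path costs, shortest paths are found with Floyd-Warshall, unreachable pairs get an entry of 0, the correlation is Pearson's, and the two terms are mixed by a hypothetical weight `alpha`. The function names `adjacency_vectors`, `pearson`, and `similarity` are likewise hypothetical.

```python
import math


def adjacency_vectors(nodes, edges):
    """Build one vector per node (illustrative assumption, not the paper's formula).

    edges maps (u, v) -> positive weight and is treated as undirected.
    Entry j of node i's vector is the direct edge weight if an edge exists,
    otherwise the shortest-path cost, otherwise 0.0 when j is unreachable.
    """
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    direct = [[None] * n for _ in range(n)]
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for (u, v), w in edges.items():
        i, j = index[u], index[v]
        direct[i][j] = direct[j][i] = float(w)
        dist[i][j] = dist[j][i] = min(dist[i][j], float(w))

    # Floyd-Warshall all-pairs shortest paths.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]

    vectors = {}
    for u in nodes:
        i = index[u]
        row = []
        for j in range(n):
            if direct[i][j] is not None:
                row.append(direct[i][j])
            elif math.isfinite(dist[i][j]):
                row.append(dist[i][j])
            else:
                row.append(0.0)
        vectors[u] = row
    return vectors


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def similarity(u, v, edges, vectors, alpha=0.5):
    """Mix direct edge weight and vector correlation (alpha is hypothetical)."""
    w = edges.get((u, v), edges.get((v, u), 0.0))
    return alpha * w + (1 - alpha) * pearson(vectors[u], vectors[v])


# Toy path graph a - b - c - d, used only to exercise the functions.
nodes = ["a", "b", "c", "d"]
edges = {("a", "b"): 1.0, ("b", "c"): 2.0, ("c", "d"): 1.0}
vectors = adjacency_vectors(nodes, edges)
```

In this toy graph, nodes `a` and `d` have no direct edge, so their similarity comes entirely from the correlation term. A clustering step would then group nodes by such pairwise scores; how the paper does this is not stated in the abstract.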
Author information
Contributions
ZSS was involved in conceptualization, data curation, formal analysis, methodology, software, validation, writing—original draft. ME helped in conceptualization, data curation, supervision. MG-A contributed to conceptualization, data curation, supervision. BM helped in conceptualization, writing—review & editing.
Ethics declarations
Conflict of interests
The authors declare no conflict of interests.
About this article
Cite this article
Sajjadi, Z.S., Esmaeili, M., Ghobaei-Arani, M. et al. A hybrid clustering approach for link prediction in heterogeneous information networks. Knowl Inf Syst 65, 4905–4937 (2023). https://doi.org/10.1007/s10115-023-01914-6
DOI: https://doi.org/10.1007/s10115-023-01914-6