[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3485447.3511952acmconferencesArticle/Chapter ViewAbstractPublication PagesthewebconfConference Proceedingsconference-collections
research-article

CoSimHeat: An Effective Heat Kernel Similarity Measure Based on Billion-Scale Network Topology✱

Published: 25 April 2022 Publication History

Abstract

Myriads of web applications in the Big Data era demand an effective measure of similarity based on billion-scale network structures, e.g., collaborative filtering. Recently, CoSimRank has been devised as a promising graph-theoretic similarity model, which iteratively captures the notion that “two distinct nodes are evaluated as similar if they are connected with similar nodes”. However, the existing CoSimRank model for assessing similarities may either yield unsatisfactory results or rather cost-inhibitive, rendering it impractical in massive graphs. In this paper, we propose CoSimHeat, a novel scalable graph-theoretic similarity model based on heat diffusion. Specifically, we first formulate CoSimHeat model by taking advantage of heat diffusion to emulate the activities of similarity propagations on the Web. Then, we show that the similarities produced by CoSimHeat are more satisfactory than those from CoSimRank families since CoSimHeat fulfils four axioms that an ideal similarity model should satisfy while circumventing the “dead-loop” problem of CoSimRank. Next, we propose a fast algorithm to substantially accelerate CoSimHeat computations on billion-sized graphs, with guarantees of accuracy. Our experiments on various datasets validate that CoSimHeat achieves higher accuracy and is order-of-magnitude faster than state-of-the-art competitors.

References

[1]
2018. DBLP. (2018). http://dblp.uni-trier.de/~ley/db/
[2]
2018. Larger Crawls. (2018). https://law.di.unimi.it/datasets.php
[3]
2021. Microsoft Academic Research. (2021). https://academic.microsoft.com/
[4]
2021. Project WordGraph. (2021). https://www.ims.uni-stuttgart.de/en/research/projects/wordgraph/
[5]
2021. Stanford Large Network Datasets. (2021). http://snap.stanford.edu/data/index.html
[6]
Ioannis Antonellis, Hector Garcia-Molina, and Chi-Chao Chang. 2008. SimRank++: Query rewriting through link analysis of the click graph. In WWW. 1177–1178. https://doi.org/10.1145/1367497.1367714
[7]
Arieh Ben-Naim. 2010. Discover entropy and the second law of thermodynamics: A playful way of discovering a law of nature. World Scientific.
[8]
Hung-Hsuan Chen and C. Lee Giles. 2015. ASCOS++: An Asymmetric Similarity Measure for Weighted Networks to Address the Problem of SimRank. ACM Trans. Knowl. Discov. Data 10, 2 (2015), 15:1–15:26. https://doi.org/10.1145/2776894
[9]
Fan Chung. 2007. The heat kernel as the PageRank of a graph. Proceedings of the National Academy of Sciences 104, 50(2007), 19735–19740.
[10]
Hewayda El-Ghawalby and Edwin R Hancock. 2009. Geometric characterizations of graphs using heat kernel embeddings. In IMA International Conference on Mathematics of Surfaces. Springer, 124–142.
[11]
Paul Jaccard. 1912. The distribution of the flora in the alpine zone. New phytologist 11, 2 (1912), 37–50.
[12]
Glen Jeh and Jennifer Widom. 2002. SimRank: A measure of structural-context similarity. In SIGKDD. 538–543. https://doi.org/10.1145/775047.775126
[13]
Kyle Kloster and David F Gleich. 2014. Heat kernel based community detection. In KDD. 1386–1395.
[14]
Xueting Liao, Yubao Wu, and Xiaojun Cao. 2019. Second-Order CoSimRank for Similarity Measures in Social Networks. In ICC. IEEE, 1–6. https://doi.org/10.1109/ICC.2019.8761899
[15]
Zhenjiang Lin, Michael R. Lyu, and Irwin King. 2006. PageSim: A novel link-based measure of web page similarity. In WWW. 1019–1020. https://doi.org/10.1145/1135777.1135994
[16]
Dmitry Lizorkin, Pavel Velikhov, Maxim N. Grinev, and Denis Turdakov. 2010. Accuracy estimate and optimization techniques for SimRank computation. VLDB J. 19, 1 (2010), 45–66. https://doi.org/10.1007/s00778-009-0168-8
[17]
Zhenqi Lu, Johan Wahlström, and Arye Nehorai. 2021. Local clustering via approximate heat kernel PageRank with subgraph sampling. Scientific Reports 11, 1 (2021), 1–19.
[18]
Sascha Rothe and Hinrich Schütze. 2014. CoSimRank: A Flexible & Efficient Graph-Theoretic Similarity Measure. In ACL. 1392–1402. http://aclweb.org/anthology/P/P14/P14-1131.pdf
[19]
Gerard Salton. 1989. Automatic text processing: The transformation, analysis, and retrieval of. Reading: Addison-Wesley 169 (1989).
[20]
Hanzhi Wang, Zhewei Wei, Ye Yuan, Xiaoyong Du, and Ji-Rong Wen. 2020. Exact Single-Source SimRank Computation on Large Graphs. In SIGMOD, David Maier, Rachel Pottinger, AnHai Doan, Wang-Chiew Tan, Abdussalam Alawini, and Hung Q. Ngo (Eds.). ACM, 653–663. https://doi.org/10.1145/3318464.3389781
[21]
Wensi Xi, Edward A. Fox, Weiguo Fan, Benyu Zhang, Zheng Chen, Jun Yan, and Dong Zhuang. 2005. SimFusion: Measuring similarity using unified relationship matrix. In SIGIR.
[22]
Haixuan Yang, Irwin King, and Michael R Lyu. 2007. DiffusionRank: A possible penicillin for web spamming. In SIGIR. 431–438.
[23]
Renchi Yang. 2020. Fast Approximate CoSimRanks via Random Projections. CoRR abs/2010.11880(2020). arxiv:2010.11880https://arxiv.org/abs/2010.11880
[24]
Renchi Yang, Xiaokui Xiao, Zhewei Wei, Sourav S Bhowmick, Jun Zhao, and Rong-Hua Li. 2019. Efficient estimation of heat kernel PageRank for local clustering. In SIGMOD. 1339–1356.
[25]
Seok-Ho Yoon, Sang-Wook Kim, and Sunju Park. 2016. C-Rank: A link-based similarity measure for scientific literature databases. Inf. Sci. 326(2016), 25–40. https://doi.org/10.1016/j.ins.2015.07.036
[26]
Weiren Yu, Sima Iranmanesh, Aparajita Haldar, Maoyin Zhang, and Hakan Ferhatosmanoglu. 2021. RoleSim*: Scaling axiomatic role-based similarity ranking on large graphs. World Wide Web (2021), 1–45.
[27]
Weiren Yu, Xuemin Lin, Wenjie Zhang, Jian Pei, and Julie A. McCann. 2019. SimRank*: Effective and scalable pairwise similarity search based on graph topology. VLDB J. 28, 3 (2019), 401–426. https://doi.org/10.1007/s00778-018-0536-3
[28]
Weiren Yu, Xuemin Lin, Wenjie Zhang, Ying Zhang, and Jiajin Le. 2012. SimFusion+: Extending SimFusion towards efficient estimation on large and dynamic networks. In SIGIR.
[29]
Weiren Yu, Julie McCann, Chengyuan Zhang, and Hakan Ferhatosmanoglu. 2022. Scaling High-Quality Pairwise Link-Based Similarity Retrieval on Billion-Edge Graphs. ACM Trans. Inf. Syst. 40, 4, Article 78 (2022), 45 pages. https://doi.org/10.1145/3495209
[30]
Weiren Yu and Julie A. McCann. 2015. Co-Simmate: Quick Retrieving All Pairwise Co-Simrank Scores. In ACL. 327–333. http://aclweb.org/anthology/P/P15/P15-2054.pdf
[31]
Weiren Yu and Julie A. McCann. 2016. Random Walk with Restart over Dynamic Graphs. In ICDM. 589–598. https://doi.org/10.1109/ICDM.2016.0070
[32]
Weiren Yu and Fan Wang. 2018. Fast Exact CoSimRank Search on Evolving and Static Graphs. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018, Pierre-Antoine Champin, Fabien Gandon, Mounia Lalmas, and Panagiotis G. Ipeirotis (Eds.). ACM, 599–608. https://doi.org/10.1145/3178876.3186126
[33]
Peixiang Zhao, Jiawei Han, and Yizhou Sun. 2009. P-Rank: A comprehensive structural similarity measure over information networks. In CIKM. 553–562. https://doi.org/10.1145/1645953.1646025

Cited By

View all
  • (2024)P-Rank+: A Scalable Efficient P-Rank Search AlgorithmProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679976(4278-4282)Online publication date: 21-Oct-2024
  • (2024)TPGraph: A Spatial-Temporal Graph Learning Framework for Accurate Traffic Prediction on Arterial RoadsIEEE Transactions on Intelligent Transportation Systems10.1109/TITS.2023.333455825:5(3911-3926)Online publication date: May-2024
  • (2023)StyleMe: Towards Intelligent Fashion Generation with Designer StyleProceedings of the 2023 CHI Conference on Human Factors in Computing Systems10.1145/3544548.3581377(1-16)Online publication date: 19-Apr-2023
  • Show More Cited By

Index Terms

  1. CoSimHeat: An Effective Heat Kernel Similarity Measure Based on Billion-Scale Network Topology✱
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Please enable JavaScript to view thecomments powered by Disqus.

        Information & Contributors

        Information

        Published In

        cover image ACM Conferences
        WWW '22: Proceedings of the ACM Web Conference 2022
        April 2022
        3764 pages
        ISBN:9781450390965
        DOI:10.1145/3485447
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Sponsors

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 25 April 2022

        Permissions

        Request permissions for this article.

        Check for updates

        Author Tags

        1. Link Analysis
        2. Scalable Algorithms
        3. Similarity Search

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Funding Sources

        Conference

        WWW '22
        Sponsor:
        WWW '22: The ACM Web Conference 2022
        April 25 - 29, 2022
        Virtual Event, Lyon, France

        Acceptance Rates

        Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)39
        • Downloads (Last 6 weeks)6
        Reflects downloads up to 11 Dec 2024

        Other Metrics

        Citations

        Cited By

        View all
        • (2024)P-Rank+: A Scalable Efficient P-Rank Search AlgorithmProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679976(4278-4282)Online publication date: 21-Oct-2024
        • (2024)TPGraph: A Spatial-Temporal Graph Learning Framework for Accurate Traffic Prediction on Arterial RoadsIEEE Transactions on Intelligent Transportation Systems10.1109/TITS.2023.333455825:5(3911-3926)Online publication date: May-2024
        • (2023)StyleMe: Towards Intelligent Fashion Generation with Designer StyleProceedings of the 2023 CHI Conference on Human Factors in Computing Systems10.1145/3544548.3581377(1-16)Online publication date: 19-Apr-2023
        • (2023)SimSky: An Accuracy-Aware Algorithm for Single-Source SimRank SearchMachine Learning and Knowledge Discovery in Databases: Research Track10.1007/978-3-031-43418-1_14(226-241)Online publication date: 18-Sep-2023

        View Options

        Login options

        View options

        PDF

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader

        HTML Format

        View this article in HTML Format.

        HTML Format

        Media

        Figures

        Other

        Tables

        Share

        Share

        Share this Publication link

        Share on social media