[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1401890.1401984acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

A family of dissimilarity measures between nodes generalizing both the shortest-path and the commute-time distances

Published: 24 August 2008 Publication History

Abstract

This work introduces a new family of link-based dissimilarity measures between nodes of a weighted directed graph. This measure, called the randomized shortest-path (RSP) dissimilarity, depends on a parameter θ and has the interesting property of reducing, on one end, to the standard shortest-path distance when θ is large and, on the other end, to the commute-time (or resistance) distance when θ is small (near zero). Intuitively, it corresponds to the expected cost incurred by a random walker in order to reach a destination node from a starting node while maintaining a constant entropy (related to θ) spread in the graph. The parameter θ is therefore biasing gradually the simple random walk on the graph towards the shortest-path policy. By adopting a statistical physics approach and computing a sum over all the possible paths (discrete path integral), it is shown that the RSP dissimilarity from every node to a particular node of interest can be computed efficiently by solving two linear systems of n equations, where n is the number of nodes. On the other hand, the dissimilarity between every couple of nodes is obtained by inverting an n x n matrix. The proposed measure can be used for various graph mining tasks such as computing betweenness centrality, finding dense communities, etc, as shown in the experimental section.

References

[1]
R. Agaev and P. Chebotarev. The Matrix of Maximum Out Forests of a Digraph and Its Applications. Automation and Remote Control, 61(9):1424--1450, 2000.
[2]
R. Agaev and P. Chebotarev. Spanning Forests of a Digraph and Their Applications. Automation and Remote Control, 62(3):443--466, 2001.
[3]
T. Akamatsu. Cyclic flows, Markov process and stochastic traffic assignment. Transportation Research B, 30(5):369--386, 1996.
[4]
R. B. Bapat. Resistance distance in graphs. The Mathematics Student, 68:87--98, 1999.
[5]
D. P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, 2000.
[6]
V. D. Blondel and P. V. Dooren. A measure of similarity between graph vertices, with application to synonym extraction and web searching. SIAM Review, 46(4):647--666, 2004.
[7]
I. Borg and P. Groenen. Modern multidimensional scaling: Theory and applications. Springer, 1997.
[8]
M. Brand. A random walks perspective on maximizing satisfaction and profit. Proceedings of the 2005 SIAM International Conference on Data Mining, 2005.
[9]
S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7):107--117, 1998.
[10]
A. K. Chandra, P. Raghavan, W. L. Ruzzo, R. Smolensky, and P. Tiwari. The electrical resistance of a graph captures its commute and cover times. Annual ACM Symposium on Theory of Computing, pages 574--586, 1989.
[11]
P. Chebotarev and E. Shamis. The matrix-forest theorem and measuring relations in small social groups. Automation and Remote Control, 58(9):1505--1514, 1997.
[12]
P. Chebotarev and E. Shamis. On proximity measures for graph vertices. Automation and Remote Control, 59(10):1443--1459, 1998.
[13]
T. Chen and Q. Yang and X. Tang. Directed graph embedding. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 2707--2712, 2007.
[14]
F.R. Chung. Laplacians and the Cheeger inequality for directed graphs. Annals of Combinatorics, 9:1--19,2005.
[15]
D. J. Cook and L. B. Holder. Mining graph data. Wiley and Sons, 2006.
[16]
T. Cover and J. Thomas. Elements of Information Theory, 2nd ed. Wiley and Sons, 2006.
[17]
W. de Nooy, A. Mrvar, and V. Batagelj. Exploratory Social Network Analysis with Pajek. Cambridge University Press, 2005.
[18]
F. Fouss, A. Pirotte, J.-M. Renders, and M. Saerens. Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation. IEEE Transactions on Knowledge and Data Engineering, 19(3):355--369, 2007.
[19]
F. Fouss, L. Yen, A. Pirotte, and M. Saerens. An experimental investigation of graph kernels on a collaborative recommendation task. Proceedings of the 6th International Conference on Data Mining (ICDM 2006), pages 863--868, 2006.
[20]
F. Gobel and A. A. Jagers. Random walks on graphs. Stochastic Processes and their Applications, 2:311--336, 1974.
[21]
G. H. Golub and C. F. V. Loan. Matrix Computations, 3th Ed. The Johns Hopkins University Press, 1996.
[22]
M. Gori and A. Pucci. A random-walk based scoring algorithm with application to recommender systems for large-scale e-commerce. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.
[23]
J. Ham, D. Lee, S. Mika, and B. Schölkopf. A kernel view of the dimensionality reduction of manifolds. Proceedings of the 21st International Conference on Machine Learning (ICML2004), pages 369--376, 2004.
[24]
D. Harel and Y. Koren. On clustering using random walks. Proceedings of the conference on the Foundations of Software Technology and Theoretical Computer Science; Lecture Notes in Computer Science, 2245:18--41, 2001.
[25]
T. Ito, M. Shimbo, T. Kudo, and Y. Matsumoto. Application of kernels to link analysis. Proceedings of the eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 586--592, 2005.
[26]
E. T. Jaynes. Information theory and statistical mechanics. Physical Review, 106:620--630, 1957.
[27]
J. Kandola, N. Cristianini, and J. Shawe-Taylor. Learning semantic similarity. Advances in Neural Information Processing Systems (NIPS) 15, pages 657--664, 2002.
[28]
J. Kemeny and L. Snell Finite Markov Chains. Springer-Verlag, 1976.
[29]
M. M. Kessler. Bibliographic coupling between scientific papers. American Documentation, 14(1):10--25, 1963.
[30]
D. J. Klein and M. Randic. Resistance distance. Journal of Mathematical Chemistry, 12:81--95, 1993.
[31]
R. I. Kondor and J. Lafferty. Diffusion kernels on graphs and other discrete structures. Proceedings of the 19th International Conference on Machine Learning, pages 315--322, 2002.
[32]
Y. Koren, S. North, and C. Volinsky. Measuring and extracting proximity in networks. Proceedings of the SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 245--255, 2006.
[33]
Y. Koren, S. North, and C. Volinsky. Measuring and extracting proximity graphs in networks. ACM Transactions on Knowledge Discovery in Data, 1(3):12:1--12:30, 2007.
[34]
S. Lafon and A. B. Lee. Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning, and data set parameterization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(9):1393--1403, 2006.
[35]
A. N. Langville and C. D. Meyer. Google's PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press, 2006.
[36]
D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019--1031, 2007.
[37]
L. Lovasz. Random walks on graphs: A survey. Combinatorics: Paul Erdos is eighty, 2:353--397, 1996.
[38]
W. Lu, J. Janssen, E. Milos, N. Japkowicz, and Y. Zhang. Node similarity in the citation graph. Knowledge and Information Systems, 11(1):105--129, 2006.
[39]
B. Nadler, S. Lafon, R. Coifman, and I. Kevrekidis. Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators. Advances in Neural Information Processing Systems (NIPS) 18, pages 955--962, 2005.
[40]
B. Nadler, S. Lafon, R. Coifman, and I. Kevrekidis. Diffusion maps, spectral clustering and reaction coordinate of dynamical systems. Applied and Computational Harmonic Analysis, 21:113--127, 2006.
[41]
M. Newman. A measure of betweenness centrality based on random walks. Social Networks, 27 (1):39--54, 2005.
[42]
M. Newman. The structure and dynamics of networks. Princeton University Press, 2006.
[43]
L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical Report 1999-0120, Computer Science Department, Stanford University, 1999.
[44]
C. Palmer and C. Faloutsos. Electricity based external similarity of categorical attributes. Proceedings of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'03), pages 486--500, 2003.
[45]
J.-Y. Pan, H.-J. Yang, C. Faloutsos, and P. Duygulu. Automatic multimedia cross-modal correlation discovery. Proceedings of the 10th ACM SIGKDD international conference on Knowledge Discovery and Data Mining, pages 653--658, 2004.
[46]
P. Pons and M. Latapy. Computing communities in large networks using random walks. Journal of Graph Algorithms and Applications, 10(2):191--218, 2006.
[47]
H. Qiu and E. R. Hancock. Image segmentation using commute times. Proceedings of the 16th British Machine Vision Conference (BMVC 2005), pages 929--938, 2005.
[48]
H. Qiu and E. R. Hancock. Clustering and embedding using commute times. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(11):1873--1890, 2007.
[49]
L. Rabiner and B.-H. Juang. Fundamentals of speech recognition. Prentice Hall, 1993.
[50]
L. Reichl. A modern course in statistical physics, 2nd ed. Wiley, 1998.
[51]
J. Rudnick and G. Gaspari. Elements of the random walk. Cambridge University Press, 2004.
[52]
M. Saerens, Y. Achbany, F. Fouss, and L. Yen. Randomized shortest-path problems: Two seemingly unrelated problems. Manuscript submitted for publication, 2007.
[53]
M. Saerens, F. Fouss, L. Yen, and P. Dupont. The principal components analysis of a graph, and its relationships to spectral clustering. Proceedings of the 15th European Conference on Machine Learning (ECML 2004). Lecture Notes in Artificial Intelligence, vol. 3201, Springer-Verlag, Berlin, pages 371--383, 2004.
[54]
P. Sarkar and A. Moore. A tractable approach to finding closest truncated-commute-time neighbors in large graphs. Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), 2007.
[55]
E. Schrodinger. Statistical thermodynamics, 2nd ed. Cambridge University Press, 1952.
[56]
J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
[57]
M. Shimbo and T. Ito. Kernels as link analysis measures. In Mining Graph Data, D. Cook and L. Holder (editors). John Wiley & Sons, pages 283--310, 2006.
[58]
H. Small. Co-citation in the scientific literature: a new measure of the relationship between two documents. Journal of the American Society for Information Science, 24(4):265--269, 1973.
[59]
A. J. Smola and R. Kondor. Kernels and regularization on graphs. In M. Warmuth and B. Schölkopf, editors, Proceedings of the Conference on Learning Theory (COLT), pages 144--158, 2003.
[60]
A. Tahbaz and A. Jadbabaie. A one-parameter family of distributed consensus algorithms with boundary: from shortest paths to mean hitting times. In Proceedings of IEEE Conference on Decision and Control, pages 4664--4669, 2006.
[61]
H. Tong, C. Faloutsos, and J.-Y. Pan. Random walk with restart: fast solutions and applications. To appear in Knowledge and Information Systems, 2008.
[62]
H. Tong, Y. Koren, and C. Faloutsos. Fast direction-aware proximity for graph mining. Proceedings of the ACM SIGKDD international conference on Knowledge Discovery and Data Mining (KDD), pages 747--756.
[63]
S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.
[64]
S. White and P. Smyth. Algorithms for estimating relative importance in networks. Proceedings of the ninth ACM SIGKDD International Conference on Knowledge Discovery and Data mining, pages 266--275, 2003.
[65]
L. Yen, F. Fouss, C. Decaestecker, P. Francq, and M. Saerens. Graph nodes clustering based on the commute-time kernel. In Proceedings of the 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2007). Lecture notes in Computer Science, volume LNAI4426, pages 1037--1045, 2007.
[66]
L. Yen, F. Fouss, C. Decaestecker, P. Francq, and M. Saerens. Link-based community detection based on the sigmoid commute-time kernel. Manuscript submitted for publication, 2008.
[67]
L. Yen, D. Vanvyve, F. Wouters, F. Fouss, M. Verleysen, and M. Saerens. Clustering using a random walk-based distance measure. In Proceedings of the 13th European Symposium on Artificial Neural Networks (ESANN2005), pages 317--324, 2005.
[68]
D. Zhao, Z.C. Lin and X.O. Tang. Contextual Distance for Data Perception. Proceedings of the Eleventh IEEE International Conference on Computer Vision (ICCV), pages 1--8, 2007.
[69]
D. Zhou and J. Huang and B. Schölkopf. Learning from labeled and unlabeled data on a directed graph. Proceedings of the 22nd International Conference on Machine Learning (ICML), pages 1041--1048, 2005.
[70]
H. Zhou. Distance, dissimilarity index, and network community structure. Physical Review E, 67(061901), 2003.
[71]
H. Zhou. Network landscape from a Brownian particle perspective. Physical Review E, 67(041908), 2003.
[72]
X. Zhu, J. Kandola, J. Lafferty, and Z. Ghahramani. Graph kernels by spectral transforms. In Semi-supervised learning, O. Chapelle, B. Schölkopf and A. Zien (editors), pages 277--291. MIT Press, 2006.

Cited By

View all
  • (2024)FairSample: Training Fair and Accurate Graph Convolutional Neural Networks EfficientlyIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.330637836:4(1537-1551)Online publication date: Apr-2024
  • (2024)Scalable Distance Labeling Maintenance and Construction for Dynamic Small-World Networks2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00348(4573-4585)Online publication date: 13-May-2024
  • (2024)Brain functional gradients are related to cortical folding gradientCerebral Cortex10.1093/cercor/bhae45334:11Online publication date: 21-Nov-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2008
1116 pages
ISBN:9781605581934
DOI:10.1145/1401890
  • General Chair:
  • Ying Li,
  • Program Chairs:
  • Bing Liu,
  • Sunita Sarawagi
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 August 2008

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. biased random walk
  2. commute-time distance
  3. graph mining
  4. kernel on a graph
  5. resistance distance
  6. shortest path

Qualifiers

  • Research-article

Conference

KDD08

Acceptance Rates

KDD '08 Paper Acceptance Rate 118 of 593 submissions, 20%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Upcoming Conference

KDD '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)31
  • Downloads (Last 6 weeks)1
Reflects downloads up to 27 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)FairSample: Training Fair and Accurate Graph Convolutional Neural Networks EfficientlyIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.330637836:4(1537-1551)Online publication date: Apr-2024
  • (2024)Scalable Distance Labeling Maintenance and Construction for Dynamic Small-World Networks2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00348(4573-4585)Online publication date: 13-May-2024
  • (2024)Brain functional gradients are related to cortical folding gradientCerebral Cortex10.1093/cercor/bhae45334:11Online publication date: 21-Nov-2024
  • (2024)Sparse randomized policies for Markov decision processes based on Tsallis divergence regularizationKnowledge-Based Systems10.1016/j.knosys.2024.112105300(112105)Online publication date: Sep-2024
  • (2023)Multi-class graph clustering via approximated effective p-resistanceProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3619641(29697-29733)Online publication date: 23-Jul-2023
  • (2023)Efficient Estimation of Pairwise Effective ResistanceProceedings of the ACM on Management of Data10.1145/35886961:1(1-27)Online publication date: 30-May-2023
  • (2023)Parallel Hub Labeling Maintenance With High Efficiency in Dynamic Small-World NetworksIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.323663235:11(11751-11768)Online publication date: 1-Nov-2023
  • (2022)Free Energy Node Embedding via Generalized Skip-gram with Negative SamplingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.3206175(1-13)Online publication date: 2022
  • (2022)Design of biased random walks on a graph with application to collaborative recommendationPhysica A: Statistical Mechanics and its Applications10.1016/j.physa.2021.126752590(126752)Online publication date: Mar-2022
  • (2022)Relative entropy-regularized optimal transport on a graph: a new algorithm and an experimental comparisonInternational Journal of Machine Learning and Cybernetics10.1007/s13042-022-01704-614:4(1365-1390)Online publication date: 23-Dec-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media