[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3459637.3482217acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
short-paper

VerSaChI: Finding Statistically Significant Subgraph Matches using Chebyshev's Inequality

Published: 30 October 2021 Publication History

Abstract

Approximate subgraph matching, an important primitive for many applications like question answering, community detection, and motif discovery, often involves large labeled graphs such as knowledge graphs, social networks, and protein sequences. Effective methods for extracting matching subgraphs, in terms of label and structural similarities to a query, should depict accuracy, computational efficiency, and robustness to noise. In this paper, we propose VerSaChI for finding the top-k most similar subgraphs based on 2-hop label and structural overlap similarity with the query. The similarity is characterized using Chebyshev's inequality to compute the chi-square statistical significance for measuring the degree of matching of the subgraphs. Experiments on real-life graph datasets showcase significant improvements in terms of accuracy compared to state-of-the-art methods, as well as robustness to noise.

Supplementary Material

MP4 File (CIKM21-rgsp2563.mp4)
Approximate subgraph matching, an important primitive for many applications like question answering, community detection, and motif discovery, often involves large labeled graphs such as knowledge graphs, social networks, and protein sequences. Effective methods for extracting matching subgraphs, in terms of label and structural similarities to a query, should depict accuracy, computational efficiency, and robustness to noise. In this paper, we propose VerSaChI for finding the top-k most similar subgraphs based on 2-hop label and structural overlap similarity with the query. The similarity is characterized using Chebyshev?s inequality to compute the chi-square statistical significance for measuring the degree of matching of the subgraphs. Experiments on real-life graph datasets showcase significant improvements in terms of accuracy compared to state-of-the-art methods, as well as robustness to noise.

References

[1]
S. Agarwal, S. Dutta, and A. Bhattacharya. 2020. ChiSeL: Graph Similarity Search using Chi-Squared Statistics in Large Probabilistic Graphs. PVLDB, Vol. 13, 10 (2020), 1654--1668.
[2]
S. Agarwal, S. Dutta, and A. Bhattacharya. 2021. VerSaChI: Finding Statistically Significant Subgraph Matches using Chebyshev's Inequality. https://arxiv.org/abs/2108.07996.
[3]
C. C. Aggarwal and H. Wang. 2010. Graph Data Management and Mining: A Survey of Algorithms and Applications. Advances in Database Systems, Vol. 40 (2010), 13--68.
[4]
A. Arora, M. Sachan, and A. Bhattacharya. 2014. Mining Statistically Significant Connected Subgraphs in Vertex Labeled Graphs. In International Conference on Management of Data (SIGMOD). 1003--1014.
[5]
L. Babai. 2016. Graph Isomorphism in Quasipolynomial Time. In STOC. 684--697.
[6]
F. Bi, L. Chang, X. Lin, L. Qin, and W. Zhang. 2016. Efficient Subgraph Matching by Postponing Cartesian Products. In SIGMOD. 1199--1214.
[7]
A. Bordes, S. Chopra, and J. Weston. 2014. Question Answering with Subgraph Embeddings. In EMNLP. 615--620.
[8]
C. Chen, X. Yan, P. S. Yu, J. Han, D. Zhang, and X. Gu. 2007. Towards Graph Containment Search and Indexing. In VLDB. 926--937.
[9]
W. Chen, J. Liu, Z. Chen, X. Tang, and K. Li. 2017. PBSM: An Efficient Top-K Subgraph Matching Algorithm. International Journal of Pattern Recognition and Artificial Intelligence, Vol. 32, 6 (2017).
[10]
X. Chen. 2020. Simulation-based Approximate Graph Pattern Matching. In International Conference on Management of Data (SIGMOD). 2825--2827.
[11]
X. Chen, L. Lai, L. Qin, X. Lin, and B. Liu. 2021. A Framework to Quantify Approximate Simulation on Graph Data. In International Conference on Data Engineering (ICDE). 1308--1319.
[12]
J. Cheng, Y. Ke, W. Ng, and A. Lu. 2007. FG-Index: Towards Verification-free Query Processing on Graph Databases. In SIGMOD. 857--872.
[13]
T. Y. Cheung. 1983. State of the Art of Graph-based Data Mining. Transactions on Software Engineering, Vol. 5, 1 (1983), 59--68.
[14]
D. Conte, P. Foggia, C. Sansone, and M. Vento. 2004. Thirty Years of Graph Matching in Pattern Recognition. IJPRAI, Vol. 18, 3 (2004), 265--298.
[15]
S. A. Cook. 1971. The Complexity of Theorem-proving Procedures. In STOC. 151--158.
[16]
L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. 2004. A (Sub)graph Isomorphism Algorithm for Large Graphs. PAMI, Vol. 26, 10 (2004), 1367--1372.
[17]
S. Dutta. 2015. MIST: Top-k Approximate Sub-string Mining Using Triplet Statistical Significance. In European Conference on Information Retrieval (ECIR). 284--290.
[18]
S. Dutta and A. Bhattacharya. 2010. Most Significant Substring Mining Based on Chi-square Measure. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). 319--327.
[19]
S. Dutta and A. Bhattacharya. 2012. Mining Statistically Significant Substrings Based on the Chi-Square Measure. In Pattern Discovery Using Sequence Data Mining: Applications and Studies. IGI Global, 73--82.
[20]
S. Dutta and J. Lauri. 2019. Finding a Maximum Clique in Dense Graphs via Chi-Square Statistics. In International Conference on Information and Knowledge Management (CIKM). 2421--2424.
[21]
S. Dutta, P. Nayek, and A. Bhattacharya. 2017. Neighbor-Aware Search for Approximate Labeled Graph Matching using the Chi-Square Statistics. In International Conference on World Wide Web (WWW). 1281--1290.
[22]
B. Gallagher. 2006. Matching Structure and Semantics: A Survey on Graph-based Pattern Matching. In AAAI. 45--53.
[23]
Rosalba Giugno and Dennis Shasha. 2002. GraphGrep: A Fast and Universal Method for Querying Graphs. ICPR, Vol. 2 (2002), 201--212.
[24]
Y. Gu, C. Gao, L. Wang, and G. Yu. 2016. Subgraph Similarity Maximal All-matching over a Large Uncertain Graph. In International Conference on World Wide Web (WWW). 755--782.
[25]
M. Han, H. Kim, G. Gu, K. Park, and W. Han. 2019. Efficient Subgraph Matching: Harmonizing Dynamic Programming, Adaptive Matching Order, and Failing Set Together. In SIGMOD. 1429--1446.
[26]
W. Han, J. Lee, M. Pham, and J. X. Yu. 2010. iGraph: A Framework for Comparisons of Disk-based Graph Indexing Techniques. PVLDB, Vol. 3, 1--2 (2010), 449--459.
[27]
L. Hong, L. Zou, X. Lian, and P. S. Yu. 2015. Subgraph Matching with Set Similarity in a Large Graph Database. Transactions on Knowledge and Data Engineering, Vol. 27, 9 (2015), 2507--2521.
[28]
H. Jiang, H. Wang, P. S. Yu, and S. Zhou. 2007. GString: A Novel Approach for Efficient Search in Graph DBs. In ICDE. 566--575.
[29]
A. Jüttner and P. Madarasi. 2018. VF2: An Improved Subgraph Isomorphism Algorithm. Discrete Applied Mathematics, Vol. 242 (2018), 69--81.
[30]
V. Kassiano, A. Gounaris, A. N. Papadopoulos, and K. Tsichlas. 2016. Mining Uncertain Graphs: An Overview. In International Symposium on Algorithmic Aspects of Cloud Computing (ALGOCLOUD). 87--116.
[31]
B. P. Kelley, B. Yuan, F. Lewitter, R. Sharan, B.R. Stockwell, and T. Ideker. 2004. PathBLAST: A Tool for Alignment of Protein Interaction Networks. Nucleic Acids Research, Vol. 32 (2004), 83--88.
[32]
A. Khan, Y. Wu, C. C. Aggarwal, and X. Yan. 2013. NeMa: Fast Graph Search with Label Similarity. PVLDB, Vol. 6, 3 (2013), 181--192.
[33]
A. Khan, X. Yan, and K. L. Wu. 2010. Towards Proximity Pattern Mining in Large Graphs. In SIGMOD. 867--878.
[34]
S. Kpodjedo, P. Galinier, and G. Antoniol. 2014. Using Local Similarity Measures to Efficiently Address Approximate Graph Matching. Discrete Applied Mathematics, Vol. 164 (2014), 161--177.
[35]
G. Li, L. Yan, and Z. Ma. 2019. An Approach for Approximate Subgraph Matching in Fuzzy RDF Graph. Fuzzy Sets and Systems, Vol. 376 (2019), 106--126.
[36]
L. Liu, B. Du, J. Xu, and H. Tong. 2019. G-Finder: Approximate Attributed Subgraph Matching. In International Conference on Big Data. 513--522.
[37]
L. Livi and A. Rizzi. 2013. The Graph Matching Problem. Pattern Analysis and Application, Vol. 16 (2013), 253--283.
[38]
S. Ma, Y. Cao, W. Fan, J. Huai, and T. Wo. 2014. Strong Simulation: Capturing Topology in Graph Pattern Matching. Transactions on Database Systems (TODS), Vol. 39, 1 (2014), 1--46.
[39]
A. Mahmood, H. Farooq, and J. Ferzund. 2017. Large Scale Graph Matching (LSGM): Techniques, Tools, Applications and Challenges. International Journal of Advanced Computer Science and Applications, Vol. 8, 4 (2017), 494--499.
[40]
V. Nastase, R. Mihalcea, and D. R. Radev. 2015. A Survey of Graphs in Natural Language Processing. Natural Language Engineering, Vol. 21 (2015), 665--698.
[41]
T. Read and N. Cressie. 1988. Goodness-of-fit Statistics for Discrete Multivariate Data. Springer Series in Statistics.
[42]
C. R. Rivero and H. M. Jamil. 2017. Efficient and Scalable Labeled Subgraph Matching using SGMatch. Knowledge and Information Systems, Vol. 51 (2017), 61--87.
[43]
R. A. Rossi and N. K. Ahmed. 2015. The Network Data Repository with Interactive Graph Analytics and Visualization. In AAAI. 4292--4293.
[44]
M. Sachan and A. Bhattacharya. 2012. Mining Statistically Significant Substrings using the χ2 Statistic. PVLDB, Vol. 5, 10 (2012), 1052--1063.
[45]
H. Shang, Y. Zhang, X. Lin, and J. Yu. 2008. Taming Verification Hardness: An Efficient Algorithm for Testing Subgraph Isomorphism. PVLDB, Vol. 1, 1 (2008), 364--375.
[46]
R. Shen and C. Guda. 2014. Applied Graph-Mining Algorithms to Study Biomolecular Interaction Networks. BioMed Research Int., Vol. 2014, 439476 (2014), 11.
[47]
R. Singh, J. Xu, and B. Berger. 2008. Global Alignment of Multiple Protein Interaction Networks with Application to Functional Orthology Detection. PNAS, Vol. 105, 35 (2008), 12763--12768.
[48]
S. Sun and Q. Luo. 2019. Scaling Up Subgraph Query Processing with Efficient Subgraph Matching. In ICDE. 220--231.
[49]
P. Tchebichef. 1867. Des Valeurs Moyennes. Journal de Mathématiques Pures et Appliquees, Vol. 12 (1867), 177--184.
[50]
Y. Tian, R. McEachin, C. Santos, D. States, and J. Patel. 2006. SAGA: A Subgraph Matching Tool for Biological Graphs. Bioinformatics, Vol. 23, 2 (2006), 232--239.
[51]
Y. Tian and J.M. Patel. 2008. TALE: A tool for approximate large graph matching. In ICDE. 963--972.
[52]
A. Tversky. 1977. Features of Similarity. Psychological Review, Vol. 84, 4 (1977), 327--352.
[53]
J. R. Ullmann. 1976. An Algorithm for Subgraph Isomorphism. JACM, Vol. 23, 1 (1976), 31--42.
[54]
F. Vandin, E. Upfal, and B. J. Raphael. 2011. Algorithms for Detecting Significantly Mutated Pathways in Cancer. JCB, Vol. 18, 3 (2011), 507--522.
[55]
J. Wang, N. Ntarmos, and P. Triantafillou. 2016. Indexing Query Graphs to Speedup Graph Query Processing. In EDBT. 41--52.
[56]
J. Yan, X. Yin, W. Lin, C. Deng, H. Zha, and X. Yang. 2016. A Short Survey of Recent Advances in Graph Matching. In ICMR. 167--174.
[57]
Xifeng Yan, Philip S. Yu, and Jiawei Han. 2005 a. Graph Indexing Based on Discriminative Frequent Structure Analysis. TODS, Vol. 30, 4 (2005), 960--993.
[58]
X. Yan, P. S. Yu, and J. Han. 2005 b. Substructure Similarity Search in Graph Databases. In SIGMOD. 766--777.
[59]
Y. Yuan, G. Wang, L. Chen, and H. Wang. 2015. Graph Similarity Search on Large Uncertain Graph Databases. VLDB Journal, Vol. 24, 2 (2015), 271--296.
[60]
S. Zhang, J. Li, H. Gao, and Z. Zou. 2009. A Novel Approach for Efficient Supergraph Query Processing on Graph Databases. In EDBT. 204--215.
[61]
S. Zhang, J. Yang, and W. Jin. 2010. SAPPER: Subgraph Indexing and Approximate Matching in Large Graphs. PVLDB, Vol. 3, 1--2 (2010), 1185--1194.
[62]
Qinghua Zou, Shaorong Liu, and Wesley W. Chu. 2004. CTree: A Compact Tree for Indexing XML Data. In WIDM. 39--46.

Cited By

View all
  • (2024)VeNoM: Approximate Subgraph Matching with Enhanced Neighbourhood Structural InformationProceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD)10.1145/3632410.3632459(18-26)Online publication date: 4-Jan-2024

Index Terms

  1. VerSaChI: Finding Statistically Significant Subgraph Matches using Chebyshev's Inequality

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
      October 2021
      4966 pages
      ISBN:9781450384469
      DOI:10.1145/3459637
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 30 October 2021

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. approximate matching
      2. chebyshev's inequality
      3. chi-square
      4. labeled graph
      5. statistical significance
      6. subgraph similarity

      Qualifiers

      • Short-paper

      Conference

      CIKM '21
      Sponsor:

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Upcoming Conference

      CIKM '25

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)26
      • Downloads (Last 6 weeks)0
      Reflects downloads up to 13 Dec 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)VeNoM: Approximate Subgraph Matching with Enhanced Neighbourhood Structural InformationProceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD)10.1145/3632410.3632459(18-26)Online publication date: 4-Jan-2024

      View Options

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media