[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2213836.2213896acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

TreeSpan: efficiently computing similarity all-matching

Published: 20 May 2012 Publication History

Abstract

Given a query graph $q$ and a data graph G, computing all occurrences of q in G, namely exact all-matching, is fundamental in graph data analysis with a wide spectrum of real applications. It is challenging since even finding one occurrence of q in G (subgraph isomorphism test) is NP-Complete. Consider that in many real applications, exploratory queries from users are often inaccurate to express their real demands. In this paper, we study the problem of efficiently computing all approximate occurrences of q in G. Particularly, we study the problem of efficiently retrieving all matches of q in G with the number of possible missing edges bounded by a given threshold θ, namely similarity all-matching. The problem of similarity all-matching is harder than the problem of exact all-matching since it covers the problem of exact all-matching as a special case with θ = 0.
In this paper, we develop a novel paradigm to conduct similarity all-matching. Specifically, we propose to use a minimal set QT of spanning trees in q to cover all connected subgraphs q' of q missing at most θ edges; that is, each q' is spanned by a spanning tree in QT. Then, we conduct exact all-matching for each spanning tree in QT to induce all similarity matches. A rigid theoretic analysis shows that our new search paradigm significantly reduces the times of conducting exact all-matching against the existing techniques. To further speed-up the computation, we develop new filtering, computation sharing, and search ordering techniques. Our comprehensive experiments on both real and synthetic datasets demonstrate that our techniques outperform the state of the art technique by 7 orders of magnitude.

References

[1]
A. V. Aho, J. E. Hopcroft, and J. Ullman. Data Structures and Algorithms. 1983.
[2]
C. Chen, X. Yan, P. S. Yu, J. Han, D.-Q. Zhang, and X. Gu. Towards graph containment search and indexing. In VLDB, pages 926--937, 2007.
[3]
J. Cheng, Y. Ke, W. Ng, and A. Lu. Fg-index: towards verification-free query processing on graph databases. In SIGMOD Conference, pages 857--872, 2007.
[4]
L. Cordella, P.Foggia, C. Sansone, and M. Vento. An improved algorithm for matching large graphs. In 3rd Workshop on Graph-based Representations in Pattern Recognition, pages 149--159, 2001.
[5]
M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.
[6]
A. Gibbons. Algorithmic Graph Theory. Cambridge University Press, 1985.
[7]
H. He and A. K. Singh. Closure-tree: An index structure for graph queries. In ICDE, page 38, 2006.
[8]
H. Jiang, H. Wang, P. S. Yu, and S. Zhou. Gstring: A novel approach for efficient search in graph databases. In ICDE, pages 566--575, 2007.
[9]
K. Klein, N. Kriege, and P. Mutzel. Ct-index: Fingerprint-based graph indexing combining cycles and trees. In ICDE, pages 1115--1126, 2011.
[10]
M. E. J. Newman. Power laws, pareto distributions and zipf's law. Contemporary Physics, 46:323--351, 2005.
[11]
S. Nijssen and J. N. Kok. A quickstart in frequent structure mining can make a difference. In KDD, pages 647--652, 2004.
[12]
H. Shang, X. Lin, Y. Zhang, J. X. Yu, andW. Wang. Connected substructure similarity search. In SIGMOD, pages 903--914, 2010.
[13]
H. Shang, Y. Zhang, X. Lin, and J. X. Yu. Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB, 1(1):364--375, 2008.
[14]
D. Shasha, J. T.-L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. In PODS, pages 39--52, 2002.
[15]
Y. Tian and J. M. Patel. Tale: A tool for approximate large graph matching. In ICDE, pages 963--972, 2008.
[16]
N. Wang, J. Zhang, K.-L. Tan, and A. K. H. Tung. On triangulation-based dense neighborhood graphs discovery. PVLDB, 4(2):58--68, 2010.
[17]
D. W. Williams, J. Huan, and W. Wang. Graph database indexing using structured graph decomposition. In ICDE, pages 976--985, 2007.
[18]
X. Yan and P. S. Y. J. Han. Substructure similarity search in graph databases. In SIGMOD, pages 766--777, 2005.
[19]
X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In SIGMOD Conference, pages 335--346, 2004.
[20]
S. Zhang, M. Hu, and J. Yang. Treepi: A novel graph indexing method. In ICDE, pages 966--975, 2007.
[21]
S. Zhang, J. Li, H. Gao, and Z. Zou. A novel approach for efficient supergraph query processing on graph databases. In EDBT, pages 204--215, 2009.
[22]
S. Zhang, S. Li, and J. Yang. Gaddi: distance index based subgraph matching in biological networks. In EDBT, pages 192--203, 2009.
[23]
S. Zhang, J. Yang, and W. Jin. Sapper: Subgraph indexing and approximate matching in large graphs. In VLDB, 2010.
[24]
P. Zhao and J. Han. On graph query optimization in large networks. PVLDB, 3(1):340--351, 2010.
[25]
P. Zhao, J. X. Yu, and P. S. Yu. Graph indexing: Tree + delta >= graph. In VLDB, pages 938--949, 2007.
[26]
L. Zou, J. Mo, L. Chen, M. T. Ozsu, and D. Zhao. gstore: Answering sparql queries via subgraph matching. PVLDB, 2011.

Cited By

View all
  • (2024)l2Match: Optimization Techniques on Subgraph Matching Algorithm Using Label Pair, Neighboring Label Index, and Jump-Redo Method2024 International Conference on Electronics, Information, and Communication (ICEIC)10.1109/ICEIC61013.2024.10457108(1-6)Online publication date: 28-Jan-2024
  • (2024)FGAQ: Accelerating Graph Analytical Queries Using FPGAWeb and Big Data10.1007/978-981-97-7244-5_25(357-361)Online publication date: 28-Aug-2024
  • (2024)Parallel Program Generation for Hybrid Tabular-Textual Question AnsweringWeb and Big Data10.1007/978-981-97-7232-2_9(121-137)Online publication date: 28-Aug-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
May 2012
886 pages
ISBN:9781450312479
DOI:10.1145/2213836
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 May 2012

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. graph
  2. similarity all-matching

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '12
Sponsor:

Acceptance Rates

SIGMOD '12 Paper Acceptance Rate 48 of 289 submissions, 17%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)14
  • Downloads (Last 6 weeks)3
Reflects downloads up to 14 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)l2Match: Optimization Techniques on Subgraph Matching Algorithm Using Label Pair, Neighboring Label Index, and Jump-Redo Method2024 International Conference on Electronics, Information, and Communication (ICEIC)10.1109/ICEIC61013.2024.10457108(1-6)Online publication date: 28-Jan-2024
  • (2024)FGAQ: Accelerating Graph Analytical Queries Using FPGAWeb and Big Data10.1007/978-981-97-7244-5_25(357-361)Online publication date: 28-Aug-2024
  • (2024)Parallel Program Generation for Hybrid Tabular-Textual Question AnsweringWeb and Big Data10.1007/978-981-97-7232-2_9(121-137)Online publication date: 28-Aug-2024
  • (2024)An Experimental Comparison of RDF Systems on CloudDatabases Theory and Applications10.1007/978-981-96-1242-0_3(30-43)Online publication date: 13-Dec-2024
  • (2024)Learning and Mapping Academic Topic Evolution Evolving - Topics in the Australian National Disability Insurance SchemeAdvanced Data Mining and Applications10.1007/978-981-96-0811-9_10(131-145)Online publication date: 13-Dec-2024
  • (2023)A Neighborhood Encoding for Subgraph Queries in Graph DatabasesDatabase and Expert Systems Applications10.1007/978-3-031-39847-6_30(377-391)Online publication date: 18-Aug-2023
  • (2022)Efficient Top-k Graph Similarity Search With GED ConstraintsIEEE Access10.1109/ACCESS.2022.319455910(79180-79191)Online publication date: 2022
  • (2021)Energy Signal-Aided Secure Beamforming and Self-Energy Recycling in Full-Duplex Wireless-Powered Relay NetworksEnergies10.3390/en1420649714:20(6497)Online publication date: 11-Oct-2021
  • (2021)An application of persistent homology and the graph theory to linguistics: The case of Tifinagh and Phoenician scriptsStatistics in Transition New Series10.21307/stattrans-2021-03122:3(141-156)Online publication date: 5-Sep-2021
  • (2021)Research on the Optimal Methods for Graph Edit DistanceProceedings of the 3rd International Conference on Advanced Information Science and System10.1145/3503047.3503062(1-7)Online publication date: 26-Nov-2021
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media