[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Incremental graph pattern matching

Published: 05 September 2013 Publication History

Abstract

Graph pattern matching is commonly used in a variety of emerging applications such as social network analysis. These applications highlight the need for studying the following two issues. First, graph pattern matching is traditionally defined in terms of subgraph isomorphism or graph simulation. These notions, however, often impose too strong a topological constraint on graphs to identify meaningful matches. Second, in practice a graph is typically large, and is frequently updated with small changes. It is often prohibitively expensive to recompute matches starting from scratch via batch algorithms when the graph is updated.
This article studies these two issues. (1) We propose to define graph pattern matching based on a notion of bounded simulation, which extends graph simulation by specifying the connectivity of nodes in a graph within a predefined number of hops. We show that bounded simulation is able to find sensible matches that the traditional matching notions fail to catch. We also show that matching via bounded simulation is in cubic time, by giving such an algorithm. (2) We provide an account of results on incremental graph pattern matching, for matching defined with graph simulation, bounded simulation, and subgraph isomorphism. We show that the incremental matching problem is unbounded, that is, its cost is not determined alone by the size of the changes in the input and output, for all these matching notions. Nonetheless, when matching is defined in terms of simulation or bounded simulation, incremental matching is semibounded, that is, its worst-time complexity is bounded by a polynomial in the size of the changes in the input, output, and auxiliary information that is necessarily maintained to reuse previous computation, and the size of graph patterns. We also develop incremental matching algorithms for graph simulation and bounded simulation, by minimizing unnecessary recomputation. In contrast, matching based on subgraph isomorphism is neither bounded nor semibounded. (3) We experimentally verify the effectiveness and efficiency of these algorithms, and show that: (a) the revised notion of graph pattern matching allows us to identify communities commonly found in real-life networks, and (b) the incremental algorithms substantially outperform their batch counterparts in response to small changes. These suggest a promising framework for real-life graph pattern matching.

References

[1]
Abiteboul, S., Mchugh, J., Rys, M., Vassalos, V., and Wiener, J. L. 1998. Incremental maintenance for materialized views over semistructured data. In Proceedings of the 24th International Conference on Very Large Data Bases (VLDB'98). 38--49.
[2]
Alpern, B., Hoover, R., Rosen, B. K., Sweeney, P. F., and Zadeck, F. K. 1990. Incremental evaluation of computational circuits. In Proceedings of the 1st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'90). 32--42.
[3]
Amer-Yahia, S., Benedikt, M., and Bohannon, P. 2007. Challenges in searching online communities. IEEE Data Engin. Bull. 30, 2, 23--31.
[4]
Bang-Jensen, J. and Gutin, G. Z. 2008. Digraphs: Theory, Algorithms and Applications. Springer.
[5]
Bruno, N., Koudas, N., and Srivastava, D. 2002. Holistic twig joins: Optimal XML pattern matching. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'02). 310--321.
[6]
Brynielsson, J., Hogberg, J., Kaati, L., Martenson, C., and Svenson, P. 2010. Detecting social positions using simulation. In Proceedings of the International Conference on Advances in Social Networks Analysis and Mining (ASONAM'10). 48--55.
[7]
Chan, E. P. and Lim, H. 2007. Optimization and evaluation of shortest path queries. The Int. J. Very Large Data Bases 16, 3, 343--369.
[8]
Chen, L., Gupta, A., and Kurul, M. E. 2005. Stack-based algorithms for pattern matching on dags. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB'05). 493--504.
[9]
Chen, Z., Shen, H. T., Zhou, X., and Yu, J. X. 2009. Monitoring path nearest neighbor in road networks. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'09).
[10]
Cheng, J., Yu, J. X., Ding, B., Yu, P. S., and Wang, H. 2008a. Fast graph pattern matching. In Proceedings of the 24th International Conference on Data Engineering (ICDE'08).
[11]
Cheng, J., Yu, J. X., Lin, X., Wang, H., and Yu, P. S. 2008b. Fast computing reachability labelings for large graphs with high compression rate. In Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology (EDBT'08).
[12]
Cho, J., Shivakumar, N., and Garcia-Molina, H. 2000. Finding replicated web collections. SIGMOD Rec. 29, 2.
[13]
Cohen, E., Halperin, E., Kaplan, H., and Zwick, U. 2003. Reachability and distance queries via 2-hop labels. SIAM J. Comput. 32, 5, 1338--1355.
[14]
Cordella, L. P., Foggia, P., Sansone, C., and Vento, M. 2004. A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans. Pattern Anal. Mach. Intell. 26, 10, 1367--1372.
[15]
Dovier, A., Piazza, C., and Policriti, A. 2001. A fast bisimulation algorithm. In Proceedings of the 13th International Conference on Computer Aided Verification (CAV'01). 79--90.
[16]
Fan, W. and Bohannon, P. 2008. Information preserving xml schema embedding. ACM Trans. Datab. Syst. 33, 1.
[17]
Fan, W., Li, J., Luo, J., Tan, Z., Wang, X., and Wu, Y. 2011a. Incremental graph pattern matching. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'11). 925--936.
[18]
Fan, W., Li, J., Ma, S., Tang, N., and Wu, Y. 2011b. Adding regular expressions to graph reachability and pattern queries. In Proceedings of the IEEE 27th International Conference on Data Engineering (ICDE'11). 39--50.
[19]
Fan, W., Li, J., Ma, S., Tang, N., Wu, Y., and Wu, Y. 2010a. Graph pattern matching: From intractable to polynomial time. Proc. VLDB Endow. 3, 1--2, 264--275.
[20]
Fan, W., Li, J., Ma, S., Wang, H., and Wu, Y. 2010b. Graph homomorphism revisited for graph matching. Proc. VLDB Endow. 3, 1--2, 1161--1172.
[21]
Floyd, R. W. 1962. Algorithm 97: Shortest path. Comm. ACM 5, 6, 345.
[22]
Gallagher, B. 2006. Matching structure and semantics: A survey on graph-based pattern matching. In Proceedings of the Conference of the American Association for Artificial Intelligence (AAAI'06).
[23]
Garey, M. and Johnson, D. 1979. Computers and Intractability: A Guide to the Theory of NP- Completeness. W. H. Freeman and Company.
[24]
Garg, S., Gupta, T., Carlsson, N., and Mahanti, A. 2009. Evolution of an online social aggregation network: An empirical study. In Proceedings of the 9th ACM SIGCOMM Internet Measurement Conference (IMC'09). 315--321.
[25]
Gentilini, R., Piazza, C., and Policriti, A. 2003. From bisimulation to simulation: Coarsest partition problems. J. Autom. Reason. 31, 1, 73--103.
[26]
Gupta, A. and Mumick, I. 2000. Materialized Views. MIT Press.
[27]
He, H. and Singh, A. K. 2009. Graphs-at-a-time: Query language and access methods for graph databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'08). 405--418.
[28]
Henzinger, M. R., Henzinger, T., and Kopke, P. 1995. Computing simulations on finite and infinite graphs. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science (FOCS'95). 453.
[29]
Jin, R., Xiang, Y., Ruan, N., and Fuhry, D. 2009. 3-hop: A high-compression indexing scheme for reachability query. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'09). 813--826.
[30]
Kahn, A. B. 1962. Topological sorting of large networks. Comm. ACM 5, 11, 558--562.
[31]
Kumar, R., Novak, J., and Tomkins, A. 2006. Structure and evolution of online social networks. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'06). 611--617.
[32]
Leskovec, J., Kleinberg, J., and Faloutsos, C. 2007. Graph evolution: Densification and shrinking diameters. ACM Trans. Knowl. Disc. Data 1, 1, 2.
[33]
Li, C.-T. and Shan, M.-K. 2012. Composing activity groups in social networks. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM'12).
[34]
Ma, S., Cao, Y., Fan, W., Huai, J., and Wo, T. 2011. Capturing topology in graph pattern matching. Proc. VLDB Endow. 5, 4, 310--321.
[35]
Milner, R. 1989. Communication and Concurrency. Prentice Hall.
[36]
Nardo, L. D., Ranzato, F., and Tapparo, F. 2009. The subgraph similarity problem. IEEE Trans. Knowl. Data Engin. 21, 5, 748--749.
[37]
Natarajan, M. 2000. Understanding the structure of a drug trafficking organization: A conversational analysis. Crime Prevent. Stud. 11, 273--298.
[38]
Ntoulas, A., Cho, J., and Olston, C. 2004. What's new on the web? The evolution of the web from a search engine perspective. In Proceedings of the 13th International Conference on World Wide Web (WWW'04). 1--12.
[39]
Potamias, M., Bonchi, F., Castillo, C., and Gionis, A. 2009. Fast shortest path distance estimation in large networks. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM'09).
[40]
Ramalingam, G. and Reps, T. 1993. A categorized bibliography on incremental computation. In Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'93). 502--510.
[41]
Ramalingam, G. and Reps, T. 1996a. An incremental algorithm for a generalization of the shortest-path problem. J. Algor. 21, 2, 267--305.
[42]
Ramalingam, G. and Reps, T. 1996b. On the computational complexity of dynamic graph problems. Theor. Comput. Sci. 158, 1--2, 233--277.
[43]
Ronen, R. and Shmueli, O. 2009. SoQL: A language for querying and creating data in social networks. In Proceedings of the 25th IEEE International Conference on Data Engineering (ICDE'09). 1595--1602.
[44]
Saha, D. 2007. An incremental bisimulation algorithm. In Proceedings of the 27th International Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS'07). 204--215.
[45]
San Martin, M., Gutierrez, C., and Wood, P. T. 2011. SNQL: A social networks query and transformation language. In Proceedings of the 5th Alberto Mendelzon International Workshop on Foundations of Data Management(AMW'11).
[46]
Shasha, D., Wang, J. T. L., and Giugno, R. 2002. Algorithmics and applications of tree and graph searching. In Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'02). 39--52.
[47]
Shukla, S. K., Shukla, E. K., Rosenkrantz, D. J., III, Hunt, H. B., and Stearns, R. E. 1997. The polynomial time decidability of simulation relations for finite state processes: A hornsat based approach. In Proceedings of the DIMACS Conference Series in Discrete Mathematics and Computer Science.
[48]
Smith, C. 2013. By the numbers: 25 amazing facebook stats. http://expandedramblings.com/index.php/by-the-numbers-17-amazing-facebook-stats/
[49]
Stotz, A., Nagi, R., and Sudit, M. 2009. Incremental graph matching for situation awareness. In Proceedings of the 12th International Conference on Information Fusion (FUSION'09).
[50]
Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., and Su, Z. 2008. ArnetMiner: Extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'08). 990--998.
[51]
Terveen, L. and McDonald, D. W. 2005. Social matching: A framework and research agenda. ACM Trans. Comput.-Hum. Interact. 12, 3.
[52]
Tong, H., Faloutsos, C., Gallagher, B., and Eliassi-Rad, T. 2007. Fast best-effort pattern matching in large attributed graphs. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'07). 737--746.
[53]
Vazirani, V. V. 2003. Approximation Algorithms. Springer.
[54]
Wang, C. and Chen, L. 2009. Continuous subgraph pattern search over graph streams. In Proceedings of the IEEE International Conference on Data Engineering (ICDE'09). 393--404.
[55]
Wang, H., He, H., Yang, J., Yu, P. S., and Yu, J. X. 2006. Dual labeling: Answering graph reachability queries in constant time. In Proceedings of the 22nd International Conference on Data Engineering (ICDE'06). 75.
[56]
White, S. and Smyth, P. 2003. Algorithms for estimating relative importance in networks. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'03). 266--275.
[57]
Yi, K., He, H., Stanoi, I., and Yang, J. 2004. Incremental maintenance of xml structural indexes. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'04). 491--502.
[58]
YouTube. 2012. Youtube dataset. http://netsg.cs.sfu.ca/youtubedata/
[59]
Zhuge, Y. and Garcia-Molina, H. 1998. Graph structured views and their incremental maintenance. In Proceedings of the 14th International Conference on Data Engineering (ICDE'98). 116--125.
[60]
Zou, L., Chen, L., and Ozsu, M. T. 2009. Distance-join: Pattern match query in a large graph database. In Proceedings of the 35th International Conference on Very Large Data Bases (VLDB'09). 864--875.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 38, Issue 3
August 2013
266 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/2508020
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 September 2013
Accepted: 01 May 2013
Revised: 01 April 2013
Received: 01 December 2012
Published in TODS Volume 38, Issue 3

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Graph pattern matching
  2. graph simulation
  3. incremental pattern matching
  4. subgraph isomorphism

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)81
  • Downloads (Last 6 weeks)12
Reflects downloads up to 13 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2025)Incremental model checking for fuzzy computation tree logicFuzzy Sets and Systems10.1016/j.fss.2024.109195500(109195)Online publication date: Jan-2025
  • (2024)TC-Match: Fast Time-Constrained Continuous Subgraph MatchingProceedings of the VLDB Endowment10.14778/3681954.368196317:11(2791-2804)Online publication date: 30-Aug-2024
  • (2024)Extending Graph Rules with OraclesProceedings of the VLDB Endowment10.14778/3654621.365464117:7(1775-1787)Online publication date: 1-Mar-2024
  • (2024)In-depth Analysis of Continuous Subgraph Matching in a Common Delta Query Compilation FrameworkProceedings of the ACM on Management of Data10.1145/36549502:3(1-27)Online publication date: 30-May-2024
  • (2024)View-based Explanations for Graph Neural NetworksProceedings of the ACM on Management of Data10.1145/36392952:1(1-27)Online publication date: 26-Mar-2024
  • (2024)PG-Triggers: Triggers for Property GraphsCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3653386(373-385)Online publication date: 9-Jun-2024
  • (2024)GCSM: GPU-Accelerated Continuous Subgraph Matching for Large Graphs2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS57955.2024.00097(1046-1057)Online publication date: 27-May-2024
  • (2024)Time-Constrained Continuous Subgraph Matching Using Temporal Information for Filtering and Backtracking2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00252(3257-3269)Online publication date: 13-May-2024
  • (2024)Efficient Multi-Query Oriented Continuous Subgraph Matching2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00250(3230-3243)Online publication date: 13-May-2024
  • (2024)Mining Keys for GraphsData & Knowledge Engineering10.1016/j.datak.2023.102274150(102274)Online publication date: Mar-2024
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media