[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Mining graph patterns efficiently via randomized summaries

Published: 01 August 2009 Publication History

Abstract

Graphs are prevalent in many domains such as Bioinformatics, social networks, Web and cyber-security. Graph pattern mining has become an important tool in the management and analysis of complexly structured data, where example applications include indexing, clustering and classification. Existing graph mining algorithms have achieved great success by exploiting various properties in the pattern space. Unfortunately, due to the fundamental role subgraph isomorphism plays in these methods, they may all enter into a pitfall when the cost to enumerate a huge set of isomorphic embeddings blows up, especially in large graphs.
The solution we propose for this problem resorts to reduction on the data space. For each graph, we build a summary of it and mine this shrunk graph instead. Compared to other data reduction techniques that either reduce the number of transactions or compress between transactions, this new framework, called Summarize-Mine, suggests a third path by compressing within transactions. Summarize-Mine is effective in cutting down the size of graphs, thus decreasing the embedding enumeration cost. However, compression might lose patterns at the same time. We address this issue by generating randomized summaries and repeating the process for multiple rounds, where the main idea is that true patterns are unlikely to miss from all rounds. We provide strict probabilistic guarantees on pattern loss likelihood. Experiments on real malware trace data show that Summarize-Mine is very efficient, which can find interesting malware fingerprints that were not revealed previously.

References

[1]
R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In VLDB, pages 487--499, 1994.
[2]
D. Chakrabarti and C. Faloutsos. Graph mining: Laws, generators, and algorithms. ACM Computing Survey, 38(1):1--69, 2006.
[3]
C. Chen, X. Yan, F. Zhu, J. Han, and P. S. Yu. Graph OLAP: Towards online analytical processing on graphs. In ICDM, pages 103--112, 2008.
[4]
J. Chen, W. Hsu, M.-L. Lee, and S.-K. Ng. Nemofinder: Dissecting genome-wide protein-protein interactions with meso-scale network motifs. In KDD, pages 106--115, 2006.
[5]
M. Christodorescu, S. Jha, and C. Kruegel. Mining specifications of malicious behavior. In ESEC/SIGSOFT FSE, pages 5--14, 2007.
[6]
M. Deshpande, M. Kuramochi, N. Wale, and G. Karypis. Frequent substructure-based approaches for classifying chemical compounds. IEEE Transactions on Knowledge and Data Engineering, 17(8):1036--1050, 2005.
[7]
M. N. Garofalakis and P. B. Gibbons. Approximate query processing: Taming the terabytes (tutorial). In VLDB, 2001.
[8]
J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In SIGMOD Conference, pages 1--12, 2000.
[9]
M. A. Hasan, V. Chaoji, S. Salem, J. Besson, and M. J. Zaki. Origami: Mining representative orthogonal graph patterns. In ICDM, pages 153--162, 2007.
[10]
H. He and A. K. Singh. Efficient algorithms for mining significant substructures in graphs with quality guarantees. In ICDM, pages 163--172, 2007.
[11]
L. B. Holder, D. J. Cook, and S. Djoko. Substucture discovery in the subdue system. In KDD Workshop, pages 169--180, 1994.
[12]
J. Huan, W. Wang, J. Prins, and J. Yang. Spin: Mining maximal frequent subgraphs from graph databases. In KDD, pages 581--586, 2004.
[13]
A. Inokuchi, T. Washio, and H. Motoda. Complete mining of frequent patterns from graphs: Mining graph data. Machine Learning, 50(3):321--354, 2003.
[14]
S. Kramer, L. D. Raedt, and C. Helma. Molecular feature mining in hiv data. In KDD, pages 136--143, 2001.
[15]
M. Kuramochi and G. Karypis. Frequent subgraph discovery. In ICDM, pages 313--320, 2001.
[16]
M. Kuramochi and G. Karypis. Finding frequent patterns in a large sparse graph. Data Mining and Knowledge Discovery, 11(3):243--271, 2005.
[17]
A. Lachmann and M. Riedewald. Finding relevant patterns in bursty sequences. PVLDB, 1(1):78--89, 2008.
[18]
J. Leskovec, J. M. Kleinberg, and C. Faloutsos. Graphs over time: Densification laws, shrinking diameters and possible explanations. In KDD, pages 177--187, 2005.
[19]
S. Navlakha, R. Rastogi, and N. Shrivastava. Graph summarization with bounded error. In SIGMOD Conference, pages 419--432, 2008.
[20]
J. Pei, D. Jiang, and A. Zhang. On mining cross-graph quasi-cliques. In KDD, pages 228--238, 2005.
[21]
N. Polyzotis and M. N. Garofalakis. Xsketch synopses for xml data graphs. ACM Transactions on Database Systems, 31(3):1014--1063, 2006.
[22]
S. Raghavan and H. Garcia-Molina. Representing web graphs. In ICDE, pages 405--416, 2003.
[23]
S. Reinhardt and G. Karypis. A multi-level parallel implementation of a program for finding frequent patterns in a large sparse graph. In IPDPS, pages 1--8, 2007.
[24]
T. Sarlós, A. A. Benczúr, K. Csalogány, D. Fogaras, and B. Rácz. To randomize or not to randomize: Space optimal summaries for hyperlink analysis. In WWW, pages 297--306, 2006.
[25]
Y. Tian, R. A. Hankins, and J. M. Patel. Efficient aggregation for graph summarization. In SIGMOD Conference, pages 567--580, 2008.
[26]
H. Toivonen. Sampling large databases for association rules. In VLDB, pages 134--145, 1996.
[27]
X. Yan, H. Cheng, J. Han, and P. S. Yu. Mining significant graph patterns by leap search. In SIGMOD Conference, pages 433--444, 2008.
[28]
X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. In ICDM, pages 721--724, 2002.
[29]
X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In SIGMOD Conference, pages 335--346, 2004.
[30]
N. Zhang, V. Kacholia, and M. T. Özsu. A succinct physical storage scheme for efficient evaluation of path queries in xml. In ICDE, pages 54--65, 2004.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 2, Issue 1
August 2009
1293 pages

Publisher

VLDB Endowment

Publication History

Published: 01 August 2009
Published in PVLDB Volume 2, Issue 1

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)12
  • Downloads (Last 6 weeks)0
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Optimizing subgraph retrieval and matching with an efficient indexing schemeKnowledge and Information Systems10.1007/s10115-024-02175-766:11(6815-6843)Online publication date: 1-Nov-2024
  • (2023)Semantically Constitutive Entities in Knowledge GraphsDatabase and Expert Systems Applications10.1007/978-3-031-39847-6_36(445-461)Online publication date: 28-Aug-2023
  • (2022)Multi-relation Graph SummarizationACM Transactions on Knowledge Discovery from Data10.1145/349456116:5(1-30)Online publication date: 9-Mar-2022
  • (2021)Distributed aggregation-based attributed graph summarization for summary-based approximate attributed graph queriesExpert Systems with Applications: An International Journal10.1016/j.eswa.2021.114921176:COnline publication date: 15-Aug-2021
  • (2021)GS4: Graph stream summarization based on both the structure and semanticsThe Journal of Supercomputing10.1007/s11227-020-03290-277:3(2713-2733)Online publication date: 1-Mar-2021
  • (2021)Mining explainable local and global subgraph patterns with surprising densitiesData Mining and Knowledge Discovery10.1007/s10618-020-00721-935:1(321-371)Online publication date: 1-Jan-2021
  • (2020)Mining Top-k pairs of correlated subgraphs in a large networkProceedings of the VLDB Endowment10.14778/3397230.339724513:9(1511-1524)Online publication date: 26-Jun-2020
  • (2019)SWeG: Lossless and Lossy Summarization of Web-Scale GraphsThe World Wide Web Conference10.1145/3308558.3313402(1679-1690)Online publication date: 13-May-2019
  • (2019)Leveraging Compression-Based Graph Mining for Behavior-Based Malware DetectionIEEE Transactions on Dependable and Secure Computing10.1109/TDSC.2017.267588116:1(99-112)Online publication date: 1-Jan-2019
  • (2019)Summarizing semantic graphsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-018-0528-328:3(295-327)Online publication date: 1-Jun-2019
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media