[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3447818.3460359acmconferencesArticle/Chapter ViewAbstractPublication PagesicsConference Proceedingsconference-collections
research-article
Open access

Sandslash: a two-level framework for efficient graph pattern mining

Published: 04 June 2021 Publication History

Abstract

Graph pattern mining (GPM) is a key building block in diverse applications, including bioinformatics, chemical engineering, social network analysis, recommender systems and security. Existing GPM frameworks either provide high-level interfaces for productivity at the cost of expressiveness or provide low-level interfaces that can express a wide variety of GPM algorithms at the cost of increased programming complexity. Moreover, existing systems lack the flexibility to explore combinations of optimizations to achieve performance competitive with hand-optimized applications.
We present Sandslash, an in-memory graph pattern mining framework that uses a novel programming interface to support productive, expressive, and efficient GPM on large graphs. Sandslash provides a high-level API that needs only a specification of the GPM problem from the user, and the system implements fast subgraph enumeration, provides efficient data structures, and applies high-level optimizations automatically. To achieve performance competitive with expert-optimized implementations, Sandslash also provides a low-level API that allows users to express algorithm-specific optimizations. This enables Sandslash to support both high-productivity and high-efficiency without losing expressiveness. We evaluate Sandslash on shared-memory machines using five GPM applications and a wide range of large real-world graphs. Experimental results demonstrate that applications written using Sandslash high-level or low-level API outperform those in state-of-the-art GPM systems AutoMine, Pangolin, and Peregrine on average by 13.8x, 7.9x, and 5.4x, respectively. We also show that these Sandslash applications outperform expert-optimized GPM implementations by 2.3x on average with less programming effort.

References

[1]
Ehab Abdelhamid, Ibrahim Abdelaziz, Panos Kalnis, Zuhair Khayyat, and Fuad Jamour. 2016. Scalemine: Scalable Parallel Frequent Subgraph Mining in a Single Large Graph. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (Salt Lake City, Utah) (SC '16). IEEE Press, Piscataway, NJ, USA, Article 61, 12 pages. http://dl.acm.org/citation.cfm?id=3014904.3014986
[2]
Christopher R. Aberger, Andrew Lamb, Susan Tu, Andres Nötzli, Kunle Olukotun, and Christopher Ré. 2017. EmptyHeaded: A Relational Engine for Graph Processing. ACM Trans. Database Syst. 42, 4, Article 20 (Oct. 2017), 44 pages. 0362-5915
[3]
Charu C. Aggarwal and Haixun Wang. 2010. Managing and Mining Graph Data. Springer US.
[4]
Nesreen K. Ahmed, Jennifer Neville, Ryan A. Rossi, and Nick Duffield. 2015. Efficient Graphlet Counting for Large Networks. In ICDM. 1--10.
[5]
N. Alon, P. Dao, I. Hajirasouliha, F. Hormozdiari, and SC. Sahinalp. 2008. Biomolecular network motif counting and discovery by color coding. Bioinformatics 24, 13 (2008), 241--249.
[6]
Scott Beamer, Krste Asanovic, and David A. Patterson. 2015. The GAP Benchmark Suite. CoRR abs/1508.03619 (2015). http://arxiv.org/abs/1508.03619
[7]
Austin R. Benson, David F. Gleich, and Jure Leskovec. 2016. Higher-order organization of complex networks. Science 353, 6295 (2016), 163--166. 0036-8075 https://science.sciencemag.org/content/353/6295/163.full.pdf
[8]
Bibek Bhattarai, Hang Liu, and H. Howie Huang. 2019. CECI: Compact Embedding Cluster Index for Scalable Subgraph Matching. In Proceedings of the 2019 International Conference on Management of Data (Amsterdam, Netherlands) (SIGMOD '19). ACM, New York, NY, USA, 1447--1462.
[9]
Fei Bi, Lijun Chang, Xuemin Lin, Lu Qin, and Wenjie Zhang. 2016. Efficient Subgraph Matching by Postponing Cartesian Products. In Proceedings of the 2016 International Conference on Management of Data (San Francisco, California, USA) (SIGMOD ’16). Association for Computing Machinery, New York, NY, USA, 1199--1214.
[10]
Paolo Boldi, Massimo Santini, and Sebastiano Vigna. 2008. A Large Time-Aware Graph. SIGIR Forum 42, 2 (2008), 33--38.
[11]
Paolo Boldi and Sebastiano Vigna. 2004. The WebGraph Framework I: Compression Techniques. In Proc. of the Thirteenth International World Wide Web Conference (WWW 2004). ACM Press, Manhattan, USA, 595--601.
[12]
Hongzhi Chen, Miao Liu, Yunjian Zhao, Xiao Yan, Da Yan, and James Cheng. 2018. G-Miner: An Efficient Task-Oriented Graph Mining System. In Proceedings of the Thirteenth EuroSys Conference (Porto, Portugal) (EuroSys ’18). Association for Computing Machinery, New York, NY, USA, Article 32, 12 pages.
[13]
Xuhao Chen, Roshan Dathathri, Gurbinder Gill, and Keshav Pingali. 2020. Pangolin: An Efficient and Flexible Graph Mining System on CPU and GPU. Proc. VLDB Endow. 13, 8 (Aug. 2020). 2150-8097
[14]
X. Cheng, C. Dale, and J. Liu. [n.d.]. Dataset for statistics and social network of youtube videos. http://netsg.cs.sfu.ca/youtubedata/.
[15]
Norishige Chiba and Takao Nishizeki. 1985. Arboricity and Subgraph Listing Algorithms. SIAM J. Comput. 14, 1 (Feb. 1985), 210--223. 0097-5397
[16]
Young-Rae Cho and Aidong Zhang. 2010. Predicting Protein Function by Frequent Functional Association Pattern Mining in Protein Interaction Networks. Trans. Info. Tech. Biomed. 14, 1 (Jan. 2010), 30--36. 1089-7771
[17]
Maximilien Danisch, Oana Balalau, and Mauro Sozio. 2018. Listing K-cliques in Sparse Real-World Graphs*. In Proceedings of the 2018 World Wide Web Conference (Lyon, France) (WWW '18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 589--598.
[18]
M. Deshpande, M. Kuramochi, N. Wale, and G. Karypis. 2005. Frequent substructure-based approaches for classifying chemical compounds. IEEE Transactions on Knowledge and Data Engineering 17, 8 (Aug 2005), 1036--1050. 1041-4347
[19]
Laxman Dhulipala, Guy E. Blelloch, and Julian Shun. 2018. Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable. In Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures (Vienna, Austria) (SPAA ’18). Association for Computing Machinery, New York, NY, USA, 393--404.
[20]
Vinicius Dias, Carlos H. C. Teixeira, Dorgival Guedes, Wagner Meira, and Srinivasan Parthasarathy. 2019. Fractal: A General-Purpose Graph Pattern Mining System. In Proceedings of the 2019 International Conference on Management of Data (Amsterdam, Netherlands) (SIGMOD '19). ACM, New York, NY, USA, 1357--1374.
[21]
Mohammed Elseidy, Ehab Abdelhamid, Spiros Skiadopoulos, and Panos Kalnis. 2014. GraMi: Frequent Subgraph and Pattern Mining in a Single Large Graph. Proc. VLDB Endow. 7, 7 (March 2014), 517--528. 2150-8097
[22]
Katherine Faust. 2010. A puzzle concerning triads in social networks: Graph constraints and the triad census. Social Networks 32, 3 (2010), 221 -- 233. 0378-8733
[23]
Brian Gallagher. 2006. Matching Structure and Semantics: A Survey on Graph-Based Pattern Matching. In AAAI Fall Symposium: Capturing and Using Patterns for Evidence Detection.
[24]
I. Giechaskiel, G. Panagopoulos, and E. Yoneki. 2015. PDTL: Parallel and Distributed Triangle Listing for Massive Graphs. In 2015 44th International Conference on Parallel Processing. 370--379. 0190-3918
[25]
Joshua A. Grochow and Manolis Kellis. 2007. Network Motif Discovery Using Subgraph Enumeration and Symmetry-Breaking. In Proceedings of the 11th Annual International Conference on Research in Computational Molecular Biology (Oakland, CA, USA) (RECOMB’07). Springer-Verlag, Berlin, Heidelberg, 92--106.
[26]
B. H. Hall, Jaffe A. B., and Trajtenberg M. 2001. The NBER Patent Citation Data File: Lessons, Insights and Methodological Tools. http://www.nber.org/patents/.
[27]
F. Harary. 1969. Graph theory. Addison-Wesley Pub. Co. 69019989 https://books.google.com/books?id=QNxgQZQH868C
[28]
Loc Hoang, Vishwesh Jatala, Xuhao Chen, Udit Agarwal, Roshan Dathathri, Gurbinder Gill, and Keshav Pingali. 2019. DistTC: High Performance Distributed Triangle Counting. In HPEC 2019 23rd IEEE High Performance Extreme Computing, Graph Challenge.
[29]
Y. Hu, H. Liu, and H. H. Huang. 2018. TriCore: Parallel Triangle Counting on GPUs. In SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. 171--182.
[30]
Shweta Jain and C. Seshadhri. 2020. The Power of Pivoting for Exact Clique Counting. In Proceedings of the 13th International Conference on Web Search and Data Mining (Houston, TX, USA) (WSDM ’20). Association for Computing Machinery, New York, NY, USA, 268--276.
[31]
Kasra Jamshidi, Rakesh Mahadasa, and Keval Vora. 2020. Peregrine: A Pattern-Aware Graph Mining System. In Proceedings of the Fifteenth EuroSys Conference (EuroSys ’20).
[32]
Madhav Jha, C. Seshadhri, and Ali Pinar. 2015. Path Sampling: A Fast and Provable Method for Estimating 4-Vertex Subgraph Counts. In Proceedings of the 24th International Conference on World Wide Web (Florence, Italy) (WWW '15). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 495--505.
[33]
Chathura Kankanamge, Siddhartha Sahu, Amine Mhedbhi, Jeremy Chen, and Semih Salihoglu. 2017. Graphflow: An Active Graph Database. In Proceedings of the 2017 ACM International Conference on Management of Data (Chicago, Illinois, USA) (SIGMOD ’17). Association for Computing Machinery, New York, NY, USA, 1695--1698.
[34]
Hyeonji Kim, Juneyoung Lee, Sourav S. Bhowmick, Wook-Shin Han, JeongHoon Lee, Seongyun Ko, and Moath H.A. Jarrah. 2016. DUALSIM: Parallel Subgraph Enumeration in a Massive Graph on a Single Machine. In Proceedings of the 2016 International Conference on Management of Data (San Francisco, California, USA) (SIGMOD '16). ACM, New York, NY, USA, 1231--1245.
[35]
Kyoungmin Kim, In Seo, Wook-Shin Han, Jeong-Hoon Lee, Sungpack Hong, Hassan Chafi, Hyungyu Shin, and Geonhwa Jeong. 2018. TurboFlux: A Fast Continuous Subgraph Matching System for Streaming Graph Data. In Proceedings of the 2018 International Conference on Management of Data (Houston, TX, USA) (SIGMOD '18). ACM, New York, NY, USA, 411--426.
[36]
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a Social Network or a News Media?. In Proceedings of the 19th International Conference on World Wide Web (Raleigh, North Carolina, USA) (WWW '10). ACM, New York, NY, USA, 591--600.
[37]
Longbin Lai, Lu Qin, Xuemin Lin, and Lijun Chang. 2015. Scalable Subgraph Enumeration in MapReduce. Proc. VLDB Endow. 8, 10 (June 2015), 974--985. 2150-8097
[38]
J. Leskovec. 2013. SNAP: Stanford Network Analysis Platform. http://snap.stanford.edu/data/index.html
[39]
Shuai Ma, Yang Cao, Jinpeng Huai, and Tianyu Wo. 2012. Distributed Graph Pattern Matching. In Proceedings of the 21st International Conference on World Wide Web (Lyon, France) (WWW '12). ACM, New York, NY, USA, 949--958.
[40]
Daniel Mawhirter, Sam Reinehr, Connor Holmes, Tongping Liu, and Bo Wu. 2019. GraphZero: Breaking Symmetry for Efficient Graph Mining. [arxiv]1911.12877 [cs.PF]
[41]
Daniel Mawhirter and Bo Wu. 2019. AutoMine: Harmonizing High-level Abstraction and High Performance for Graph Mining. In Proceedings of the 27th ACM Symposium on Operating Systems Principles (Huntsville, Ontario, Canada) (SOSP '19). ACM, New York, NY, USA, 509--523.
[42]
Amine Mhedhbi and Semih Salihoglu. 2019. Optimizing Subgraph Queries by Combining Binary and Worst-Case Optimal Joins. Proc. VLDB Endow. 12, 11 (July 2019), 1692--1704. 2150-8097
[43]
R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. 2002. Network Motifs: Simple Building Blocks of Complex Networks. Science 298, 5594 (2002), 824--827. 0036-8075 https://science.sciencemag.org/content/298/5594/824.full.pdf
[44]
Donald Nguyen, Andrew Lenharth, and Keshav Pingali. 2013. A Lightweight Infrastructure for Graph Analytics. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP) (Farminton, Pennsylvania) (SOSP '13). ACM, New York, NY, USA, 456--471.
[45]
Santosh Pandey, Xiaoye Sherry Li, Aydin Buluc, Jiejun Xu, and Hang Liu. 2019. H-INDEX: Hash-Indexing for Parallel Triangle Counting on GPUs. In 2019 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 1--7. 2377-6943
[46]
Roger Pearce, Trevor Steil, Benjamin W Priest, and Geoffrey Sanders. 2019. One Quadrillion Triangles Queried on One Million Processors. In 2019 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 1--5.
[47]
Ali Pinar, C. Seshadhri, and Vaidyanathan Vishal. 2017. ESCAPE: Efficiently Counting All 5-Vertex Subgraphs. In Proceedings of the 26th International Conference on World Wide Web (Perth, Australia) (WWW '17). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 1431--1440.
[48]
Xuguang Ren, Junhu Wang, Wook-Shin Han, and Jeffrey Xu Yu. 2019. Fast and robust distributed subgraph enumeration. Proceedings of the VLDB Endowment 12, 11 (2019), 1344--1356.
[49]
Yingxia Shao, Bin Cui, Lei Chen, Lin Ma, Junjie Yao, and Ning Xu. 2014. Parallel Subgraph Listing in a Large-scale Graph. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (Snowbird, Utah, USA) (SIGMOD '14). ACM, New York, NY, USA, 625--636.
[50]
Tianhui Shi, Mingshu Zhai, Yi Xu, and Jidong Zhai. 2020. GraphPi: High Performance Graph Pattern Matching through Effective Redundancy Elimination. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (Atlanta, Georgia) (SC '20). IEEE Press, Article 100, 14 pages.
[51]
J. Shun and K. Tangwongsan. 2015. Multicore triangle computations without tuning. In 2015 IEEE 31st International Conference on Data Engineering. 149--160. 1063-6382
[52]
Shixuan Sun, Yulin Che, Lipeng Wang, and Qiong Luo. 2019. Efficient Parallel Subgraph Enumeration on a Single Machine. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). IEEE, 232--243.
[53]
Shixuan Sun and Qiong Luo. 2019. Scaling Up Subgraph Query Processing with Efficient Subgraph Matching. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). IEEE, 220--231.
[54]
Siddharth Suri and Sergei Vassilvitskii. 2011. Counting Triangles and the Curse of the Last Reducer. In Proceedings of the 20th International Conference on World Wide Web (Hyderabad, India) (WWW '11). ACM, New York, NY, USA, 607--614.
[55]
N. Talukder and M. J. Zaki. 2016. A Distributed Approach for Graph Mining in Massive Networks. Data Min. Knowl. Discov. 30, 5 (Sept. 2016), 1024--1052. 1384-5810
[56]
N. Talukder and M. J. Zaki. 2016. Parallel graph mining with dynamic load balancing. In 2016 IEEE International Conference on Big Data (Big Data). 3352--3359.
[57]
Carlos H. C. Teixeira, Alexandre J. Fonseca, Marco Serafini, Georgos Siganos, Mohammed J. Zaki, and Ashraf Aboulnaga. 2015. Arabesque: A System for Distributed Graph Mining. In Proceedings of the 25th Symposium on Operating Systems Principles (Monterey, California) (SOSP '15). ACM, New York, NY, USA, 425--440.
[58]
J. R. Ullmann. 1976. An Algorithm for Subgraph Isomorphism. J. ACM 23, 1 (Jan. 1976), 31--42. 0004-5411
[59]
Kai Wang, Zhiqiang Zuo, John Thorpe, Tien Quang Nguyen, and Guoqing Harry Xu. 2018. RStream: Marrying Relational Algebra with Streaming for Efficient Graph Mining on a Single Machine. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (Carlsbad, CA, USA) (OSDI'18). USENIX Association, Berkeley, CA, USA, 763--782. http://dl.acm.org/citation.cfm?id=3291168.3291225
[60]
Michael M Wolf, Mehmet Deveci, Jonathan W Berry, Simon D Hammond, and Sivasankaran Rajamanickam. 2017. Fast linear algebra-based triangle counting with kokkoskernels. In 2017 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 1--7.
[61]
Xifeng Yan and Jiawei Han. 2002. gSpan: graph-based substructure pattern mining. In Proceedings of the 2002 IEEE International Conference on Data Mining. 721--724.
[62]
Jaewon Yang and Jure Leskovec. 2012. Defining and Evaluating Network Communities based on Ground-truth. CoRR abs/1205.6233 (2012). [arxiv]1205.6233 http://arxiv.org/abs/1205.6233
[63]
Abdurrahman Yaşar, Sivasankaran Rajamanickam, Michael Wolf, Jonathan Berry, and Ümit V Çatalyürek. 2018. Fast triangle counting using cilk. In 2018 IEEE High Performance extreme Computing Conference (HPEC). IEEE, 1--7.
[64]
Cheng Zhao, Zhibin Zhang, Peng Xu, Tianqi Zheng, and Jiafeng Guo. 2020. Kaleido: An Efficient Out-of-core Graph Mining System on A Single Machine. In Proceedings of the 2020 IEEE International Conference on Data Engineering (ICDE 2020) (ICDE '20).

Cited By

View all
  • (2024)PimPam: Efficient Graph Pattern Matching on Real Processing-in-Memory HardwareProceedings of the ACM on Management of Data10.1145/36549642:3(1-25)Online publication date: 30-May-2024
  • (2024)A Comprehensive Survey and Experimental Study of Subgraph Matching: Trends, Unbiasedness, and InteractionProceedings of the ACM on Management of Data10.1145/36393152:1(1-29)Online publication date: 26-Mar-2024
  • (2024)Enumeration of Billions of Maximal Bicliques in Bipartite Graphs without Using GPUsProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.1109/SC41406.2024.00106(1-15)Online publication date: 17-Nov-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ICS '21: Proceedings of the 35th ACM International Conference on Supercomputing
June 2021
506 pages
ISBN:9781450383356
DOI:10.1145/3447818
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 June 2021

Check for updates

Author Tags

  1. graph pattern mining
  2. performance optimization
  3. programming framework
  4. search space pruning

Qualifiers

  • Research-article

Funding Sources

Conference

ICS '21
Sponsor:

Acceptance Rates

ICS '21 Paper Acceptance Rate 39 of 157 submissions, 25%;
Overall Acceptance Rate 629 of 2,180 submissions, 29%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)188
  • Downloads (Last 6 weeks)17
Reflects downloads up to 19 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)PimPam: Efficient Graph Pattern Matching on Real Processing-in-Memory HardwareProceedings of the ACM on Management of Data10.1145/36549642:3(1-25)Online publication date: 30-May-2024
  • (2024)A Comprehensive Survey and Experimental Study of Subgraph Matching: Trends, Unbiasedness, and InteractionProceedings of the ACM on Management of Data10.1145/36393152:1(1-29)Online publication date: 26-Mar-2024
  • (2024)Enumeration of Billions of Maximal Bicliques in Bipartite Graphs without Using GPUsProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.1109/SC41406.2024.00106(1-15)Online publication date: 17-Nov-2024
  • (2024)TMiner: A Vertex-Based Task Scheduling Architecture for Graph Pattern Mining2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)10.1109/MICRO61859.2024.00096(1295-1308)Online publication date: 2-Nov-2024
  • (2024)Large Subgraph Matching: A Comprehensive and Efficient Approach for Heterogeneous Graphs2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00231(2972-2985)Online publication date: 13-May-2024
  • (2024)A graph pattern mining framework for large graphs on GPUThe VLDB Journal10.1007/s00778-024-00883-834:1Online publication date: 5-Dec-2024
  • (2023)GraphSet: High Performance Graph Mining through Equivalent Set TransformationsProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3581784.3613213(1-14)Online publication date: 12-Nov-2023
  • (2023)Efficient Maximal Biclique Enumeration on GPUsProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3581784.3607059(1-13)Online publication date: 12-Nov-2023
  • (2023)Shogun: A Task Scheduling Framework for Graph Mining AcceleratorsProceedings of the 50th Annual International Symposium on Computer Architecture10.1145/3579371.3589086(1-15)Online publication date: 17-Jun-2023
  • (2023)Khuzdul: Efficient and Scalable Distributed Graph Pattern Mining EngineProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3575693.3575743(413-426)Online publication date: 27-Jan-2023
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media