A graph pattern mining framework for large graphs on GPU

Lin Hu¹^na1,
Yinnian Lin¹^na1,
Lei Zou ORCID: orcid.org/0000-0002-8586-4400¹ &
…
M. Tamer Özsu²

181 Accesses
Explore all metrics

Abstract

Graph pattern mining (GPM) is an important problem in graph processing. There are many parallel frameworks for GPM, many of which suffer from low performance. GPU is a powerful option for accelerating graph processing, but parallel GPM algorithms produce a large number of intermediate results, limiting GPM implementations on GPU. In this paper, we present GAMMA, an out-of-core GPM framework on GPU, that makes full use of host memory to process large graphs. GAMMA adopts a self-adaptive implicit host memory access approach to achieve high bandwidth, which is transparent to users. It provides flexible and effective interfaces for users to build their algorithms. We also propose several optimizations over primitives provided by GAMMA in the out-of-core GPU system, as well as optimizations to perform set intersections since they are widely used in GPM. Experimental results show that GAMMA scales better with graph size over the state-of-the-art approaches—by an order of magnitude—and is also faster than existing GPM systems.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Institutional subscriptions

Fig. 21

GPUGraphX: A GPU-Aided Distributed Graph Processing System

Efficient Subgraph Matching Using GPUs

High Performance and Scalable Graph Computation on GPUs

Notes

The canonical label of a graph is a code that uniquely identifies the graph such that two graphs have the same code if and only if they are isomorphic.
Our codes are released on github: https://github.com/pkumod/GAMMA.
We report average performance of five runs.

References

https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html?highlight=bank
https://thrust.github.io
Abdelhamid, E., Abdelaziz, I., Kalnis, P., Khayyat, Z., Jamour, F.: Scalemine: Scalable parallel frequent subgraph mining in a single large graph. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp 716–727 (2016)
Almasri, M., Hajj, I.E., Nagi, R., Xiong, J., Hwu, W.m.: Parallel k-clique counting on gpus. In: Proceedings of the 36th ACM International Conference on Supercomputing (ICS), pp 1–14 (2022)
Aoe, J.I., Morimoto, K., Sato, T.: An efficient implementation of trie structures. Softw. Pract. Exp. 22(9), 695–721 (1992)
Article Google Scholar
Babai, L., Kantor, W.M., Luks, E.M.: Computational complexity and the classification of finite simple groups. In: Proceeding of the Annual Symposium on Foundations of Computer Science (SFCS), pp 162–171 (1983)
Bisson, M., Fatica, M.: High performance exact triangle counting on gpus. IEEE Trans. Parallel Distrib. Syst. 28(12), 3501–3510 (2017)
Article Google Scholar
Chen, H., Liu, M., Zhao, Y., Yan, X., Yan, D., Cheng, J.: G-miner: an efficient task-oriented graph mining system. In: Proceedings of the Thirteenth European Conference on Computer Systems (EuroSys), pp 1–12 (2018)
Chen, X.: Graphminer. https://github.com/chenxuhao/GraphMiner
Chen, X., Csail, M., Csail, A.M.: Efficient and scalable graph pattern mining on GPUs. arXiv:2112.09761 (2021). https://api.semanticscholar.org/CorpusID:245334945
Chen, X., Dathathri, R., Gill, G., Hoang, L., Pingali, K.: Sandslash: a two-level framework for efficient graph pattern mining. In: Proceedings of the ACM International Conference on Supercomputing (ICS), pp 378–391 (2021)
Chen, X., Dathathri, R., Gill, G., Pingali, K.: Pangolin: an efficient and flexible graph mining system on CPU and GPU. Proc. VLDB Endowment (PVLDB) 13(8), 1190–1205 (2020)
Article Google Scholar
Chen, X., Huang, T., Xu, S., Bourgeat, T., Chung, C., Arvind, A.: Flexminer: a pattern-aware accelerator for graph pattern mining. In: Proceeding of the Annual International Symposium on Computer Architecture (ISCA), pp 581–594 (2021)
Chu, W.T., Tsai, M.H.: Visual pattern discovery for architecture image classification and product image search. In: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, pp 1–8 (2012)
Deshpande, M., Kuramochi, M., Wale, N., Karypis, G.: Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans. Knowl. Data Eng.(TKDE) 17(8), 1036–1050 (2005)
Article Google Scholar
Dias, V., Teixeira, C.H., Guedes, D., Meira, W., Parthasarathy, S.: Fractal: A general-purpose graph pattern mining system. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), pp 1357–1374 (2019)
Eberle, W., Graves, J., Holder, L.: Insider threat detection using a graph-based approach. J. Appl. Secur. Res. 6(1), 32–81 (2010)
Article Google Scholar
Elseidy, M., Abdelhamid, E., Skiadopoulos, S., Kalnis, P.: Grami: frequent subgraph and pattern mining in a single large graph. Proc. VLDB Endow.(PVLDB) 7(7), 517–528 (2014)
Article Google Scholar
Gera, P.: Overcoming memory capacity constraints for large graph applications on gpus. Ph.D. thesis, Georgia Institute of Technology (2021)
Gera, P., Kim, H., Sao, P., Kim, H., Bader, D.: Traversing large graphs on gpus with unified memory. Proc. VLDB Endow.(PVLDB) 13(7), 1119–1133 (2020)
Article Google Scholar
Gowanlock, M., Karsin, B.: Sorting large datasets with heterogeneous cpu/gpu architectures. In: Proceeding of the International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp 560–569 (2018)
Guo, W., Li, Y., Sha, M., He, B., Xiao, X., Tan, K.L.: Gpu-accelerated subgraph enumeration on partitioned graphs. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), pp 1067–1082 (2020)
Han, S., Zou, L., Yu, J.X.: Speeding up set intersections in graph algorithms using simd instructions. In: Proceedings of the 2018 International Conference on Management of Data, pp 1587–1602 (2018)
Hu, L., Guan, N., Zou, L.: Triangle counting on gpu using fine-grained task distribution. In: Proceeding of the International Conference on Data Engineering Workshops (ICDEW), pp 225–232 (2019)
Hu, L., Zou, L., Özsu, M.T.: GAMMA: A graph pattern mining framework for large graphs on GPU. In: 39th IEEE International Conference on Data Engineering, ICDE 2023, Anaheim, CA, USA, April 3-7, 2023, pp 273–286. IEEE (2023)
Hu, Y., Liu, H., Huang, H.H.: Tricore: Parallel triangle counting on gpus. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp 171–182 (2018)
Jamshidi, K., Mahadasa, R., Vora, K.: Peregrine: a pattern-aware graph mining system. In: Proceedings of the Fifteenth European Conference on Computer Systems, pp 1–16 (2020)
Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20(1), 359–392 (1998)
Article MathSciNet Google Scholar
Kim, M.S., An, K., Park, H., Seo, H., Kim, J.: Gts: A fast and scalable graph processing method based on streaming topology to gpus. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), pp 447–461 (2016)
Lai, L., Qin, L., Lin, X., Zhang, Y., Chang, L., Yang, S.: Scalable distributed subgraph enumeration. Proc. VLDB Endow. (PVLDB) 10(3), 217–228 (2016)
Article Google Scholar
Leskovec, J., Chakrabarti, D., Kleinberg, J.M., Faloutsos, C., Ghahramani, Z.: Kronecker graphs: an approach to modeling networks. J. Mach. Learn. Res. 11, 985–1042 (2010)
MathSciNet Google Scholar
Li, C., Ausavarungnirun, R., Rossbach, C.J., Zhang, Y., Mutlu, O., Guo, Y., Yang, J.: A framework for memory oversubscription management in graphics processing units. In: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp 49–63 (2019)
Lu, S., He, B., Li, Y., Fu, H.: Accelerating exact constrained shortest paths on gpus. Proc. VLDB Endow. 14(4), 547–559 (2020)
Article Google Scholar
Lü, Y., Guo, H., Huang, L., Yu, Q., Shen, L., Xiao, N., Wang, Z.: (2021) Graphpeg: Accelerating graph processing on gpus. ACM Trans. Archit. Code Optim. 18(3), 30:1–30
Mawhirter, D., Wu, B.: Automine: harmonizing high-level abstraction and high performance for graph mining. In: Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP), pp 509–523 (2019)
Meng, K., Geng, L., Li, X., Tao, Q., Yu, W., Zhou, J.: Efficient multi-gpu graph processing with remote work stealing
Merrill, D., Garland, M., Grimshaw, A.S.: High-performance and scalable GPU graph traversal. ACM Trans. Parallel Comput. 1(2), 14:1–14:30 (2015)
Mhedhbi, A., Kankanamge, C., Salihoglu, S.: Optimizing one-time and continuous subgraph queries using worst-case optimal joins. ACM Trans. Database Syst. (TODS) 46(2), 1–45 (2021)
Article MathSciNet Google Scholar
Min, S.W., Mailthody, V.S., Qureshi, Z., Xiong, J., Ebrahimi, E., Hwu, W.m.: Emogi: Efficient memory-access for out-of-memory graph-traversal in gpus. Proceedings of the VLDB Endowment (PVLDB) 14(2), 114–127 (2020)
Pandey, S., Li, X.S., Buluc, A., Xu, J., Liu, H.: H-index: Hash-indexing for parallel triangle counting on gpus. In: 2019 IEEE high performance extreme computing conference (HPEC), pp 1–7. IEEE (2019)
Pandey, S., Wang, Z., Zhong, S., Tian, C., Zheng, B., Li, X., Li, L., Hoisie, A., Ding, C., Li, D., et al.: Trust: triangle counting reloaded on gpus. IEEE Trans. Parallel Distrib. Syst.(TPDS) 32(11), 2646–2660 (2021)
Article Google Scholar
Park, H., Kim, M.S.: Evograph: An effective and efficient graph upscaling method for preserving graph properties. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), pp 2051–2059 (2018)
Ribeiro-Junior, S., Quirino, R.D., Ribeiro, L.A., Martins, W.S.: Fast parallel set similarity joins on many-core architectures. J. Inf. Data Manag. 8(3), 255–255 (2017)
Google Scholar
Roy, A., Mihailovic, I., Zwaenepoel, W.: X-stream: Edge-centric graph processing using streaming partitions. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP), pp 472–488 (2013)
Sabet, A.H.N., Zhao, Z., Gupta, R.: Subway: Minimizing data transfer during out-of-gpu-memory graph processing. In: Proceedings of the Fifteenth European Conference on Computer Systems (EuroSys), pp 1–16 (2020)
Sato, H., Mizote, R., Matsuoka, S., Ogawa, H.: I/o chunking and latency hiding approach for out-of-core sorting acceleration using gpu and flash nvm. In: Proceeding of the International Conference on Big Data (Big Data), pp 398–403 (2016)
Shi, T., Zhai, J., Wang, H., Chen, Q., Zhai, M., Hao, Z., Yang, H., Chen, W.: Graphset: High performance graph mining through equivalent set transformations. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp 1–14 (2023)
Singh, D.P., Joshi, I., Choudhary, J.: Survey of gpu based sorting algorithms. Int. J. Parallel Program. 46(6), 1017–1034 (2018)
Article Google Scholar
Sun, S., Luo, Q.: Subgraph matching with effective matching order and indexing. IEEE Trans. Knowl. Data Eng. (TKDE) 34(1), 491–505 (2020)
Article Google Scholar
Teixeira, C.H., Fonseca, A.J., Serafini, M., Siganos, G., Zaki, M.J., Aboulnaga, A.: Arabesque: a system for distributed graph mining. In: Proceedings of the 25th Symposium on Operating Systems Principles (SOSP), pp 425–440 (2015)
Wang, K., Zuo, Z., Thorpe, J., Nguyen, T.Q., Xu, G.H.: Rstream: marrying relational algebra with streaming for efficient graph mining on a single machine. In: Proceeding of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI)), pp 763–782 (2018)
Wang, L., Wang, Y., Owens, J.D.: Fast parallel subgraph matching on the gpu. In: Proceeding of the International Symposium on High Performance Distributed Computing (HPDC) (2016)
Wang, Z., Meng, Z., Li, X., Lin, X., Zheng, L., Tian, C., Zhong, S.: Smog: Accelerating subgraph matching on gpus. In: 2023 IEEE High Performance Extreme Computing Conference (HPEC), pp 1–7 (2023). https://doi.org/10.1109/HPEC58863.2023.10363569
Wei, H., Yu, J.X., Lu, C., Lin, X.: Speedup graph processing by graph ordering. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), pp 1813–1828 (2016)
Wei, Y.W., Chen, W.M., Tsai, H.H.: Accelerating the Bron-Kerbosch algorithm for maximal clique enumeration using gpus. IEEE Trans. Parallel Distrib. Syst.(TPDS) 32(9), 2352–2366 (2021)
Article Google Scholar
Wolf, M.M., Deveci, M., Berry, J.W., Hammond, S.D., Rajamanickam, S.: Fast linear algebra-based triangle counting with kokkoskernels. In: 2017 IEEE High Performance Extreme Computing Conference (HPEC), pp 1–7. IEEE (2017)
Wu, L., Liu, H.: Tracing fake-news footprints: Characterizing social media messages by how they propagate. In: Proceedings of the eleventh ACM international conference on Web Search and Data Mining (WSDM), pp 637–645 (2018)
Yaşar, A., Rajamanickam, S., Berry, J.W., Çatalyürek, Ü.V.: A block-based triangle counting algorithm on heterogeneous environments. IEEE Trans. Parallel Distrib. Syst. (TPDS) 33(2), 444–458 (2021)
Article Google Scholar
Zeng, L., Zou, L., Özsu, M.T., Hu, L., Zhang, F.: GSI: GPU-friendly subgraph isomorphism. In: Proceedings of IEEE 36th International Conference on Data Engineering (ICDE), pp 1249–1260 (2020)
Zeng, Z., Wang, J., Zhou, L.: Efficient mining of minimal distinguishing subgraph patterns from graph databases. In: Proceeding of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp 1062–1068 (2008)
Zhang, J., Lu, Y., Spampinato, D.G., Franchetti, F.: FESIA: A fast and simd-efficient set intersection approach on modern cpus. In: 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, TX, USA, April 20-24, 2020, pp 1465–1476. IEEE (2020)
Zhang, Y., Liao, X., Jin, H., He, B., Liu, H., Gu, L.: Digraph: An efficient path-based iterative directed graph processing system on multiple gpus. In: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp 601–614 (2019)
Zhao, C., Zhang, Z., Xu, P., Zheng, T., Guo, J.: Kaleido: An efficient out-of-core graph mining system on a single machine. In: Proceeding of the 36th International Conference on Data Engineering (ICDE), pp 673–684 (2020)
Zheng, T., Nellans, D., Zulfiqar, A., Stephenson, M., Keckler, S.W.: Towards high performance paged memory for gpus. In: Proceeding of the International Symposium on High Performance Computer Architecture (HPCA), pp 345–357 (2016)

Download references

Acknowledgements

Lei Zou was supported by The National Key Research and Development Program of China under grant 2023YFB4502303 and NSFC under grant 61932001 and U20A20174. Tame Özsu’s research was funded by a Discovery Grant from Natural Sciences and Engineering Research Council (NSERC) of Canada. Lei Zou is the corresponding author of this work.

Author information

Lin Hu and Yinnian Lin have contribulted equally to this work.

Authors and Affiliations

Peking University, Beijing, China
Lin Hu, Yinnian Lin & Lei Zou
University of Waterloo, Waterloo, Canada
M. Tamer Özsu

Authors

Lin Hu
View author publications
You can also search for this author in PubMed Google Scholar
Yinnian Lin
View author publications
You can also search for this author in PubMed Google Scholar
Lei Zou
View author publications
You can also search for this author in PubMed Google Scholar
M. Tamer Özsu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Lei Zou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Hu, L., Lin, Y., Zou, L. et al. A graph pattern mining framework for large graphs on GPU. The VLDB Journal 34, 6 (2025). https://doi.org/10.1007/s00778-024-00883-8

Download citation

Received: 10 February 2024
Revised: 15 September 2024
Accepted: 21 September 2024
Published: 05 December 2024
DOI: https://doi.org/10.1007/s00778-024-00883-8

A graph pattern mining framework for large graphs on GPU

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

GPUGraphX: A GPU-Aided Distributed Graph Processing System

Efficient Subgraph Matching Using GPUs

High Performance and Scalable Graph Computation on GPUs

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

A graph pattern mining framework for large graphs on GPU

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

GPUGraphX: A GPU-Aided Distributed Graph Processing System

Efficient Subgraph Matching Using GPUs

High Performance and Scalable Graph Computation on GPUs

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation