Abstract
Graph pattern mining (GPM) is an important problem in graph processing. There are many parallel frameworks for GPM, many of which suffer from low performance. GPU is a powerful option for accelerating graph processing, but parallel GPM algorithms produce a large number of intermediate results, limiting GPM implementations on GPU. In this paper, we present GAMMA, an out-of-core GPM framework on GPU, that makes full use of host memory to process large graphs. GAMMA adopts a self-adaptive implicit host memory access approach to achieve high bandwidth, which is transparent to users. It provides flexible and effective interfaces for users to build their algorithms. We also propose several optimizations over primitives provided by GAMMA in the out-of-core GPU system, as well as optimizations to perform set intersections since they are widely used in GPM. Experimental results show that GAMMA scales better with graph size over the state-of-the-art approaches—by an order of magnitude—and is also faster than existing GPM systems.
Similar content being viewed by others
Notes
The canonical label of a graph is a code that uniquely identifies the graph such that two graphs have the same code if and only if they are isomorphic.
Our codes are released on github: https://github.com/pkumod/GAMMA.
We report average performance of five runs.
References
https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html?highlight=bank
Abdelhamid, E., Abdelaziz, I., Kalnis, P., Khayyat, Z., Jamour, F.: Scalemine: Scalable parallel frequent subgraph mining in a single large graph. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp 716–727 (2016)
Almasri, M., Hajj, I.E., Nagi, R., Xiong, J., Hwu, W.m.: Parallel k-clique counting on gpus. In: Proceedings of the 36th ACM International Conference on Supercomputing (ICS), pp 1–14 (2022)
Aoe, J.I., Morimoto, K., Sato, T.: An efficient implementation of trie structures. Softw. Pract. Exp. 22(9), 695–721 (1992)
Babai, L., Kantor, W.M., Luks, E.M.: Computational complexity and the classification of finite simple groups. In: Proceeding of the Annual Symposium on Foundations of Computer Science (SFCS), pp 162–171 (1983)
Bisson, M., Fatica, M.: High performance exact triangle counting on gpus. IEEE Trans. Parallel Distrib. Syst. 28(12), 3501–3510 (2017)
Chen, H., Liu, M., Zhao, Y., Yan, X., Yan, D., Cheng, J.: G-miner: an efficient task-oriented graph mining system. In: Proceedings of the Thirteenth European Conference on Computer Systems (EuroSys), pp 1–12 (2018)
Chen, X.: Graphminer. https://github.com/chenxuhao/GraphMiner
Chen, X., Csail, M., Csail, A.M.: Efficient and scalable graph pattern mining on GPUs. arXiv:2112.09761 (2021). https://api.semanticscholar.org/CorpusID:245334945
Chen, X., Dathathri, R., Gill, G., Hoang, L., Pingali, K.: Sandslash: a two-level framework for efficient graph pattern mining. In: Proceedings of the ACM International Conference on Supercomputing (ICS), pp 378–391 (2021)
Chen, X., Dathathri, R., Gill, G., Pingali, K.: Pangolin: an efficient and flexible graph mining system on CPU and GPU. Proc. VLDB Endowment (PVLDB) 13(8), 1190–1205 (2020)
Chen, X., Huang, T., Xu, S., Bourgeat, T., Chung, C., Arvind, A.: Flexminer: a pattern-aware accelerator for graph pattern mining. In: Proceeding of the Annual International Symposium on Computer Architecture (ISCA), pp 581–594 (2021)
Chu, W.T., Tsai, M.H.: Visual pattern discovery for architecture image classification and product image search. In: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, pp 1–8 (2012)
Deshpande, M., Kuramochi, M., Wale, N., Karypis, G.: Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans. Knowl. Data Eng.(TKDE) 17(8), 1036–1050 (2005)
Dias, V., Teixeira, C.H., Guedes, D., Meira, W., Parthasarathy, S.: Fractal: A general-purpose graph pattern mining system. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), pp 1357–1374 (2019)
Eberle, W., Graves, J., Holder, L.: Insider threat detection using a graph-based approach. J. Appl. Secur. Res. 6(1), 32–81 (2010)
Elseidy, M., Abdelhamid, E., Skiadopoulos, S., Kalnis, P.: Grami: frequent subgraph and pattern mining in a single large graph. Proc. VLDB Endow.(PVLDB) 7(7), 517–528 (2014)
Gera, P.: Overcoming memory capacity constraints for large graph applications on gpus. Ph.D. thesis, Georgia Institute of Technology (2021)
Gera, P., Kim, H., Sao, P., Kim, H., Bader, D.: Traversing large graphs on gpus with unified memory. Proc. VLDB Endow.(PVLDB) 13(7), 1119–1133 (2020)
Gowanlock, M., Karsin, B.: Sorting large datasets with heterogeneous cpu/gpu architectures. In: Proceeding of the International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp 560–569 (2018)
Guo, W., Li, Y., Sha, M., He, B., Xiao, X., Tan, K.L.: Gpu-accelerated subgraph enumeration on partitioned graphs. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), pp 1067–1082 (2020)
Han, S., Zou, L., Yu, J.X.: Speeding up set intersections in graph algorithms using simd instructions. In: Proceedings of the 2018 International Conference on Management of Data, pp 1587–1602 (2018)
Hu, L., Guan, N., Zou, L.: Triangle counting on gpu using fine-grained task distribution. In: Proceeding of the International Conference on Data Engineering Workshops (ICDEW), pp 225–232 (2019)
Hu, L., Zou, L., Özsu, M.T.: GAMMA: A graph pattern mining framework for large graphs on GPU. In: 39th IEEE International Conference on Data Engineering, ICDE 2023, Anaheim, CA, USA, April 3-7, 2023, pp 273–286. IEEE (2023)
Hu, Y., Liu, H., Huang, H.H.: Tricore: Parallel triangle counting on gpus. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp 171–182 (2018)
Jamshidi, K., Mahadasa, R., Vora, K.: Peregrine: a pattern-aware graph mining system. In: Proceedings of the Fifteenth European Conference on Computer Systems, pp 1–16 (2020)
Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20(1), 359–392 (1998)
Kim, M.S., An, K., Park, H., Seo, H., Kim, J.: Gts: A fast and scalable graph processing method based on streaming topology to gpus. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), pp 447–461 (2016)
Lai, L., Qin, L., Lin, X., Zhang, Y., Chang, L., Yang, S.: Scalable distributed subgraph enumeration. Proc. VLDB Endow. (PVLDB) 10(3), 217–228 (2016)
Leskovec, J., Chakrabarti, D., Kleinberg, J.M., Faloutsos, C., Ghahramani, Z.: Kronecker graphs: an approach to modeling networks. J. Mach. Learn. Res. 11, 985–1042 (2010)
Li, C., Ausavarungnirun, R., Rossbach, C.J., Zhang, Y., Mutlu, O., Guo, Y., Yang, J.: A framework for memory oversubscription management in graphics processing units. In: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp 49–63 (2019)
Lu, S., He, B., Li, Y., Fu, H.: Accelerating exact constrained shortest paths on gpus. Proc. VLDB Endow. 14(4), 547–559 (2020)
Lü, Y., Guo, H., Huang, L., Yu, Q., Shen, L., Xiao, N., Wang, Z.: (2021) Graphpeg: Accelerating graph processing on gpus. ACM Trans. Archit. Code Optim. 18(3), 30:1–30
Mawhirter, D., Wu, B.: Automine: harmonizing high-level abstraction and high performance for graph mining. In: Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP), pp 509–523 (2019)
Meng, K., Geng, L., Li, X., Tao, Q., Yu, W., Zhou, J.: Efficient multi-gpu graph processing with remote work stealing
Merrill, D., Garland, M., Grimshaw, A.S.: High-performance and scalable GPU graph traversal. ACM Trans. Parallel Comput. 1(2), 14:1–14:30 (2015)
Mhedhbi, A., Kankanamge, C., Salihoglu, S.: Optimizing one-time and continuous subgraph queries using worst-case optimal joins. ACM Trans. Database Syst. (TODS) 46(2), 1–45 (2021)
Min, S.W., Mailthody, V.S., Qureshi, Z., Xiong, J., Ebrahimi, E., Hwu, W.m.: Emogi: Efficient memory-access for out-of-memory graph-traversal in gpus. Proceedings of the VLDB Endowment (PVLDB) 14(2), 114–127 (2020)
Pandey, S., Li, X.S., Buluc, A., Xu, J., Liu, H.: H-index: Hash-indexing for parallel triangle counting on gpus. In: 2019 IEEE high performance extreme computing conference (HPEC), pp 1–7. IEEE (2019)
Pandey, S., Wang, Z., Zhong, S., Tian, C., Zheng, B., Li, X., Li, L., Hoisie, A., Ding, C., Li, D., et al.: Trust: triangle counting reloaded on gpus. IEEE Trans. Parallel Distrib. Syst.(TPDS) 32(11), 2646–2660 (2021)
Park, H., Kim, M.S.: Evograph: An effective and efficient graph upscaling method for preserving graph properties. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), pp 2051–2059 (2018)
Ribeiro-Junior, S., Quirino, R.D., Ribeiro, L.A., Martins, W.S.: Fast parallel set similarity joins on many-core architectures. J. Inf. Data Manag. 8(3), 255–255 (2017)
Roy, A., Mihailovic, I., Zwaenepoel, W.: X-stream: Edge-centric graph processing using streaming partitions. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP), pp 472–488 (2013)
Sabet, A.H.N., Zhao, Z., Gupta, R.: Subway: Minimizing data transfer during out-of-gpu-memory graph processing. In: Proceedings of the Fifteenth European Conference on Computer Systems (EuroSys), pp 1–16 (2020)
Sato, H., Mizote, R., Matsuoka, S., Ogawa, H.: I/o chunking and latency hiding approach for out-of-core sorting acceleration using gpu and flash nvm. In: Proceeding of the International Conference on Big Data (Big Data), pp 398–403 (2016)
Shi, T., Zhai, J., Wang, H., Chen, Q., Zhai, M., Hao, Z., Yang, H., Chen, W.: Graphset: High performance graph mining through equivalent set transformations. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp 1–14 (2023)
Singh, D.P., Joshi, I., Choudhary, J.: Survey of gpu based sorting algorithms. Int. J. Parallel Program. 46(6), 1017–1034 (2018)
Sun, S., Luo, Q.: Subgraph matching with effective matching order and indexing. IEEE Trans. Knowl. Data Eng. (TKDE) 34(1), 491–505 (2020)
Teixeira, C.H., Fonseca, A.J., Serafini, M., Siganos, G., Zaki, M.J., Aboulnaga, A.: Arabesque: a system for distributed graph mining. In: Proceedings of the 25th Symposium on Operating Systems Principles (SOSP), pp 425–440 (2015)
Wang, K., Zuo, Z., Thorpe, J., Nguyen, T.Q., Xu, G.H.: Rstream: marrying relational algebra with streaming for efficient graph mining on a single machine. In: Proceeding of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI)), pp 763–782 (2018)
Wang, L., Wang, Y., Owens, J.D.: Fast parallel subgraph matching on the gpu. In: Proceeding of the International Symposium on High Performance Distributed Computing (HPDC) (2016)
Wang, Z., Meng, Z., Li, X., Lin, X., Zheng, L., Tian, C., Zhong, S.: Smog: Accelerating subgraph matching on gpus. In: 2023 IEEE High Performance Extreme Computing Conference (HPEC), pp 1–7 (2023). https://doi.org/10.1109/HPEC58863.2023.10363569
Wei, H., Yu, J.X., Lu, C., Lin, X.: Speedup graph processing by graph ordering. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), pp 1813–1828 (2016)
Wei, Y.W., Chen, W.M., Tsai, H.H.: Accelerating the Bron-Kerbosch algorithm for maximal clique enumeration using gpus. IEEE Trans. Parallel Distrib. Syst.(TPDS) 32(9), 2352–2366 (2021)
Wolf, M.M., Deveci, M., Berry, J.W., Hammond, S.D., Rajamanickam, S.: Fast linear algebra-based triangle counting with kokkoskernels. In: 2017 IEEE High Performance Extreme Computing Conference (HPEC), pp 1–7. IEEE (2017)
Wu, L., Liu, H.: Tracing fake-news footprints: Characterizing social media messages by how they propagate. In: Proceedings of the eleventh ACM international conference on Web Search and Data Mining (WSDM), pp 637–645 (2018)
Yaşar, A., Rajamanickam, S., Berry, J.W., Çatalyürek, Ü.V.: A block-based triangle counting algorithm on heterogeneous environments. IEEE Trans. Parallel Distrib. Syst. (TPDS) 33(2), 444–458 (2021)
Zeng, L., Zou, L., Özsu, M.T., Hu, L., Zhang, F.: GSI: GPU-friendly subgraph isomorphism. In: Proceedings of IEEE 36th International Conference on Data Engineering (ICDE), pp 1249–1260 (2020)
Zeng, Z., Wang, J., Zhou, L.: Efficient mining of minimal distinguishing subgraph patterns from graph databases. In: Proceeding of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp 1062–1068 (2008)
Zhang, J., Lu, Y., Spampinato, D.G., Franchetti, F.: FESIA: A fast and simd-efficient set intersection approach on modern cpus. In: 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, TX, USA, April 20-24, 2020, pp 1465–1476. IEEE (2020)
Zhang, Y., Liao, X., Jin, H., He, B., Liu, H., Gu, L.: Digraph: An efficient path-based iterative directed graph processing system on multiple gpus. In: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp 601–614 (2019)
Zhao, C., Zhang, Z., Xu, P., Zheng, T., Guo, J.: Kaleido: An efficient out-of-core graph mining system on a single machine. In: Proceeding of the 36th International Conference on Data Engineering (ICDE), pp 673–684 (2020)
Zheng, T., Nellans, D., Zulfiqar, A., Stephenson, M., Keckler, S.W.: Towards high performance paged memory for gpus. In: Proceeding of the International Symposium on High Performance Computer Architecture (HPCA), pp 345–357 (2016)
Acknowledgements
Lei Zou was supported by The National Key Research and Development Program of China under grant 2023YFB4502303 and NSFC under grant 61932001 and U20A20174. Tame Özsu’s research was funded by a Discovery Grant from Natural Sciences and Engineering Research Council (NSERC) of Canada. Lei Zou is the corresponding author of this work.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hu, L., Lin, Y., Zou, L. et al. A graph pattern mining framework for large graphs on GPU. The VLDB Journal 34, 6 (2025). https://doi.org/10.1007/s00778-024-00883-8
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s00778-024-00883-8