Abstract
Despite having several distributed graph processing frameworks, scalable iterative processing of large graphs is a challenging problem since the graph and intermediate data need a global view of the graph topology in distributed memory. Although some systems support out-of-core iterative computations, they use a single machine and often require fast storage. In this paper, we present a new distributed iterative graph computation framework, called GraphMap, that utilizes a disk-based NoSQL database system for scalable graph processing while ensuring competitive performance. Extensive experiments on several real-world graphs show that GraphMap is more scalable and often faster than existing distributed memory-based systems for various graph processing workloads.
Similar content being viewed by others
References
Apache Giraph. http://giraph.apache.org/. Accessed 12 Mar 2019
Apache Hama. https://hama.apache.org/. Accessed 12 Mar 2019
Backstrom L, Huttenlocher D, Kleinberg J, Lan X (2006) Group formation in large social networks: membership, growth, and evolution. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’06. ACM, New York, NY, USA, pp 44–54. https://doi.org/10.1145/1150402.1150412
Boldi P, Vigna S (2004) The WebGraph framework I: compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004). ACM Press, Manhattan, USA, pp 595–601
Bu Y, Borkar V, Jia J, Carey MJ, Condie T (2014) Pregelix: Big(Ger) graph analytics on a dataflow engine. Proc VLDB Endow 8(2):161–172
Facebook Reports Third Quarter 2019 Results. https://investor.fb.com/investor-news/press-release-details/2019/Facebook-Reports-Third-Quarter-2019-Results/default.aspx. Accessed 12 Mar 2019
Gonzalez JE, Low Y, Gu H, Bickson D, Guestrin C (2012) PowerGraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI’12). USENIX Association, Berkeley, CA, USA, pp 17–30. http://dl.acm.org/citation.cfm?id=2387880.2387883. Accessed 12 Mar 2019
Gonzalez JE, Xin RS, Dave A, Crankshaw D, Franklin MJ, Stoica I (2014) GraphX: graph processing in a distributed dataflow framework. In: Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI’14). USENIX Association, Berkeley, CA, USA, pp 599–613. http://dl.acm.org/citation.cfm?id=2685048.2685096. Accessed 12 Mar 2019
Goswami S, Das AK, Płatania R, Lee K, Park SJ (2016) Lazer: Distributed memory-efficient assembly of large-scale genomes. In: 2016 IEEE International Conference on Big Data (Big Data), pp 1171–1181. https://doi.org/10.1109/BigData.2016.7840721
Han WS, Lee S, Park K, Lee JH, Kim MS, Kim J, Yu H (2013) TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’13). ACM, New York, NY, USA, pp 77–85. https://doi.org/10.1145/2487575.2487581
Hundt R (2011) Loop recognition in C++/Java/go/scala. Proc Scala Days 2011:38
Jia Z, Kwon Y, Shipman G, McCormick P, Erez M, Aiken A (2017) A distributed multi-GPU system for fast graph processing. Proc VLDB Endow 11(3):297–310
Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392. https://doi.org/10.1137/S1064827595287997
Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web, WWW’10. ACM, New York, NY, USA, pp 591–600. https://doi.org/10.1145/1772690.1772751
Kyrola A, Blelloch G, Guestrin C (2012) GraphChi: large-scale graph computation on just a PC. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12. USENIX Association, Berkeley, CA, USA, pp 31–46. http://dl.acm.org/citation.cfm?id=2387880.2387884
Lee K, Liu L (2013) Scaling queries over Big RDF graphs with semantic hash partitioning. Proc VLDB Endow 6(14):1894–1905
Lee K, Liu L, Schwan K, Pu C, Zhang Q, Zhou Y, Yigitoglu E, Yuan P (2015) Scaling iterative graph computations with GraphMap. In: SC15: International Conference for High Performance Computing, Networking, Storage and Analysis, pp 1–12. https://doi.org/10.1145/2807591.2807604
Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD’05). ACM, New York, NY, USA, pp 177–187. https://doi.org/10.1145/1081870.1081893
Low Y, Bickson D, Gonzalez J, Guestrin C, Kyrola A, Hellerstein JM (2012) Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc VLDB Endow 5(8):716–727
Maass S, Min C, Kashyap S, Kang W, Kumar M, Kim T (2017) Mosaic: processing a trillion-edge graph on a single machine. In: Proceedings of the Twelfth European Conference on Computer Systems, EuroSys’17. ACM, New York, NY, USA, pp 527–543. https://doi.org/10.1145/3064176.3064191
Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD’10). ACM, New York, NY, USA, pp 135–146. https://doi.org/10.1145/1807167.1807184
Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: Proceedings of the 5th ACM/Usenix Internet Measurement Conference (IMC’07), San Diego, CA
Pan Y, Wang Y, Wu Y, Yang C, Owens JD (2017) Multi-GPU graph analytics. In: 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, pp 479–490
Roy A, Bindschaedler L, Malicevic J, Zwaenepoel W (2015) Chaos: scale-out graph processing from secondary storage. In: Proceedings of the 25th Symposium on Operating Systems Principles. ACM, pp 410–424
Roy A, Mihailovic I, Zwaenepoel W (2013) X-Stream: edge-centric graph processing using streaming partitions. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP’13). ACM, New York, NY, USA, pp 472–488. https://doi.org/10.1145/2517349.2522740
Shao B, Wang H, Li Y (2013) Trinity: a distributed graph engine on a memory cloud. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD’13). ACM, New York, NY, USA, pp 505–516. https://doi.org/10.1145/2463676.2467799
Tian Y, Balmin A, Corsten SA, Tatikonda S, McPherson J (2013) From “think like a vertex” to “think like a graph. Proc VLDB Endow 7(3):193–204
Valiant LG (1990) A bridging model for parallel computation. Commun ACM 33(8):103–111. https://doi.org/10.1145/79173.79181
White B, Lepreau J, Stoller L, Ricci R, Guruprasad S, Newbold M, Hibler M, Barb C, Joglekar A (2002) An integrated experimental environment for distributed systems and networks. In: Proceedings of the 5th Symposium on Operating Systems Design and implementation (OSDI’02). ACM, New York, NY, USA, pp 255–270. https://doi.org/10.1145/1060289.1060313
Yan D, Huang Y, Liu M, Chen H, Cheng J, Wu H, Zhang C (2018) Graphd: distributed vertex-centric graph processing beyond the memory limit. IEEE Trans Parallel Distrib Syst 29(1):99–114
Yuan P, Zhang W, Xie C, Jin H, Liu L, Lee K (2014) Fast iterative graph computation: a path centric approach. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14). IEEE Press, Piscataway, NJ, USA, pp 401–412. https://doi.org/10.1109/SC.2014.38
Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18(5):821–829
Zhang Y, Liao X, Jin H, He B, Liu H, Gu L (2019) Digraph: an efficient path-based iterative directed graph processing system on multiple GPUs. In: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, pp 601–614
Zheng D, Mhembere D, Burns R, Vogelstein J, Priebe CE, Szalay AS (2015) FlashGraph: processing billion-node graphs on an array of commodity SSDs. In: 13th USENIX Conference on File and Storage Technologies (FAST 15). USENIX Association, Santa Clara, CA, pp 45–58. https://www.usenix.org/conference/fast15/technical-sessions/presentation/zheng. Accessed 12 Mar 2019
Zhou Y, Liu L, Lee K, Zhang Q (2015) Graphtwist: fast iterative graph computation with two-tier optimizations. Proc VLDB Endow 8(11):1262–1273. https://doi.org/10.14778/2809974.2809987
Acknowledgements
Funding was provided by Louisiana Board of Regents (Grant No. LEQSF(2016-19)-RD-A-08) and National Science Foundation (Grant No. IBSS-L-1620451 and RAPID-1762600).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Goswami, S., Pokhrel, A., Lee, K. et al. GraphMap: scalable iterative graph processing using NoSQL. J Supercomput 76, 6619–6647 (2020). https://doi.org/10.1007/s11227-019-03097-w
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-019-03097-w