GraphMap: scalable iterative graph processing using NoSQL

Sayan Goswami ORCID: orcid.org/0000-0002-7519-524X¹,
Ayam Pokhrel²,
Kisung Lee²,
Ling Liu³,
Qi Zhang⁴ &
…
Yang Zhou⁵

442 Accesses
13 Citations
Explore all metrics

Abstract

Despite having several distributed graph processing frameworks, scalable iterative processing of large graphs is a challenging problem since the graph and intermediate data need a global view of the graph topology in distributed memory. Although some systems support out-of-core iterative computations, they use a single machine and often require fast storage. In this paper, we present a new distributed iterative graph computation framework, called GraphMap, that utilizes a disk-based NoSQL database system for scalable graph processing while ensuring competitive performance. Extensive experiments on several real-world graphs show that GraphMap is more scalable and often faster than existing distributed memory-based systems for various graph processing workloads.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Institutional subscriptions

FOG: A Fast Out-of-Core Graph Processing Framework

Article 01 November 2016

GoFFish: A Sub-graph Centric Framework for Large-Scale Graph Analytics

GraphIn: An Online High Performance Incremental Graph Processing Framework

References

Apache Giraph. http://giraph.apache.org/. Accessed 12 Mar 2019
Apache Hama. https://hama.apache.org/. Accessed 12 Mar 2019
Backstrom L, Huttenlocher D, Kleinberg J, Lan X (2006) Group formation in large social networks: membership, growth, and evolution. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’06. ACM, New York, NY, USA, pp 44–54. https://doi.org/10.1145/1150402.1150412
Boldi P, Vigna S (2004) The WebGraph framework I: compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004). ACM Press, Manhattan, USA, pp 595–601
Bu Y, Borkar V, Jia J, Carey MJ, Condie T (2014) Pregelix: Big(Ger) graph analytics on a dataflow engine. Proc VLDB Endow 8(2):161–172
Article Google Scholar
Facebook Reports Third Quarter 2019 Results. https://investor.fb.com/investor-news/press-release-details/2019/Facebook-Reports-Third-Quarter-2019-Results/default.aspx. Accessed 12 Mar 2019
Gonzalez JE, Low Y, Gu H, Bickson D, Guestrin C (2012) PowerGraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI’12). USENIX Association, Berkeley, CA, USA, pp 17–30. http://dl.acm.org/citation.cfm?id=2387880.2387883. Accessed 12 Mar 2019
Gonzalez JE, Xin RS, Dave A, Crankshaw D, Franklin MJ, Stoica I (2014) GraphX: graph processing in a distributed dataflow framework. In: Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI’14). USENIX Association, Berkeley, CA, USA, pp 599–613. http://dl.acm.org/citation.cfm?id=2685048.2685096. Accessed 12 Mar 2019
Goswami S, Das AK, Płatania R, Lee K, Park SJ (2016) Lazer: Distributed memory-efficient assembly of large-scale genomes. In: 2016 IEEE International Conference on Big Data (Big Data), pp 1171–1181. https://doi.org/10.1109/BigData.2016.7840721
Han WS, Lee S, Park K, Lee JH, Kim MS, Kim J, Yu H (2013) TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’13). ACM, New York, NY, USA, pp 77–85. https://doi.org/10.1145/2487575.2487581
Hundt R (2011) Loop recognition in C++/Java/go/scala. Proc Scala Days 2011:38
Google Scholar
Jia Z, Kwon Y, Shipman G, McCormick P, Erez M, Aiken A (2017) A distributed multi-GPU system for fast graph processing. Proc VLDB Endow 11(3):297–310
Article Google Scholar
Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392. https://doi.org/10.1137/S1064827595287997
Article MathSciNet MATH Google Scholar
Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web, WWW’10. ACM, New York, NY, USA, pp 591–600. https://doi.org/10.1145/1772690.1772751
Kyrola A, Blelloch G, Guestrin C (2012) GraphChi: large-scale graph computation on just a PC. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12. USENIX Association, Berkeley, CA, USA, pp 31–46. http://dl.acm.org/citation.cfm?id=2387880.2387884
Lee K, Liu L (2013) Scaling queries over Big RDF graphs with semantic hash partitioning. Proc VLDB Endow 6(14):1894–1905
Article Google Scholar
Lee K, Liu L, Schwan K, Pu C, Zhang Q, Zhou Y, Yigitoglu E, Yuan P (2015) Scaling iterative graph computations with GraphMap. In: SC15: International Conference for High Performance Computing, Networking, Storage and Analysis, pp 1–12. https://doi.org/10.1145/2807591.2807604
Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD’05). ACM, New York, NY, USA, pp 177–187. https://doi.org/10.1145/1081870.1081893
Low Y, Bickson D, Gonzalez J, Guestrin C, Kyrola A, Hellerstein JM (2012) Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc VLDB Endow 5(8):716–727
Article Google Scholar
Maass S, Min C, Kashyap S, Kang W, Kumar M, Kim T (2017) Mosaic: processing a trillion-edge graph on a single machine. In: Proceedings of the Twelfth European Conference on Computer Systems, EuroSys’17. ACM, New York, NY, USA, pp 527–543. https://doi.org/10.1145/3064176.3064191
Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD’10). ACM, New York, NY, USA, pp 135–146. https://doi.org/10.1145/1807167.1807184
Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: Proceedings of the 5th ACM/Usenix Internet Measurement Conference (IMC’07), San Diego, CA
Pan Y, Wang Y, Wu Y, Yang C, Owens JD (2017) Multi-GPU graph analytics. In: 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, pp 479–490
Roy A, Bindschaedler L, Malicevic J, Zwaenepoel W (2015) Chaos: scale-out graph processing from secondary storage. In: Proceedings of the 25th Symposium on Operating Systems Principles. ACM, pp 410–424
Roy A, Mihailovic I, Zwaenepoel W (2013) X-Stream: edge-centric graph processing using streaming partitions. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP’13). ACM, New York, NY, USA, pp 472–488. https://doi.org/10.1145/2517349.2522740
Shao B, Wang H, Li Y (2013) Trinity: a distributed graph engine on a memory cloud. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD’13). ACM, New York, NY, USA, pp 505–516. https://doi.org/10.1145/2463676.2467799
Tian Y, Balmin A, Corsten SA, Tatikonda S, McPherson J (2013) From “think like a vertex” to “think like a graph. Proc VLDB Endow 7(3):193–204
Article Google Scholar
Valiant LG (1990) A bridging model for parallel computation. Commun ACM 33(8):103–111. https://doi.org/10.1145/79173.79181
Article Google Scholar
White B, Lepreau J, Stoller L, Ricci R, Guruprasad S, Newbold M, Hibler M, Barb C, Joglekar A (2002) An integrated experimental environment for distributed systems and networks. In: Proceedings of the 5th Symposium on Operating Systems Design and implementation (OSDI’02). ACM, New York, NY, USA, pp 255–270. https://doi.org/10.1145/1060289.1060313
Yan D, Huang Y, Liu M, Chen H, Cheng J, Wu H, Zhang C (2018) Graphd: distributed vertex-centric graph processing beyond the memory limit. IEEE Trans Parallel Distrib Syst 29(1):99–114
Article Google Scholar
Yuan P, Zhang W, Xie C, Jin H, Liu L, Lee K (2014) Fast iterative graph computation: a path centric approach. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14). IEEE Press, Piscataway, NJ, USA, pp 401–412. https://doi.org/10.1109/SC.2014.38
Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18(5):821–829
Article Google Scholar
Zhang Y, Liao X, Jin H, He B, Liu H, Gu L (2019) Digraph: an efficient path-based iterative directed graph processing system on multiple GPUs. In: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, pp 601–614
Zheng D, Mhembere D, Burns R, Vogelstein J, Priebe CE, Szalay AS (2015) FlashGraph: processing billion-node graphs on an array of commodity SSDs. In: 13th USENIX Conference on File and Storage Technologies (FAST 15). USENIX Association, Santa Clara, CA, pp 45–58. https://www.usenix.org/conference/fast15/technical-sessions/presentation/zheng. Accessed 12 Mar 2019
Zhou Y, Liu L, Lee K, Zhang Q (2015) Graphtwist: fast iterative graph computation with two-tier optimizations. Proc VLDB Endow 8(11):1262–1273. https://doi.org/10.14778/2809974.2809987
Article Google Scholar

Download references

Acknowledgements

Funding was provided by Louisiana Board of Regents (Grant No. LEQSF(2016-19)-RD-A-08) and National Science Foundation (Grant No. IBSS-L-1620451 and RAPID-1762600).

Author information

Authors and Affiliations

Louisiana State University, Shreveport, USA
Sayan Goswami
Louisiana State University, Baton Rouge, USA
Ayam Pokhrel & Kisung Lee
Georgia Institute of Technology, Atlanta, USA
Ling Liu
IBM Thomas J. Watson Research Center, New York, USA
Qi Zhang
Auburn University, Auburn, USA
Yang Zhou

Authors

Sayan Goswami
View author publications
You can also search for this author in PubMed Google Scholar
Ayam Pokhrel
View author publications
You can also search for this author in PubMed Google Scholar
Kisung Lee
View author publications
You can also search for this author in PubMed Google Scholar
Ling Liu
View author publications
You can also search for this author in PubMed Google Scholar
Qi Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Yang Zhou
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Sayan Goswami.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Goswami, S., Pokhrel, A., Lee, K. et al. GraphMap: scalable iterative graph processing using NoSQL. J Supercomput 76, 6619–6647 (2020). https://doi.org/10.1007/s11227-019-03097-w

Download citation

Published: 04 December 2019
Issue Date: September 2020
DOI: https://doi.org/10.1007/s11227-019-03097-w

GraphMap: scalable iterative graph processing using NoSQL

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

FOG: A Fast Out-of-Core Graph Processing Framework

GoFFish: A Sub-graph Centric Framework for Large-Scale Graph Analytics

GraphIn: An Online High Performance Incremental Graph Processing Framework

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

GraphMap: scalable iterative graph processing using NoSQL

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

FOG: A Fast Out-of-Core Graph Processing Framework

GoFFish: A Sub-graph Centric Framework for Large-Scale Graph Analytics

GraphIn: An Online High Performance Incremental Graph Processing Framework

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation