[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2503210.2503302acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article

Efficient data partitioning model for heterogeneous graphs in the cloud

Published: 17 November 2013 Publication History

Abstract

As the size and variety of information networks continue to grow in many scientific and engineering domains, we witness a growing demand for efficient processing of large heterogeneous graphs using a cluster of compute nodes in the Cloud. One open issue is how to effectively partition a large graph to process complex graph operations efficiently. In this paper, we present VB-Partitioner -- a distributed data partitioning model and algorithms for efficient processing of graph operations over large-scale graphs in the Cloud. Our VB-Partitioner has three salient features. First, it introduces vertex blocks (VBs) and extended vertex blocks (EVBs) as the building blocks for semantic partitioning of large graphs. Second, VB-Partitioner utilizes vertex block grouping algorithms to place those vertex blocks that have high correlation in graph structure into the same partition. Third, VB-Partitioner employs a VB-partition guided query partitioning model to speed up the parallel processing of graph pattern queries by reducing the amount of inter-partition query processing. We conduct extensive experiments on several real-world graphs with millions of vertices and billions of edges. Our results show that VB-Partitioner significantly outperforms the popular random block-based data partitioner in terms of query latency and scalability over large-scale graphs.

References

[1]
About FacetedDBLP. http://dblp.l3s.de/dblp++.php.
[2]
BTC 2012 Dataset. http://km.aifb.kit.edu/projects/btc-2012/.
[3]
Chaco: Software for Partitioning Graphs. http://www.sandia.gov/bahendr/chaco.html.
[4]
DBpedia 3.8 Downloads. http://wiki.dbpedia.org/Downloads38.
[5]
METIS. http://www.cs.umn.edu/~metis.
[6]
K. Andreev and H. Räcke. Balanced graph partitioning. In Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures, SPAA '04, pages 120--124, New York, NY, USA, 2004. ACM.
[7]
P. Barceló, L. Libkin, and J. L. Reutter. Querying graph patterns. In Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, PODS '11, pages 199--210, New York, NY, USA, 2011. ACM.
[8]
C. Franke, S. Morin, A. Chebotko, J. Abraham, and P. Brazier. Distributed Semantic Web Data Management in HBase and MySQL Cluster. In Proceedings of the 2011 IEEE 4th International Conference on Cloud Computing, CLOUD '11, pages 105--112, Washington, DC, USA, 2011. IEEE Computer Society.
[9]
B. Hendrickson and R. Leland. A multilevel algorithm for partitioning graphs. In Proceedings of the 1995 ACM/IEEE conference on Supercomputing, Supercomputing '95, New York, NY, USA, 1995. ACM.
[10]
J. Huang, D. J. Abadi, and K. Ren. Scalable SPARQL querying of large RDF graphs. Proceedings of the VLDB Endowment, 4(11):1123--1134, 2011.
[11]
M. Husain, J. McGlothlin, M. M. Masud, L. Khan, and B. M. Thuraisingham. Heuristics-Based Query Processing for Large RDF Graphs Using Cloud Computing. IEEE Trans. on Knowl. and Data Eng., 23(9):1312--1327, Sept. 2011.
[12]
G. Karypis and V. Kumar. Analysis of multilevel graph partitioning. In Proceedings of the 1995 ACM/IEEE conference on Supercomputing, Supercomputing '95, New York, NY, USA, 1995. ACM.
[13]
G. Karypis and V. Kumar. Parallel multilevel k-way partitioning scheme for irregular graphs. In Proceedings of the 1996 ACM/IEEE conference on Supercomputing, Supercomputing '96, Washington, DC, USA, 1996. IEEE Computer Society.
[14]
G. Karypis and V. Kumar. Multilevel algorithms for multi-constraint graph partitioning. In Proceedings of the 1998 ACM/IEEE conference on Supercomputing, Supercomputing '98, pages 1--13, Washington, DC, USA, 1998. IEEE Computer Society.
[15]
A. Kyrola, G. Blelloch, and C. Guestrin. GraphChi: large-scale graph computation on just a PC. In Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation, OSDI'12, pages 31--46, Berkeley, CA, USA, 2012. USENIX Association.
[16]
Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc. VLDB Endow., 5(8):716--727, Apr. 2012.
[17]
G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, SIGMOD '10, pages 135--146, New York, NY, USA, 2010. ACM.
[18]
T. Neumann and G. Weikum. The RDF-3X engine for scalable management of RDF data. The VLDB Journal, 19(1), Feb. 2010.
[19]
K. Rohloff and R. E. Schantz. Clause-iteration with MapReduce to scalably query datagraphs in the SHARD graph-store. In Proceedings of the fourth international workshop on Data-intensive distributed computing, DIDC '11, pages 35--44, New York, NY, USA, 2011. ACM.
[20]
B. White and et al. An integrated experimental environment for distributed systems and networks. In Proceedings of the 5th symposium on Operating systems design and implementation, OSDI '02, pages 255--270, 2002.
[21]
P. Yuan, P. Liu, B. Wu, H. Jin, W. Zhang, and L. Liu. TripleBit: a Fast and Compact System for Large Scale RDF Data. Proceedings of the VLDB Endowment, 6(7), 2013.

Cited By

View all
  • (2022)GGraph: An Efficient Structure-Aware Approach for Iterative Graph ProcessingIEEE Transactions on Big Data10.1109/TBDATA.2020.30196418:5(1182-1194)Online publication date: 1-Oct-2022
  • (2022)OntoSP: Ontology-Based Semantic-Aware Partitioning on RDF GraphsWeb Information Systems Engineering – WISE 202110.1007/978-3-030-90888-1_21(258-273)Online publication date: 1-Jan-2022
  • (2021)An Efficient Node Priority and Threshold-Based Partitioning Algorithm for Graph ProcessingIntelligent Computing and Innovation on Data Science10.1007/978-981-15-3284-9_84(783-792)Online publication date: 18-Nov-2021
  • Show More Cited By

Index Terms

  1. Efficient data partitioning model for heterogeneous graphs in the cloud

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
      November 2013
      1123 pages
      ISBN:9781450323789
      DOI:10.1145/2503210
      • General Chair:
      • William Gropp,
      • Program Chair:
      • Satoshi Matsuoka
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 17 November 2013

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. big data processing
      2. cloud computing
      3. heterogeneous graph
      4. partitioning

      Qualifiers

      • Research-article

      Funding Sources

      Conference

      SC13
      Sponsor:

      Acceptance Rates

      SC '13 Paper Acceptance Rate 91 of 449 submissions, 20%;
      Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

      Upcoming Conference

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)9
      • Downloads (Last 6 weeks)3
      Reflects downloads up to 10 Dec 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2022)GGraph: An Efficient Structure-Aware Approach for Iterative Graph ProcessingIEEE Transactions on Big Data10.1109/TBDATA.2020.30196418:5(1182-1194)Online publication date: 1-Oct-2022
      • (2022)OntoSP: Ontology-Based Semantic-Aware Partitioning on RDF GraphsWeb Information Systems Engineering – WISE 202110.1007/978-3-030-90888-1_21(258-273)Online publication date: 1-Jan-2022
      • (2021)An Efficient Node Priority and Threshold-Based Partitioning Algorithm for Graph ProcessingIntelligent Computing and Innovation on Data Science10.1007/978-981-15-3284-9_84(783-792)Online publication date: 18-Nov-2021
      • (2020)Artificial Intelligence Knowledge Graph for Dynamic Networks: An Incremental Partition AlgorithmIEEE Access10.1109/ACCESS.2020.29826528(63434-63442)Online publication date: 2020
      • (2020)An Efficient Node Priority and Threshold-Based Partitioning Algorithm for Graph ProcessingIntelligent Computing and Innovation on Data Science10.1007/978-981-15-3284-9_90(783-792)Online publication date: 15-May-2020
      • (2019) Reasoning and querying web-scale open data based on DL-Lite in a divide-and-conquer way Journal of Web Semantics10.1016/j.websem.2019.01.003Online publication date: Feb-2019
      • (2019)An efficient clustering and load balancing of distributed cloud data centers using graph theoryInternational Journal of Communication Systems10.1002/dac.389632:5Online publication date: 4-Jan-2019
      • (2018)Accelerating Data Analytics on Integrated GPU Platforms via Runtime SpecializationInternational Journal of Parallel Programming10.1007/s10766-016-0482-x46:2(336-375)Online publication date: 1-Apr-2018
      • (2018)Parallel Shortest Path Graph Computations of United States Road Network Data on Apache SparkSmart Societies, Infrastructure, Technologies and Applications10.1007/978-3-319-94180-6_30(323-336)Online publication date: 22-Jul-2018
      • (2018)Storing and Querying Semantic Data in the CloudReasoning Web. Learning, Uncertainty, Streaming, and Scalability10.1007/978-3-030-00338-8_7(173-222)Online publication date: 22-Sep-2018
      • Show More Cited By

      View Options

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media