[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1996014.1996021acmconferencesArticle/Chapter ViewAbstractPublication PageshpdcConference Proceedingsconference-collections
research-article

Clause-iteration with MapReduce to scalably query datagraphs in the SHARD graph-store

Published: 08 June 2011 Publication History

Abstract

Graph data processing is an emerging application area for cloud computing because there are few other information infrastructures that cost-effectively permit scalable graph data processing. We present a scalable cloud-based approach to process queries on graph data utilizing the MapReduce model. We call this approach the Clause-Iteration approach. We present algorithms that, when used in conjunction with a MapReduce framework, respond to SPARQL queries over RDF data. Our innovation in the Clause-Iteration approach comes from 1) the iterative construction of query responses by incrementally growing the number of query clauses considered in a response, and 2) our use of flagged keys to join the results of these incremental responses. The Clause-Iteration algorithms form the basis of our scalable, SHARD graph-store built on the Hadoop implementation of MapReduce. SHARD performs favorably when compared to existing "industrial" graph-stores on a standard benchmark graph with 800 million edges. We discuss design considerations and alternatives associated with constructing scalable graph processing technologies.

References

[1]
Amazon. (2010) Amazon EC2 Instance Types. Retrieved from http://aws.amazon.com/ec2/instance-types/
[2]
Berners-Lee, Tim; James Hendler and Ora Lassila (May 17, 2001). "The Semantic Web". Scientific American Magazine.
[3]
Bigdata Scale-out Architecture. (2010) Retrieved from http://www.bigdata.com/whitepapers/bigdata_whitepaper_10-13-2009_public.pdf
[4]
Cassandra. (2010) Retrieved from http://cassandra.apache.org/
[5]
Dean J. and Ghemawat S., MapReduce: Simplified data processing on large clusters. In Proceedings of the USENIX Symposium on Operating Systems Design & Implementation (OSDI), pp. 137--147. 2004.
[6]
DeWitt D., Stonebraker M. MapReduce: A major step backwards. databasecolumn.com. Retrieved from http://databasecolumn.vertica.com/database-innovation/MapReduce-a-major-step-backwards/. Retrieved 2010-08-29.
[7]
Guo, Y., Pan, Z., Heflin, J.: LUBM: A benchmark for OWL knowledge base systems. Journal of Web Semantics 3(2) (2005) 158--182
[8]
Grigoris A., van Harmelen F. A Semantic Web Primer, 2nd Edition. The MIT Press, 2008.
[9]
Hadoop. (2010). Apache Hadoop. Retrieved from http://hadoop.apache.org/
[10]
HBase (2011). Apache HBase. Retrieved from http://hbase.apache.org/
[11]
Hendler J., Web 3.0: The Dawn of Semantic Search. In IEEE Computer, Jan. 2010.
[12]
Husain M., McGlothlin J., Masud M., Khan L., and Thuraisingham B. Heuristics Based Query Processing for Large RDF Graphs Using Cloud Computing. To appear in IEEE Transaction on Data and Knowledge Engineering Journal (TKDE).
[13]
Hypergraph (2010) Retrieved from http://www.kobrix.com/hgdb.jsp
[14]
InfiniteGraph (2010) Retrieved from http://www.infinitegraph.com/
[15]
Jena (2010) Retrieved from http://jena.sourceforge.net/
[16]
Kantarcioglu M. and Thuraisingham B. (2010) HBase Graph for Jena. Retrieved from http://cs.utdallas.edu/semanticweb/HBase-Extension/hbase-extension.html
[17]
Kiryakov A., Tashev Z., Ognyanoff D., Velkov R., Momtchev V., Balev B., Peikov I. "Validation goals and metrics for the LarKC platform." LarKC Report FP7 -- 215535.
[18]
Kolas, D., Query Rewriting for Semantic Web Information Integration, Proceedings of the Information Integration on the Web workshop. 2007.
[19]
Kolas D., Emmons I. and Dean M., Efficient Linked-List RDF Indexing in Parliament. In the Proceedings of the Scalable Semantic Web (SSWS), 2009.
[20]
LarKC (2010) Retrieved from http://www.larkc.eu
[21]
Li P., Zeng Y., Kotoulas S., Urbani J., and Zhong N., "The Quest for Parallel Reasoning on the Semantic Web," in Proceedings of the 2009 International Conference on Active Media Technology, LNCS, 2009.
[22]
Lin J. and Schatz M., Design patterns for efficient graph algorithms in MapReduce. In Proceedings of the Eighth Workshop on Mining and Learning with Graphs. 2010.
[23]
LinkingOpenData. (2010) Retrieved from http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
[24]
Mika, P. and Tummarello, G. 2008. Web Semantics in the Clouds. IEEE Intelligent Systems 23, 5 (Sep. 2008), 82--87.
[25]
Neo4j (2010) Retrieved from http://neo4j.org/
[26]
OrientDB From http://www.orientechnologies.com/
[27]
Owens, A., Seaborne, A., Gibbins, N., mc schraefel: Clustered TDB: A clustered triple store for Jena. In: WWW2009. (November 2008)
[28]
OWL. (2010) Web Ontology Language (OWL.) Retrieved from http://www.w3.org/TR/owl2-overview/
[29]
Parliament (2010) Retrieved from http://parliament.semwebcentral.org/
[30]
Pig (2011) Apache Pig. Retrieved from http://pig.apache.org/
[31]
Project Voldemort. (2010) Retrieved from http://project-voldemort.com/
[32]
Robey Pointer, Nick Kallen, Ed Ceaser, John Kalucki. Introducing FlockDB. http://engineering.twitter.com/2010/05/introducing-flockdb.html
[33]
RDF. (2010) Resource Description Framework (RDF) Retrieved from http://www.w3.org/RDF/
[34]
Rohloff K., Dean M., Emmons I., Ryder D., Sumner J. "An Evaluation of Triple-Store Technologies for Large Data Stores." SSWS '07, Vilamoura, Portugal, Nov 27, 2007.
[35]
Rohloff K, Schantz R., High-Performance, Massively Scalable Distributed Systems using the MapReduce Software Framework: The SHARD Triple-Store. International Workshop on Programming Support Innovations for Emerging Distributed Applications (PSI EtA), 2010.
[36]
Sesame (2010). Retrieved from http://www.openrdf.org/
[37]
SHARD (2011) Retrieved from http://www.dist-systems.bbn.com/people/krohloff/shard.shtml
[38]
SPARQL. (2010) SPARQL Query Language for RDF http://www.w3.org/TR/rdf-sparql-query/
[39]
Sridhar R., Ravindra P. and Anyanwu K. RAPID: Enabling Scalable Ad-Hoc Analytics on the Semantic Web. ISWC 2009.
[40]
VertexDB (2010) Retrieved from http://www.dekorte.com/projects/opensource/vertexd
[41]
White T., Hadoop: The Definitive Guide. O'Reilly Media, Inc. June 2009.
[42]
Urbani J., Kotoulas S., Maassen J., van Harmelen F., and Bal H. OWL reasoning with WebPIE: calculating the closure of 100 billion triples, In Proceedings of the ESWC '10, 2010.
[43]
Urbani J., Kotoulas S., Oren E., and van Harmelen F., "Scalable Distributed Reasoning using MapReduce," In Proceedings of the ISWC '09, 2009.

Cited By

View all
  • (2024)Protocol Conformance of Collaborative SPARQL Using Multiparty Session TypesTheoretical Aspects of Software Engineering10.1007/978-3-031-64626-3_1(1-18)Online publication date: 14-Jul-2024
  • (2023)Using Machine Learning and Routing Protocols for Optimizing Distributed SPARQL Queries in CollaborationComputers10.3390/computers1210021012:10(210)Online publication date: 17-Oct-2023
  • (2023)Distributed SPARQL queries in collaboration with the routing protocolProceedings of the 27th International Database Engineered Applications Symposium10.1145/3589462.3589497(99-106)Online publication date: 5-May-2023
  • Show More Cited By

Index Terms

  1. Clause-iteration with MapReduce to scalably query datagraphs in the SHARD graph-store

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    DIDC '11: Proceedings of the fourth international workshop on Data-intensive distributed computing
    June 2011
    60 pages
    ISBN:9781450307048
    DOI:10.1145/1996014
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 June 2011

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. algorithms
    2. distributed computing
    3. graph data
    4. mapreduce
    5. performance evaluation
    6. semantic web
    7. sparql
    8. systems

    Qualifiers

    • Research-article

    Conference

    HPDC '11
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 7 of 12 submissions, 58%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)8
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 02 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Protocol Conformance of Collaborative SPARQL Using Multiparty Session TypesTheoretical Aspects of Software Engineering10.1007/978-3-031-64626-3_1(1-18)Online publication date: 14-Jul-2024
    • (2023)Using Machine Learning and Routing Protocols for Optimizing Distributed SPARQL Queries in CollaborationComputers10.3390/computers1210021012:10(210)Online publication date: 17-Oct-2023
    • (2023)Distributed SPARQL queries in collaboration with the routing protocolProceedings of the 27th International Database Engineered Applications Symposium10.1145/3589462.3589497(99-106)Online publication date: 5-May-2023
    • (2023)S3QLRDF: distributed SPARQL query processing using Apache Spark—a comparative performance studyDistributed and Parallel Databases10.1007/s10619-023-07422-441:3(191-231)Online publication date: 24-Jan-2023
    • (2022)A SPARQL benchmark for distributed databases in IoT environmentsProceedings of the International Workshop on Big Data in Emergent Distributed Environments10.1145/3530050.3532929(1-6)Online publication date: 12-Jun-2022
    • (2022)Optimizing subgraph matching over distributed knowledge graphs using partial evaluationWorld Wide Web10.1007/s11280-022-01075-626:2(751-771)Online publication date: 8-Jul-2022
    • (2022)Entropy and sigmoid based K-means clustering and AGWO for effective big data handlingMultimedia Tools and Applications10.1007/s11042-022-13929-282:10(15287-15304)Online publication date: 3-Oct-2022
    • (2022)Optimal Subgraph Matching Queries over Distributed Knowledge Graphs Based on Partial EvaluationWeb Information Systems Engineering – WISE 202110.1007/978-3-030-90888-1_22(274-289)Online publication date: 1-Jan-2022
    • (2020)Cloud-Based RDF Data ManagementSynthesis Lectures on Data Management10.2200/S00986ED1V01Y202001DTM06215:1(1-103)Online publication date: 24-Feb-2020
    • (2020)WLeidenRDF: RDF Data Query Method based on Semantic-Enhanced Graph-Clustering Algorithm2020 International Symposium on Theoretical Aspects of Software Engineering (TASE)10.1109/TASE49443.2020.00014(33-40)Online publication date: Dec-2020
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media