DOI: 10.1145/1940747.1940751
research-article

High-performance, massively scalable distributed systems using the MapReduce software framework: the SHARD triple-store

Published: 17 October 2010

Abstract

In this paper we discuss the use of the MapReduce software framework to address the challenge of constructing high-performance, massively scalable distributed systems. We discuss several design considerations associated with constructing complex distributed systems using the MapReduce software framework, including the difficulty of scalably building indexes. We focus on Hadoop, the most popular MapReduce implementation. Our discussion and analysis are motivated by our construction of SHARD, a massively scalable, high-performance and robust triple-store technology on top of Hadoop. We provide a general approach to constructing an information system that responds to data queries on top of the MapReduce software framework. We provide experimental results generated from an early version of SHARD. We close with a discussion of hypothetical MapReduce alternatives that can be used for the construction of more scalable distributed computing systems.



      Published In

      PSI EtA '10: Programming Support Innovations for Emerging Distributed Applications
      October 2010
      26 pages
      ISBN:9781450305440
      DOI:10.1145/1940747

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. MapReduce
      2. SPARQL
      3. Semantic Web
      4. distributed computing
      5. graph data
      6. performance evaluation
      7. programming
      8. software engineering
      9. systems

      Conference

      SPLASH '10
