Abstract
Previously, we proposed efficient, scalable decentralized processing of SPARQL queries for an ad hoc Semantic Web data sharing system and explored optimization techniques for it. However, the performance of the proposed query processing has proven difficult to measure in a decentralized setting with existing tools. This is because assessments of SPARQL query performance have typically targeted centralized or single-machine settings, and they have rarely taken into account the node-to-node communication costs incurred when (sub-)queries are forwarded among multiple nodes. We therefore developed a simulator, SShare, that bridges Jena, a Java framework that supports querying RDF data with SPARQL, and ns-3 (network simulator 3), a discrete-event network simulator programmable in C++ and Python. With SShare, one can submit any well-formed SPARQL query over RDF data of interest scattered across distributed hosts (whose details are unknown to the query initiator), evaluate important performance metrics obtained at the network level (e.g., the inter-site data transmission volume and the communication delay), and view the results visually. We anticipate that SShare will benefit others who wish to better capture and analyze the inherent features of various distributed and decentralized SPARQL processing mechanisms over large-scale networks.
Notes
We borrowed the term from ad hoc networking because the ad hoc environment for Semantic Web data sharing shares many features with ad hoc networks: no centralized authority, self-organization, multiple nodes connected by links, and dynamism.
The source code and documentation for SShare are under constant maintenance and development, and can be accessed via http://sshare.sinaapp.com.
Chord provides a unique mapping between an identifier space and a set of nodes, so each node is associated with an identifier. Chord maps an identifier, say id, to the node with the smallest identifier equal to or greater than id (wrapping around the identifier circle if no such node exists); this node is called the successor (node) of id.
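The successor rule can be illustrated with a minimal Python sketch that follows the original Chord definition (Stoica et al.): an identifier is assigned to the first node whose identifier is equal to or follows it on the circle. The function name `successor` and the small 6-bit ring are illustrative assumptions. They are not part of SShare or the ChordIpv4 module.

```python
from bisect import bisect_left


def successor(ident: int, node_ids: list, m: int) -> int:
    """Return the Chord successor of `ident` on an identifier circle of size 2**m.

    The successor is the first node whose identifier is equal to or greater
    than `ident`; if `ident` exceeds every node identifier, the search wraps
    around to the smallest node identifier.
    """
    ring = sorted(node_ids)
    ident %= 2 ** m
    i = bisect_left(ring, ident)  # first position with ring[i] >= ident
    return ring[i] if i < len(ring) else ring[0]  # wrap around the circle


# Hypothetical 6-bit ring (identifiers 0..63) with five nodes.
nodes = [1, 8, 14, 32, 56]
print(successor(10, nodes, 6))  # 14
print(successor(14, nodes, 6))  # 14: an identifier equal to a node's maps to that node
print(successor(60, nodes, 6))  # 1: wraps past 63 back to the start
```

The binary search keeps each lookup logarithmic in the number of known nodes. A real Chord node does not know the full ring, though. It resolves successors through finger tables in O(log N) hops.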
A similar triple-indexing approach is also used by Atlas [7].
Put simply, Chord uses the SHA-1 hash function to compute the identifier Hash(key) of a given key and then stores the key at the successor node of Hash(key).
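The following Python sketch shows the hashing step with the standard `hashlib` module and places a key on the successor of Hash(key). Several details are assumptions for illustration: the names `key_identifier` and `store`, the dictionary standing in for node storage, and the small m = 8 used in the usage lines. Chord itself uses the full 160-bit SHA-1 space.

```python
import hashlib

M = 160  # SHA-1 digests are 160 bits, so identifiers lie in [0, 2**160)


def key_identifier(key: str, m: int = M) -> int:
    """Hash(key): SHA-1 of the UTF-8 encoded key, reduced to an m-bit integer."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def store(key: str, value, node_ids: list, storage: dict, m: int = M) -> int:
    """Store (key, value) at the successor node of Hash(key) and return that node."""
    kid = key_identifier(key, m)
    ring = sorted(node_ids)
    # Successor: first node identifier >= kid, wrapping to the smallest one.
    owner = next((n for n in ring if n >= kid), ring[0])
    storage.setdefault(owner, {})[key] = value
    return owner


# Hypothetical usage on a tiny 8-bit ring with three nodes.
storage = {}
owner = store("http://a/person", "some RDF data", [17, 96, 200], storage, m=8)
print(owner, storage[owner])
```

Because SHA-1 spreads keys roughly uniformly over the identifier space, each node ends up responsible for about the same share of keys.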
Information[1].query and Information[3].query are the same query, but they are associated with different keys. We distinguish between them to point out that Information[1].query obtains its answer directly from storage nodes, whereas the answer to Information[3].query is obtained by running the query against a merged RDF graph composed of the individual RDF graphs collected by running Information[1].query.
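The merge step matters because no single storage node may hold enough data to answer the query. This Python sketch illustrates that point. It is not SShare's Jena-based implementation. RDF graphs are modelled as plain sets of (subject, predicate, object) strings, only a basic graph pattern is evaluated, and the data and helper names are hypothetical.

```python
def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on failure."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable: bind it, or check an existing binding
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:  # constant: must match exactly
            return None
    return extended


def evaluate_bgp(patterns, graph):
    """Return all solution mappings of a basic graph pattern over `graph`."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Graphs collected from two storage nodes (hypothetical data).
g1 = {("a:person", "b:name", "jason")}
g2 = {("a:person", "b:knows", "a:alice"), ("a:alice", "b:name", "alice")}

merged = g1 | g2  # with no blank nodes, merging RDF graphs is set union

query = [("?x", "b:name", "jason"), ("?x", "b:knows", "?y"), ("?y", "b:name", "?n")]
print(evaluate_bgp(query, merged))
```

Neither `g1` nor `g2` alone yields a solution. The merged graph binds `?x`, `?y`, and `?n`. A full RDF merge would also rename blank nodes apart. This sketch uses only IRIs and literals, so plain set union is sufficient.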
A solution mapping can be broken down into a set of tuples that contain variables and their corresponding values in RDF terms [10].
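Following Pérez et al. [10], two solution mappings are compatible when they agree on every variable they share. Representing a mapping as a set of (variable, RDF term) tuples turns that check, and the join built on it, into set operations. The Python sketch below shows this. The helper names and data are hypothetical.

```python
def to_tuples(mapping: dict) -> frozenset:
    """Break a solution mapping into a set of (variable, RDF term) tuples."""
    return frozenset(mapping.items())


def compatible(t1: frozenset, t2: frozenset) -> bool:
    """Compatible iff the union still binds each variable to a single term."""
    union = t1 | t2
    return len({var for var, _ in union}) == len(union)


def join(omega1: set, omega2: set) -> set:
    """Join two sets of mappings: the union of every compatible pair."""
    return {t1 | t2 for t1 in omega1 for t2 in omega2 if compatible(t1, t2)}


mu1 = to_tuples({"?x": "a:person", "?n": "jason"})
mu2 = to_tuples({"?x": "a:person", "?y": "a:alice"})
mu3 = to_tuples({"?x": "a:bob"})

print(compatible(mu1, mu2))  # True: both bind ?x to a:person
print(compatible(mu1, mu3))  # False: ?x is bound to two different terms
print(join({mu1}, {mu2, mu3}))
```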
The ChordIpv4 module was developed by Harjot Gill to support Chord/DHash; see http://code.nsnam.org/gillh/ns-3-chord/.
The function is frequently used during the construction of location tables. For example, when a node that has a triple (a:person b:name ‘jason’) as in Fig. 1 joins the Semantic Web data sharing system, an index on its subject needs to be built and the function insert(Index(s), s:http://a/person) will be invoked.
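The following Python sketch shows one possible way a joining node could emit index entries for its triples, under loudly stated assumptions. The subject entry mirrors the call insert(Index(s), s:http://a/person) above. Everything else is an assumption:

* the predicate and object entries (with `p:` and `o:` prefixes), by analogy with triple indexing;
* `index_id` computing Index(·) as a SHA-1 identifier of the term;
* the explicit storage-node argument, which the footnote leaves implicit;
* the dictionary location table and the node address `10.1.1.5`.

```python
import hashlib
from collections import defaultdict

# Hypothetical location table: index identifier -> {(tagged term, storage node)}.
location_table = defaultdict(set)


def index_id(term: str) -> int:
    """Plays the role of Index(.): assumed here to be the SHA-1 identifier of a term."""
    return int.from_bytes(hashlib.sha1(term.encode("utf-8")).digest(), "big")


def insert(identifier: int, tagged_term: str, storage_node: str) -> None:
    """Record that `storage_node` holds triples containing `tagged_term`."""
    location_table[identifier].add((tagged_term, storage_node))


def on_join(storage_node: str, triples) -> None:
    """Build subject, predicate and object index entries for each local triple."""
    for s, p, o in triples:
        insert(index_id(s), f"s:{s}", storage_node)  # as in insert(Index(s), s:http://a/person)
        insert(index_id(p), f"p:{p}", storage_node)  # assumed by analogy
        insert(index_id(o), f"o:{o}", storage_node)  # assumed by analogy


on_join("10.1.1.5", [("http://a/person", "http://b/name", "jason")])
print(len(location_table))  # 3 index entries for one triple
```

In a Chord-based deployment, each entry would live on the successor node of its identifier rather than in one shared dictionary.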
We tested different ratios of index nodes to storage nodes and found that the more index nodes there are, the shorter the query response time. With fewer index nodes, the probability that two (or more) queries are forwarded to the same index node is higher; given limited bandwidth, such queries are then likely to take longer to answer.
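The probabilistic part of this argument can be made concrete under one simplifying assumption: each query is forwarded to one of n index nodes chosen uniformly and independently at random. Real forwarding depends on the hashed query terms. The chance that at least two of q queries meet at the same index node is then the birthday-problem probability. The function name is hypothetical.

```python
from math import prod


def prob_shared_index_node(num_queries: int, num_index_nodes: int) -> float:
    """P(at least two queries are forwarded to the same index node).

    Assumes uniform, independent choice of index node for every query.
    """
    if num_queries > num_index_nodes:
        return 1.0  # pigeonhole principle: a collision is certain
    p_all_distinct = prod(
        (num_index_nodes - i) / num_index_nodes for i in range(num_queries)
    )
    return 1.0 - p_all_distinct


# Five concurrent queries, varying the number of index nodes.
for n in (5, 10, 20, 50):
    print(n, round(prob_shared_index_node(5, n), 3))
```

The probability falls steadily as index nodes are added. This is consistent with the observed shorter response times, although bandwidth contention itself is not modelled here.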
We used the ns-3 default values for the transmission rate, propagation delay, and MTU.
The maximum number of the RDF triple copies is a tunable parameter.
References
Beckett D (2001) The design and implementation of the Redland RDF application framework. In: Proceedings of the 10th international conference on world wide web. ACM, New York, NY, USA, pp 449–456
Beckett D (2014) RDF 1.1 N-Triples: a line-based syntax for an RDF graph. W3C recommendation, 25 Feb 2014. http://www.w3.org/TR/n-triples/
Beckett D, Berners-Lee T, Prud’hommeaux E, Carothers G (2014) RDF 1.1 Turtle: terse RDF triple language. W3C recommendation, 25 Feb 2014. http://www.w3.org/TR/turtle/
Cai M, Frank M (2004) RDFPeers: a scalable distributed RDF repository based on a structured peer-to-peer network. In: Proceedings of the 13th international conference on world wide web. ACM, New York, NY, USA, pp 650–657
Dabek F, Brunskill E, Kaashoek MF, Karger D, Morris R, Stoica I, Balakrishnan H (2001) Building peer-to-peer systems with Chord, a distributed lookup service. In: Proceedings of the eighth workshop on hot topics in operating systems. IEEE, pp 81–86
Enslow Jr PH, Saponas TG (1981) Distributed and decentralized control in fully distributed processing systems—a survey of applicable models. Final Technical Report GIT-ICS-81/02, School of Information and Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
Kaoudi Z, Koubarakis M, Kyzirakos K, Miliaraki I, Magiridou M, Papadakis-Pesaresi A (2010) Atlas: storing, updating and querying RDF(S) data on top of DHTs. Web Semant Sci Serv Agents World Wide Web 8(4):271–277
Liarou E, Idreos S, Koubarakis M (2006) Evaluating conjunctive triple pattern queries over large structured overlay networks. In: Proceedings of the fifth international conference on the semantic web. Springer, Athens, GA, USA, pp 399–413
Ns-3 project (2013) Ns-3 Tutorial. http://www.nsnam.org/docs/tutorial/html/index.html
Pérez J, Arenas M, Gutierrez C (2009) Semantics and complexity of SPARQL. ACM Trans Database Syst 34(3):1–45
Prud’hommeaux E, Seaborne A (2008) SPARQL query language for RDF. W3C recommendation, 15 Jan 2008. http://www.w3.org/TR/rdf-sparql-query/
Schmidt M, Hornung T, Lausen G, Pinkel C (2009) SP2Bench: a SPARQL performance benchmark. In: Proceedings of the 25th international conference on data engineering. IEEE Computer Society, Shanghai, China, pp 222–233
Seaborne A, Polleres A, Feigenbaum L, Williams GT (2013) SPARQL 1.1 federated query. W3C recommendation, 21 Mar 2013. http://www.w3.org/TR/sparql11-federated-query/
Stoica I, Morris R, Karger D, Kaashoek MF, Balakrishnan H (2001) Chord: a scalable peer-to-peer lookup service for Internet applications. In: Proceedings of the 2001 conference on applications, technologies, architectures, and protocols for computer communications. ACM, San Diego, California, USA, pp 149–160
Zhou J, Bochmann GV, Shi Z (2014) Supporting decentralized SPARQL queries in an ad-hoc semantic web data sharing system. Int J Netw Comput 4(1):88–110
Acknowledgments
This work was funded by the Engineering Disciplines Planning Project of the Communication University of China (No. 3132014XNG1453) and the National Key Technology R&D Program (No. 2013BAH66F02). The authors also acknowledge the input of PAPD and CICAEET.
About this article
Cite this article
Zhou, J., Huang, Q., Xie, W. et al. SShare: a simulator for studying and evaluating decentralized SPARQL query processing. Pers Ubiquit Comput 19, 1087–1097 (2015). https://doi.org/10.1007/s00779-015-0878-4
DOI: https://doi.org/10.1007/s00779-015-0878-4