SShare: a simulator for studying and evaluating decentralized SPARQL query processing

Jing Zhou¹,
Qi Huang¹,
Weifeng Xie² &
…
Zhiguo Qu³

Abstract

Previously, we proposed efficient, scalable decentralized processing of SPARQL queries for an ad hoc Semantic Web data sharing system and explored optimization techniques. However, it has proven difficult to measure the performance of the proposed query processing in a decentralized setting with existing tools. This is because assessments of SPARQL query performance have typically targeted centralized or single-machine settings, and the node-to-node communication costs incurred when (sub-)queries are forwarded among multiple nodes have rarely been taken into consideration. We therefore developed a simulator, SShare, that bridges Jena, a Java framework that supports querying RDF data with SPARQL, and ns-3 (network simulator 3), a discrete-event network simulator using C++ and Python. With SShare, one can submit any valid SPARQL query that involves RDF data of interest scattered across distributed hosts (the details of which are unknown to the query initiator), evaluate important performance metrics (e.g., the inter-site data transmission volume and communication delay) obtained at the network level, and finally obtain visualized results. We anticipate that SShare will benefit others who wish to better capture and analyze the inherent features of various distributed and decentralized SPARQL processing mechanisms over a large-scale network.

Notes

We borrowed the term from ad hoc networking because the ad hoc environment for Semantic Web data sharing has many features in common with ad hoc networking: no centralized authority, self-organization, multiple nodes connected by links, and dynamics.
The source code and documentation for SShare are under constant maintenance and development, and can be accessed via http://sshare.sinaapp.com.
Chord provides a unique mapping between an identifier space and a set of nodes; each node is therefore associated with an identifier. Chord maps an identifier, say id, to the node with the smallest identifier greater than or equal to id (wrapping around the identifier ring), and this node is called the successor (node) of id.
A similar triple-indexing approach was also presented in Atlas [7].
Put simply, Chord uses the SHA-1 hash function to obtain the key identifier Hash(key) of a given key and then stores the key at the successor node of that identifier.
Chord was unable to provide the functionality required for this purpose since it merely associates identifiers with successor nodes. We, therefore, adopted DHASH [5] as shown in Fig. 5.
When evaluating a SPARQL query that contains more than two conjunction graph patterns, we proposed applying the move-small strategy, which resolves the query in an optimized fashion by using the frequency information (see Sect. 2) available in the location tables of the related index nodes [15].
Information[1].query and Information[3].query are the same but they are associated with different keys. We distinguish between them to point out that Information[1].query obtains its answer directly from storage nodes. Subsequently, the answer to Information[3].query is acquired by running the query against a merged RDF graph consisting of individual RDF graphs collected by running Information[1].query as mentioned earlier.
A solution mapping can be broken down into a set of tuples that contain variables and their corresponding values in RDF terms [10].
http://www.pudn.com/downloads448/sourcecode/java/detail1890872.html.
The ChordIpv4 module was developed by Harjot Gill to support the Chord/DHASH, see http://code.nsnam.org/gillh/ns-3-chord/.
The function is frequently used during the construction of location tables. For example, when a node that has a triple (a:person b:name ‘jason’) as in Fig. 1 joins the Semantic Web data sharing system, an index on its subject needs to be built and the function insert(Index(s), s:http://a/person) will be invoked.
We tested different ratios of index nodes to storage nodes and found that the more index nodes there are, the shorter the query response time. This is because, with fewer index nodes, the probability that two (or more) queries are forwarded to the same index node is higher; given the limited bandwidth, responding to these queries is then likely to take longer.
We used the ns-3 default values for the transmission rate, propagation delay, and MTU.
The maximum number of copies of an RDF triple is a tunable parameter.
http://www.nsnam.org/
http://www.riverbed.com/.
http://tetcos.com/.
http://librdf.org/.
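
As a rough illustration of the two Chord notes above (the identifier-to-successor mapping and the SHA-1 hashing of keys), the following Python sketch places a key on a small identifier ring. It is not taken from SShare or from the ChordIpv4 module: the function names chord_id, successor, and place_key, the 16-bit identifier space, and the node identifiers in the usage note are all assumptions made for readability. The successor is taken here as the first node whose identifier is equal to or follows the given identifier, as in the Chord paper by Stoica et al.

```python
import hashlib
from bisect import bisect_left
from typing import Sequence, Tuple

# Assumption: a 16-bit identifier space keeps the numbers readable.
# Chord with SHA-1 would normally use the full 160-bit digest.
M = 16


def chord_id(key: str, m: int = M) -> int:
    """Hash a key with SHA-1 and reduce it to an m-bit identifier."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def successor(node_ids: Sequence[int], ident: int) -> int:
    """Return the first node identifier equal to or following ident.

    The identifier space is treated as a ring, so an identifier larger
    than every node identifier wraps around to the smallest node.
    """
    ring = sorted(node_ids)
    pos = bisect_left(ring, ident)
    return ring[pos % len(ring)]


def place_key(node_ids: Sequence[int], key: str, m: int = M) -> Tuple[int, int]:
    """Return (key identifier, identifier of the node that stores the key)."""
    ident = chord_id(key, m)
    return ident, successor(node_ids, ident)
```

For example, with hypothetical node identifiers [1, 8, 14, 21], successor(nodes, 9) returns 14, successor(nodes, 8) returns 8, and successor(nodes, 22) wraps around to 1. Calling place_key(nodes, "s:http://a/person") returns the hashed identifier of that key together with the node responsible for it. A real Chord node resolves the successor through finger tables rather than a sorted global list; the sorted list is a simplification for this sketch.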

References

Beckett D (2001) The design and implementation of the Redland RDF application framework. In: Proceedings of the 10th international conference on world wide web, ACM, New York, NY, USA, pp 449–456
Beckett D (2014) RDF 1.1 N-Triples: a line-based syntax for an RDF graph. W3C recommendation. http://www.w3.org/TR/n-triples/, 25 Feb 2014
Beckett D, Berners-Lee T, Prud’hommeaux E, Carothers G (2014) RDF 1.1 Turtle: terse RDF triple language. W3C recommendation. http://www.w3.org/TR/turtle/, 25 Feb 2014
Cai M, Frank M (2004) RDFPeers: a scalable distributed RDF repository based on a structured peer-to-peer network. In: Proceedings of the 13th international conference on world wide web. ACM, New York, NY, USA, pp 650–657
Dabek F, Brunskill E, Kaashoek MF, Karger D, Morris R, Stoica I, Balakrishnan H (2001) Building peer-to-peer systems with Chord, a distributed lookup service. In: Proceedings of the eighth workshop on hot topics in operating systems, IEEE, pp 81–86
Enslow Jr PH, Saponas TG (1981) Distributed and decentralized control in fully distributed processing systems—a survey of applicable models. Final Technical Report GIT-ICS-81/02, School of Information and Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
Kaoudi Z, Koubarakis M, Kyzirakos K, Miliaraki I, Magiridou M, Papadakis-Pesaresi A (2010) Atlas: storing, updating and querying RDF(S) data on top of DHTs. Web Semant Sci Serv Agents World Wide Web 8(4):271–277
Liarou E, Idreos S, Koubarakis M (2006) Evaluating conjunctive triple pattern queries over large structured overlay networks. In: Proceedings of the fifth international conference on the semantic web. Springer, Athens, GA, USA, pp 399–413
Ns-3 project (2013) Ns-3 Tutorial. http://www.nsnam.org/docs/tutorial/html/index.html
Pérez J, Arenas M, Gutierrez C (2009) Semantics and complexity of SPARQL. ACM Trans Database Syst 34(3):1–45
Prud’hommeaux E, Seaborne A (2008) SPARQL query language for RDF. W3C recommendation. http://www.w3.org/TR/rdf-sparql-query/, 15 Jan 2008
Schmidt M, Hornung T, Lausen G, Pinkel C (2009) SP²Bench: a SPARQL performance benchmark. In: Proceedings of the 25th international conference on data engineering. IEEE Computer Society, Shanghai, China, pp 222–233
Seaborne A, Polleres A, Feigenbaum L, Williams GT (2013) SPARQL 1.1 federated query. W3C recommendation. http://www.w3.org/TR/sparql11-federated-query/, 21 Mar 2013
Stoica I, Morris R, Karger D, Kaashoek MF, Balakrishnan H (2001) Chord: a scalable peer-to-peer lookup service for Internet applications. In: Proceedings of the 2001 conference on applications, technologies, architectures, and protocols for computer communications. ACM, San Diego, California, USA, pp 149–160
Zhou J, Bochmann GV, Shi Z (2014) Supporting decentralized SPARQL queries in an ad-hoc semantic web data sharing system. Int J Netw Comput 4(1):88–110

Acknowledgments

This work was funded by the Engineering Disciplines Planning Project of the Communication University of China (No. 3132014XNG1453) and the National Key Technology R&D Program (No. 2013BAH66F02). The authors also acknowledge the input of PAPD and CICAEET.

Author information

Authors and Affiliations

School of Computer Science, Communication University of China, Beijing, 100024, China
Jing Zhou & Qi Huang
PLA Logistics Academy, Beijing, 100858, China
Weifeng Xie
Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science and Technology, Nanjing, 210044, China
Zhiguo Qu

Corresponding author

Correspondence to Zhiguo Qu.

About this article

Cite this article

Zhou, J., Huang, Q., Xie, W. et al. SShare: a simulator for studying and evaluating decentralized SPARQL query processing. Pers Ubiquit Comput 19, 1087–1097 (2015). https://doi.org/10.1007/s00779-015-0878-4

Received: 12 December 2014
Accepted: 01 May 2015
Published: 04 September 2015
Issue Date: October 2015
DOI: https://doi.org/10.1007/s00779-015-0878-4
