Ulysses: An Intelligent Client for Replicated Triple Pattern Fragments

Thomas Minier, Hala Skaf-Molli, Pascal Molli & Maria-Ester Vidal

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11155)

Included in the following conference series:

European Semantic Web Conference

Abstract

Ulysses is an intelligent TPF client that takes advantage of replicated datasets to distribute the load of SPARQL query processing and to provide fault tolerance. By reducing the load on a TPF server, Ulysses improves the availability of Linked Data and distributes the financial cost of query execution among data providers. This demonstration presents the Ulysses web client and shows how users can run SPARQL queries in their browsers against TPF servers hosting replicated data. The client also provides various visualizations that show in real time how Ulysses distributes the load and adapts to network conditions during SPARQL query processing.

1 Introduction

We proposed Ulysses [1], a replication-aware intelligent TPF client that distributes the load of SPARQL query processing across heterogeneous replicated TPF servers. Ulysses relies on a lightweight cost model to estimate the processing capabilities of servers, and on a client-side load balancer that distributes SPARQL query processing and provides fault tolerance.

Consider the SPARQL query \(Q_1\) in Fig. 1, and the two servers \(S_1\) and \(S_2\), each publishing a replica of the DBpedia 2015 dataset and hosted by DBpedia (see Note 1) and the LANL Linked Data Archive (see Note 2), respectively. Executing \(Q_1\) with the regular TPF client [4] on \(S_1\) alone generates 442 HTTP calls, takes 7 s on average, and returns 222 results. Executing the same query as a federated SPARQL query on both \(S_1\) and \(S_2\) generates 478 HTTP calls on \(S_1\) and 470 HTTP calls on \(S_2\), returns the same 222 results, and takes 25 s on average. This is because existing TPF clients support neither replication nor client-side load balancing [1].

As Ulysses is aware that the datasets hosted at \(S_1\) and \(S_2\) are replicated, it generates only 442 HTTP calls, which are distributed between the servers according to their processing capabilities and network latencies. If the servers are not loaded, the performance of Ulysses is similar to that of the regular TPF client without replication \((7\,s)\). However, if the servers are loaded, Ulysses significantly improves performance thanks to load balancing.

By using replicated servers, Ulysses avoids a server-side single point of failure, improves the overall availability of data, and distributes the financial cost of query execution among data providers.

This demonstration presents the Ulysses web client. Through different real-time visualizations, it details which information Ulysses collects about servers, how the cost model is recomputed, and how the load of SPARQL query processing is balanced among replicated servers. Finally, it illustrates how Ulysses reacts to server failures.

2 Overview of Ulysses Client

The Ulysses web client is available online at http://ulysses-demo.herokuapp.com. In order to distribute the load of SPARQL query processing across heterogeneous TPF servers hosting replicated data, it relies on three key ideas detailed in [1]. In the next sections, we provide a brief overview of these key ideas and of how they are integrated in the Ulysses web client (see Note 3).

2.1 Replication-Aware Source Selection

Ulysses uses a replication-aware source selection algorithm to identify which TPF servers can be used to distribute evaluation of triple patterns during SPARQL query processing, based on the replication model introduced in [2, 3].

This replication model describes replicated datasets using replicated fragments and a fragment mapping. A fragment is defined as a 2-tuple: the authoritative source of the fragment, and a triple pattern matched by the fragment's triples. A fragment mapping is a function that maps each fragment to a set of TPF servers. Using this information, Ulysses is able to compute the relevant sources for every triple pattern in a SPARQL query.

Consider again the two servers \(S_1, S_2\) and the SPARQL query \(Q_1\) in Fig. 1. Only one fragment \(f_1 = \langle \texttt{http://fragments.dbpedia.org/2015-10/en},\ ?s\ ?p\ ?o \rangle\) is defined, which indicates total replication. A fragment mapping \(\mathcal {F}\) maps \(f_1\) to the set \(\{ S_1, S_2 \}\). Thus, all RDF triples matched by any triple pattern of \(Q_1\) are replicated by both the DBpedia and LANL servers.

For simplicity, in this demonstration we only consider the scenario with total replication. Consequently, the evaluation of each triple pattern of the query \(Q_1\) will be distributed between the DBpedia and LANL servers.
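To make the replication model more concrete, the following Python sketch shows one possible way to compute the relevant servers of a triple pattern from a fragment mapping. It is only an illustration under simplifying assumptions, not the algorithm of [1]: the names `Fragment`, `subsumes` and `relevant_servers` are hypothetical, the subsumption test ignores repeated variables, the authoritative source of a fragment is not checked, and the example triple pattern is invented.

```python
from typing import Dict, NamedTuple, Set, Tuple

TriplePattern = Tuple[str, str, str]


class Fragment(NamedTuple):
    source: str             # authoritative source of the fragment
    pattern: TriplePattern  # triple pattern matched by the fragment's triples


def is_variable(term: str) -> bool:
    return term.startswith("?")


def subsumes(general: TriplePattern, specific: TriplePattern) -> bool:
    """True if every triple matching `specific` also matches `general`.

    Simplification: repeated variables inside a pattern are not handled.
    """
    return all(is_variable(g) or g == s for g, s in zip(general, specific))


def relevant_servers(pattern: TriplePattern,
                     mapping: Dict[Fragment, Set[str]]) -> Set[str]:
    """Union of the servers hosting a fragment that covers `pattern`."""
    servers: Set[str] = set()
    for fragment, hosts in mapping.items():
        if subsumes(fragment.pattern, pattern):
            servers |= hosts
    return servers


# Total replication, as in the running example: one fragment, two servers.
f1 = Fragment("http://fragments.dbpedia.org/2015-10/en", ("?s", "?p", "?o"))
fragment_mapping = {f1: {"S1", "S2"}}

# Hypothetical triple pattern, used only for illustration.
tp = ("?film", "http://dbpedia.org/ontology/director", "?director")
print(sorted(relevant_servers(tp, fragment_mapping)))  # ['S1', 'S2']
```

With total replication, every triple pattern is covered by \(f_1\), so both servers are always candidates; with partial replication, the mapping would contain several fragments with more specific patterns.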

2.2 A Cost-Model for Estimating Servers Processing Capabilities

Ulysses uses the response times of the HTTP requests performed against TPF servers during query processing as probes to estimate the processing capabilities of each server. The response time of each request is used to compute the throughput of a server, i.e., the number of results served per unit of time. As SPARQL query processing with the TPF approach requires sending many requests to a server in order to evaluate triple patterns, Ulysses can keep the server throughputs updated in real time without additional probing. This also allows Ulysses to easily detect load spikes or server failures.
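As a rough illustration of this probing idea, the sketch below keeps a sliding window of recent requests per server and derives a throughput from it. The class name `ThroughputProbe`, the window size and the sample values are assumptions made for this example; the exact estimator used by Ulysses is the one described in [1].

```python
from collections import deque


class ThroughputProbe:
    """Sliding-window throughput estimate for one TPF server (illustrative)."""

    def __init__(self, window: int = 10) -> None:
        # Only the most recent `window` requests are kept, so that the
        # estimate follows load spikes instead of averaging them away.
        self._samples = deque(maxlen=window)

    def record(self, results: int, response_time_s: float) -> None:
        """Register one completed HTTP request."""
        self._samples.append((results, response_time_s))

    def throughput(self) -> float:
        """Results served per second over the current window."""
        total_results = sum(r for r, _ in self._samples)
        total_time = sum(t for _, t in self._samples)
        return total_results / total_time if total_time > 0 else 0.0


probe = ThroughputProbe(window=3)
probe.record(100, 0.5)
probe.record(100, 0.5)
print(probe.throughput())  # 200.0
probe.record(100, 2.0)     # the server slows down
print(probe.throughput())  # 100.0
```

Because every request issued for query processing feeds the probe, no extra traffic is needed to keep the estimate current.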

Server throughputs are used to compute a cost model that defines a capability factor for each TPF server. This capability factor determines the load distribution among servers: a server with a high capability factor is more likely to be selected to evaluate a triple pattern, as detailed in Sect. 2.3.

Figure 2 shows a real-time estimation of server loads during the execution of query \(Q_1\) of Fig. 1 against \(S_1\) and \(S_2\). \(S_1\) is slightly faster to access than \(S_2\), but as the latter serves five times more results per access (Page size column), \(S_2\) has a better throughput than \(S_1\). As \(S_2\) therefore has a better capability factor than \(S_1\), it will receive approximately 75% of the query load, while \(S_1\) will receive approximately the remaining 25% (Estimated load column).
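The arithmetic behind such a 75%/25% split can be sketched as follows. This is a worked example under assumptions, not the cost model of [1]: we assume that throughput is the page size divided by the access time, that the capability factor is the throughput normalized by the slowest server, and that the expected load share is proportional to the capability factor. The access times and page sizes are hypothetical values, chosen so that \(S_2\) serves five times more results per access.

```python
from typing import Dict


def capability_factors(throughputs: Dict[str, float]) -> Dict[str, float]:
    """Normalize throughputs by the slowest server (assumed definition)."""
    slowest = min(throughputs.values())
    if slowest <= 0:
        raise ValueError("all throughputs must be positive")
    return {server: w / slowest for server, w in throughputs.items()}


def load_shares(factors: Dict[str, float]) -> Dict[str, float]:
    """Expected fraction of requests sent to each server."""
    total = sum(factors.values())
    return {server: f / total for server, f in factors.items()}


# Hypothetical measurements: (results per page, access time in ms).
measurements = {"S1": (100, 150.0), "S2": (500, 250.0)}
throughputs = {s: page / time for s, (page, time) in measurements.items()}

factors = capability_factors(throughputs)
shares = load_shares(factors)
print({s: round(p, 2) for s, p in shares.items()})  # {'S1': 0.25, 'S2': 0.75}
```

Although \(S_1\) answers each request faster in this example, its smaller pages give it one third of the throughput of \(S_2\), hence one quarter of the load.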

2.3 Adaptive Client-Side Load Balancing with Fault Tolerance

Ulysses uses an adaptive load balancer to distribute requests among replicated servers. Each evaluation of a triple pattern scheduled by the client is sent to a server selected using a weighted random algorithm, inspired by the Smart Clients approach [5]. The probability of selecting a server is proportional to its processing capabilities, according to the Ulysses cost model.
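A weighted random selection of this kind can be sketched in a few lines of Python. The function name `pick_server` and the capability factors below are assumptions made for illustration; the sketch only shows the general technique, using the standard `random.choices` function.

```python
import random
from collections import Counter
from typing import Dict


def pick_server(factors: Dict[str, float], rng=random) -> str:
    """Select a server with probability proportional to its capability factor."""
    servers = list(factors)
    weights = [factors[s] for s in servers]
    return rng.choices(servers, weights=weights, k=1)[0]


# Hypothetical capability factors: S2 is three times more capable than S1.
factors = {"S1": 1.0, "S2": 3.0}

rng = random.Random(42)  # seeded only to make the simulation repeatable
counts = Counter(pick_server(factors, rng) for _ in range(10_000))
for server in sorted(counts):
    print(server, counts[server] / 10_000)
```

Over many requests, the observed fractions approach 0.25 for `S1` and 0.75 for `S2`, while each individual request can still be routed to either server.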

This probability distribution ensures that each TPF server only processes a number of requests proportional to its processing capabilities, without concentrating the whole load of query processing on the best-performing servers. The Ulysses load balancer also provides fault tolerance by rescheduling failed HTTP requests on the remaining available replicated servers.
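The rescheduling behaviour can be illustrated with the following sketch, in which a failed request is retried on another replica chosen by the same weighted random rule. All names (`ServerUnavailable`, `fetch_with_failover`, `fake_fetch`) and the capability factors are hypothetical; a real client would issue HTTP requests instead of calling a stub.

```python
import random
from typing import Callable, Dict, List, Tuple


class ServerUnavailable(Exception):
    """Raised when a server cannot answer a request (illustrative)."""


def fetch_with_failover(pattern: str,
                        factors: Dict[str, float],
                        fetch: Callable[[str, str], List[str]],
                        rng=random) -> Tuple[str, List[str]]:
    """Evaluate `pattern` on a weighted-random replica, retrying on failure."""
    candidates = dict(factors)  # copy: failed servers are removed locally
    while candidates:
        servers = list(candidates)
        weights = [candidates[s] for s in servers]
        server = rng.choices(servers, weights=weights, k=1)[0]
        try:
            return server, fetch(server, pattern)
        except ServerUnavailable:
            del candidates[server]  # reschedule on the remaining replicas
    raise ServerUnavailable("no replica could evaluate the triple pattern")


def fake_fetch(server: str, pattern: str) -> List[str]:
    """Stub standing in for an HTTP request; S2 is simulated as down."""
    if server == "S2":
        raise ServerUnavailable(server)
    return [f"results of {pattern} from {server}"]


server, page = fetch_with_failover("?s ?p ?o", {"S1": 1.0, "S2": 3.0}, fake_fetch)
print(server)  # S1
```

Even though `S2` has the higher capability factor here, every request ends up on `S1` once `S2` fails, so query processing can continue.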

Figure 3 shows the metrics displayed in real time by the Ulysses web client during the processing of \(Q_1\), distributed among \(S_1\) and \(S_2\). We see that the throughputs and capability factors of both servers remain close at the start of query processing (Server access times and Servers capability factors). However, after 18 seconds, the access times of \(S_1\) increase, so \(S_2\) becomes more efficient than \(S_1\), causing its capability factor to rise. Thus, the load distribution is adjusted in real time and, at the end of query processing, we see that \(S_2\) has received more HTTP requests (Number of HTTP requests per server).

3 Demonstration Scenario

In the context of ESWC 2018, we would like to run a live experiment that anyone can join. We will tweet a link that participants can click to access the Ulysses online demonstration, using their laptops or smartphones. They will then be able to submit SPARQL queries against a set of TPF servers hosting replicated data. We will provide a selection of replicated TPF servers, hosting replicas of the DBpedia and WatDiv datasets, along with some SPARQL queries as a quick start. Participants will also be able to use their own set of TPF servers and SPARQL queries.

In this scenario, participants will be able to see how Ulysses keeps its cost model updated in real time and how it uses this model to distribute the load of query processing, using the visualizations presented in Figs. 2 and 3. Additionally, we will provide replicated TPF servers that can be shut down in order to simulate failures. Participants will be able to see how Ulysses continues query processing after a server failure by redistributing the load among the available servers.

4 Conclusion

In this demonstration, we presented the Ulysses web client, which enables web browsers to perform client-side load balancing and provides fault tolerance when evaluating SPARQL queries against TPF servers hosting replicated data. Real-time visualizations allow users to observe how Ulysses distributes the load of SPARQL query processing across replicated TPF servers according to their processing capabilities, and how it adapts to failures or variations in network conditions.

Notes

1. http://fragments.dbpedia.org/
2. http://fragments.mementodepot.org/
3. The open-source Ulysses client is available at https://github.com/Callidon/ulysses-tpf, under the MIT license.

References

1. Minier, T., Skaf-Molli, H., Molli, P., Vidal, M.-E.: Intelligent clients for replicated triple pattern fragments. In: Gangemi, A., et al. (eds.) ESWC 2018. LNCS, vol. 10843, pp. 400–414. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93417-4_26
2. Montoya, G., Skaf-Molli, H., Molli, P., Vidal, M.-E.: Federated SPARQL queries processing with replicated fragments. ISWC 2015. LNCS, vol. 9366, pp. 36–51. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25007-6_3
3. Montoya, G., Skaf-Molli, H., Molli, P., Vidal, M.-E.: Decomposing federated queries in presence of replicated fragments. Web Semant. Sci. Serv. Agents World Wide Web 42, 1–18 (2017)
4. Verborgh, R., et al.: Triple pattern fragments: a low-cost knowledge graph interface for the web. Web Semant. Sci. Serv. Agents World Wide Web 37, 184–206 (2016)
5. Yoshikawa, C., Chun, B., Eastham, P., Vahdat, A., Anderson, T., Culler, D.: Using smart clients to build scalable services. In: Proceedings of the 1997 USENIX Technical Conference, CA, p. 105 (1997)

Acknowledgments

This work is partially supported through the FaBuLA project, part of the AtlanSTIC 2020 program.

Author information

Authors and Affiliations

LS2N, University of Nantes, Nantes, France
Thomas Minier, Hala Skaf-Molli & Pascal Molli
TIB Leibniz Information Centre For Science and Technology, University Library & Fraunhofer IAIS, Sankt Augustin, Germany
Maria-Ester Vidal

Corresponding author

Correspondence to Thomas Minier.

Editor information

Editors and Affiliations

University of Bologna, Bologna, Italy
Aldo Gangemi
IBM Research - Almaden, San Jose, CA, USA
Anna Lisa Gentile
CNR-ISTC, Rome, Italy
Andrea Giovanni Nuzzolese
Technische Universität Dresden, Dresden, Germany
Sebastian Rudolph
Karlsruhe Institute of Technology, Karlsruhe, Germany
Maria Maleshkova
University of Mannheim, Mannheim, Germany
Heiko Paulheim
University of Aberdeen, Aberdeen, UK
Jeff Z Pan
CNR-ISTC, Rome, Italy
Mehwish Alam

About this paper

Cite this paper

Minier, T., Skaf-Molli, H., Molli, P., Vidal, M.-E. (2018). Ulysses: An Intelligent Client for Replicated Triple Pattern Fragments. In: Gangemi, A., et al. The Semantic Web: ESWC 2018 Satellite Events. ESWC 2018. Lecture Notes in Computer Science, vol 11155. Springer, Cham. https://doi.org/10.1007/978-3-319-98192-5_34

DOI: https://doi.org/10.1007/978-3-319-98192-5_34
Published: 02 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98191-8
Online ISBN: 978-3-319-98192-5
eBook Packages: Computer Science, Computer Science (R0)
