short-paper

Online Sampling of Summaries from Public SPARQL Endpoints

Authors:

Thi Hoang Thi Pham,

Hala Skaf-Molli,

Brice Nédelec

WWW '24: Companion Proceedings of the ACM Web Conference 2024

Pages 617 - 620

https://doi.org/10.1145/3589335.3651543

Published: 13 May 2024

Abstract

Collecting statistics from online public SPARQL endpoints is hampered by their fair usage policies. These restrictions hinder several critical operations, such as aggregate query processing, portal development, and data summarization. Online sampling enables the collection of statistics while respecting fair usage policies. However, sampling has not yet been integrated into the SPARQL standard. Although integrating sampling into the SPARQL standard appears beneficial, its effectiveness must be demonstrated in a practical semantic web context. This paper investigates whether online sampling can generate summaries useful in cutting-edge SPARQL federation engines. Our experimental studies indicate that sampling allows the creation and maintenance of summaries by exploring less than 20% of datasets.
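
The idea of collecting statistics from a sample rather than from a full scan can be illustrated with a minimal sketch. It is not the authors' method: it assumes a toy in-memory list of triples and uniform sampling without replacement, whereas a public endpoint would need server-side support for sampling. The function names `sample_triples` and `estimate_predicate_counts`, the toy dataset, and the 20% fraction are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def sample_triples(triples, fraction, seed=0):
    """Draw a uniform random sample, without replacement, of the given fraction."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    k = max(1, round(len(triples) * fraction))
    return rng.sample(triples, k)


def estimate_predicate_counts(sample, fraction):
    """Estimate per-predicate triple counts by scaling sample counts by 1/fraction."""
    counts = Counter(predicate for _, predicate, _ in sample)
    return {predicate: count / fraction for predicate, count in counts.items()}


# Hypothetical toy dataset of (subject, predicate, object) triples.
triples = (
    [(f"s{i}", "rdf:type", "ex:Product") for i in range(600)]
    + [(f"s{i}", "rdfs:label", f"label {i}") for i in range(300)]
    + [(f"s{i}", "ex:price", str(i)) for i in range(100)]
)

# Explore 20% of the data and build an approximate predicate summary from it.
sample = sample_triples(triples, fraction=0.2)
summary = estimate_predicate_counts(sample, fraction=0.2)

for predicate, estimate in sorted(summary.items()):
    print(f"{predicate}: about {estimate:.0f} triples")
```

The estimates vary with the seed and improve as the sampled fraction grows; with a fraction of 1.0 the sketch returns the exact counts.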

Supplemental Material

MP4 File

Supplemental video


Cited By

Pham T., Molli P., Nédelec B., Skaf-Molli H., Aimonier-Davat J. (2024). CRAWD: Sampling-Based Estimation of Count-Distinct SPARQL Queries. The Semantic Web – ISWC 2024, 98-115. Online publication date: 27-Nov-2024
https://doi.org/10.1007/978-3-031-77850-6_6

Index Terms

Online Sampling of Summaries from Public SPARQL Endpoints
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing

Information & Contributors

Information

Published In

WWW '24: Companion Proceedings of the ACM Web Conference 2024

May 2024

1928 pages

ISBN:9798400701726

DOI:10.1145/3589335

General Chairs:
Tat-Seng Chua (National University of Singapore),
Chong-Wah Ngo (Singapore Management University)

Program Chairs:
Ravi Kumar (Google),
Hady W. Lauw (Singapore Management University),
Roy Ka-Wei Lee (Singapore University of Technology and Design)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

Agence Nationale de la Recherche
Labex CominLabs

Conference

WWW '24

Sponsor:

SIGWEB

WWW '24: The ACM Web Conference 2024

May 13 - 17, 2024

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 1
Total Downloads: 27
Downloads (Last 12 months): 27
Downloads (Last 6 weeks): 6

Reflects downloads up to 12 Dec 2024
