DOI: 10.1145/3589335.3651543
short-paper

Online Sampling of Summaries from Public SPARQL Endpoints

Published: 13 May 2024

Abstract

Collecting statistics from online public SPARQL endpoints is hampered by their fair usage policies. These restrictions hinder several critical operations, such as aggregate query processing, portal development, and data summarization. Online sampling enables the collection of statistics while respecting fair usage policies. However, sampling has not yet been integrated into the SPARQL standard. Although integrating sampling into the SPARQL standard appears beneficial, its effectiveness must be demonstrated in a practical semantic web context. This paper investigates whether online sampling can generate summaries useful in cutting-edge SPARQL federation engines. Our experimental studies indicate that sampling allows the creation and maintenance of summaries by exploring less than 20% of datasets.
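
To make the idea of building a summary from a sample concrete, here is a minimal sketch in Python. It is not the paper's method and does not contact any endpoint: it assumes a small in-memory list of triples whose total size is known, draws a uniform sample without replacement, and scales the per-predicate counts back up. The function name `sample_summary`, the `fraction` parameter, and the toy predicates are all hypothetical.

```python
import random
from collections import Counter


def sample_summary(triples, fraction, seed=0):
    """Estimate per-predicate triple counts from a uniform sample.

    Assumption: `triples` is an in-memory list of (s, p, o) tuples and
    its total size is known. A real public endpoint offers neither.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n = max(1, round(len(triples) * fraction))
    sample = rng.sample(triples, n)  # uniform, without replacement
    scale = len(triples) / n  # each sampled triple stands for `scale` triples
    counts = Counter(p for _, p, _ in sample)
    return {p: c * scale for p, c in counts.items()}


# Hypothetical toy dataset: 1000 triples over three predicates.
triples = (
    [(f"s{i}", "rdf:type", "ex:Product") for i in range(600)]
    + [(f"s{i}", "ex:price", str(i)) for i in range(300)]
    + [(f"s{i}", "ex:vendor", "ex:v1") for i in range(100)]
)

# Explore 20% of the data and print the estimated counts.
summary = sample_summary(triples, fraction=0.2)
for predicate, estimate in sorted(summary.items()):
    print(predicate, estimate)
```

The estimated counts always sum to the dataset size, while individual estimates vary with the sample. Rare predicates may be missed entirely at low sampling fractions, which is one reason a real study has to measure how much of a dataset must be explored.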

Cited By

  • (2024) CRAWD: Sampling-Based Estimation of Count-Distinct SPARQL Queries. The Semantic Web – ISWC 2024, pp. 98-115. DOI: 10.1007/978-3-031-77850-6_6. Online publication date: 27-Nov-2024

    Published In

    WWW '24: Companion Proceedings of the ACM Web Conference 2024
    May 2024
    1928 pages
    ISBN:9798400701726
    DOI:10.1145/3589335
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. federation
    2. sampling
    3. sparql
    4. summary

    Qualifiers

    • Short-paper

    Conference

    WWW '24: The ACM Web Conference 2024
    May 13 - 17, 2024
    Singapore, Singapore

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 27
    • Downloads (Last 6 weeks): 6
    Reflects downloads up to 12 Dec 2024
