DOI: 10.1145/3360901.3364436
Research article

More Complete Resultset Retrieval from Large Heterogeneous RDF Sources

Published: 23 September 2019

Abstract

Over recent years, the Web of Data has grown significantly. Various interfaces such as LOD Stats, LOD Laundromat, and SPARQL endpoints provide access to hundreds of thousands of RDF datasets, representing billions of facts. These datasets are available in different formats, such as raw data dumps and HDT files, or are directly accessible via SPARQL endpoints. Querying such a large amount of distributed data is particularly challenging, and many of these datasets cannot be queried directly using the SPARQL query language. To tackle these problems, we present WimuQ, an integrated query engine that executes SPARQL queries and retrieves results from a large number of heterogeneous RDF data sources. Presently, WimuQ is able to execute both federated and non-federated SPARQL queries over a total of 668,166 datasets from LOD Stats and LOD Laundromat as well as 559 active SPARQL endpoints. These data sources represent a total of 221.7 billion triples from more than 5 terabytes of information, drawn from datasets retrieved using the service "Where is My URI" (WIMU). Our evaluation on state-of-the-art real-data benchmarks shows that WimuQ retrieves more complete results for the benchmark queries.
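
To make the notion of a "more complete" result set concrete, the following is a minimal Python sketch, not WimuQ's implementation. It assumes each source (for example a SPARQL endpoint or an HDT file) returns solutions as variable-to-value dictionaries. The names `merge_results` and `coverage`, the source labels, and the coverage ratio itself are illustrative assumptions, since the abstract does not define the metric used in the evaluation.

```python
from typing import Dict, List, Set, Tuple

# One query solution as returned by a single source (assumed shape).
Binding = Dict[str, str]
# Hashable, order-independent form of a solution, so duplicates can be removed.
Row = Tuple[Tuple[str, str], ...]


def to_row(binding: Binding) -> Row:
    """Normalise a solution so that equal bindings compare equal."""
    return tuple(sorted(binding.items()))


def merge_results(per_source: Dict[str, List[Binding]]) -> Set[Row]:
    """Union the solutions of all sources, dropping duplicates."""
    merged: Set[Row] = set()
    for bindings in per_source.values():
        merged.update(to_row(b) for b in bindings)
    return merged


def coverage(per_source: Dict[str, List[Binding]]) -> Dict[str, float]:
    """Share of the merged result set that each source contributes on its own."""
    merged = merge_results(per_source)
    if not merged:
        return {name: 0.0 for name in per_source}
    return {
        name: len({to_row(b) for b in bindings}) / len(merged)
        for name, bindings in per_source.items()
    }


if __name__ == "__main__":
    # Hypothetical answers to the same query from two kinds of sources.
    results = {
        "sparql-endpoint": [{"s": "ex:a"}, {"s": "ex:b"}],
        "hdt-dump": [{"s": "ex:b"}, {"s": "ex:c"}, {"s": "ex:c"}],
    }
    print(len(merge_results(results)))
    print(coverage(results))
```

In this toy input, neither source alone returns every distinct solution, while the merged set does. That is the intuition behind querying dumps, HDT files, and endpoints together rather than a single interface.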

    Published In

    K-CAP '19: Proceedings of the 10th International Conference on Knowledge Capture
    September 2019
    281 pages
    ISBN: 9781450370080
    DOI: 10.1145/3360901
    General Chairs: Mayank Kejriwal, Pedro Szekely
    Program Chair: Raphaël Troncy
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data integration
    2. data interlinking
    3. dataset discovery
    4. link traversal based sparql query
    5. resultset coverage
    6. sparql federated query

    Qualifiers

    • Research-article

    Conference

    K-CAP '19: Knowledge Capture Conference
    November 19 - 21, 2019
    Marina Del Rey, CA, USA

    Acceptance Rates

    Overall Acceptance Rate 55 of 198 submissions, 28%
