Context-Aware Search for Environmental Data Using Dense Retrieval
Figure 1. Neural search architecture: (a) indexing stage, (b) retrieval stage (search), and (c) re-ranking stage. The text embedding service using the SBERT model is used in both the indexing and the retrieval stage.
Figure 2. Selection of the search algorithm used in the prototype: users can select the algorithm either from the dropdown menu or by passing the URL parameter “?algorithm=sbert” or “?algorithm=bm25”. BM25 is used by default.
Figure 3. Comparison of the results of the dense retrievers for each test collection query. The left graphs show Precision and the right graphs show Recall for top-k levels from 5 to 100.
Figure 4. Testing a re-trained version of DistilBERT-base using the modified corpus-1b. The graph shows Precision and Recall for Q4 (“aircraft measurement”) for both models.
Figure 5. Spatial re-ranking effects on Precision and Recall for the “greenhouse gases” query (Q2) extended with “Italy”, using corpus-1b.
Figure 6. Spatial extent (blue boxes) of the top 10 search results before spatial re-ranking.
Figure 7. Spatial extent (blue boxes) of the top 10 search results after spatial re-ranking with the spatial extent of Italy.
Abstract
1. Introduction
- (1)
- (2) It is performant and efficient: efficiency tests by [5] showed that BM25 outperforms other retrieval methods in terms of retrieval latency and required index size.
2. Related Work
3. Materials and Methods
3.1. Domain Adaptation
3.1.1. Corpus Design
- (I)
- (II) Ontology concepts from the General Multilingual Environmental Thesaurus (GEMET) [49]. A subset of 187 concept definitions was selected: concepts related to the GEMET themes “climate” [50] and “natural dynamics” [51], as well as concepts assigned to the GEMET group “ATMOSPHERE (air, climate)” [52].
- (III) Scientific literature from open access Copernicus journals [53] and scientific textbooks (passages were parsed from two basic literature sources [54,55]). Thematically relevant journals were selected from the Copernicus publications (Atmospheric Chemistry and Physics (ACP), Atmospheric Measurement Techniques (AMT), Advances in Statistical Climatology, Meteorology and Oceanography (ASCMO), Earth System Dynamics (ESD), Hydrology and Earth System Sciences (HESS), and Natural Hazards and Earth System Sciences (NHESS)), and all abstracts available online were downloaded. In total, 21,618 abstracts and 2 textbooks were used.
3.1.2. Training Method
3.2. Spatial Re-Ranking Method
- (1) Spatial context parsing: A geocoding service was developed to parse the spatial context from the query. The service uses the following:
  - (a) A BERT model fine-tuned for named entity recognition [61] to extract location entities from a query (e.g., “Berlin” from the query “climate data berlin”).
  - (b) An open-source geocoding service [62] to generate a query bounding box (e.g., “[13.088345, 52.3382448, 13.7611609, 52.6755087]” for “Berlin”, given as [min longitude, min latitude, max longitude, max latitude]).
- (2) Calculating a spatial similarity metric: To measure the spatial similarity between the query bounding box and the bounding boxes of the candidates retrieved by the dense retriever, the Hausdorff distance is an appropriate measure, as proposed by [28,63]. It takes into account both the size and the position of the geometries (here, polygons). Unlike metrics such as area of overlap, which yield zero whenever two bounding boxes do not overlap, the Hausdorff distance remains informative for re-ranking because it reflects spatial proximity and geometry even without direct overlap.
- (3) Re-ranking: Finally, the results are re-ranked by ascending Hausdorff distance, so that candidates whose bounding boxes lie closest to the query bounding box are ranked first.
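The three steps above can be illustrated with a minimal, self-contained Python sketch. It is not the prototype's code, and several parts are assumptions: the geocoder is replaced by a hypothetical in-memory table (`GAZETTEER`, standing in for a service such as Photon), the NER output is assumed to follow the Hugging Face token-classification pipeline format with aggregated entity groups (`"entity_group" == "LOC"`), and distances are computed in plain longitude/latitude degrees without a map projection. Treating each bounding box as a filled rectangle, the directed Hausdorff distance is attained at one of the rectangle's corners, so four point-to-box distances per direction suffice. Because the Hausdorff distance is a distance, smaller values mean closer geometries, so the sketch sorts in ascending order.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Bounding box as (min_lon, min_lat, max_lon, max_lat), the order of the "Berlin" example.
BBox = Tuple[float, float, float, float]

# Hypothetical stand-in for a geocoding service: a fixed name -> bounding box table.
GAZETTEER: Dict[str, BBox] = {
    "berlin": (13.088345, 52.3382448, 13.7611609, 52.6755087),
}


def extract_locations(ner_entities: Sequence[dict]) -> List[str]:
    """Step 1a: keep location entities (assumed format: "entity_group" == "LOC")."""
    return [e["word"] for e in ner_entities if e.get("entity_group") == "LOC"]


def geocode(place: str) -> Optional[BBox]:
    """Step 1b: look up a bounding box (hypothetical; a real system calls a geocoder)."""
    return GAZETTEER.get(place.lower())


def point_to_box(x: float, y: float, box: BBox) -> float:
    """Planar distance from a point to a filled box; 0 if the point lies inside."""
    min_x, min_y, max_x, max_y = box
    dx = max(min_x - x, 0.0, x - max_x)
    dy = max(min_y - y, 0.0, y - max_y)
    return math.hypot(dx, dy)


def directed_hausdorff(a: BBox, b: BBox) -> float:
    """Largest distance from any point of box a to box b (attained at a corner of a)."""
    min_x, min_y, max_x, max_y = a
    corners = [(min_x, min_y), (min_x, max_y), (max_x, min_y), (max_x, max_y)]
    return max(point_to_box(x, y, b) for x, y in corners)


def hausdorff(a: BBox, b: BBox) -> float:
    """Step 2: symmetric Hausdorff distance between two filled boxes."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def spatial_rerank(query_box: BBox, candidates: List[Tuple[str, BBox]]) -> List[Tuple[str, float]]:
    """Step 3: order candidates by ascending Hausdorff distance (closest first)."""
    scored = [(doc_id, hausdorff(query_box, box)) for doc_id, box in candidates]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy NER output for the query "climate data berlin" (format assumed).
    entities = [{"entity_group": "LOC", "word": "berlin", "score": 0.99}]
    query_box = geocode(extract_locations(entities)[0])
    # Hypothetical candidate records returned by the dense retriever.
    candidates = [
        ("europe-wide", (-25.0, 34.0, 45.0, 72.0)),
        ("germany", (5.87, 47.27, 15.04, 55.06)),
        ("berlin-city", (13.1, 52.34, 13.76, 52.67)),
    ]
    for doc_id, dist in spatial_rerank(query_box, candidates):
        print(f"{doc_id}: {dist:.3f}")
```

The disjoint case shows why an overlap-based score is less useful here: `hausdorff((0.0, 0.0, 1.0, 1.0), (2.0, 0.0, 3.0, 1.0))` is 2.0, a finite and comparable value, whereas the overlap area of these two boxes is zero.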
3.3. Prototype Architecture
- (1) The SOLR index had to be configured to store the embeddings produced by the dense retriever. For this purpose, an additional field for storing multidimensional vectors was added to the SOLR schema.
- (2) SOLR does not inherently support calculating scores from embeddings. A plugin, the SOLR Vector Scoring Plugin [64] (also available for ElasticSearch), supports score calculation from query and document embeddings and was installed on the SOLR instance.
- (3) A mechanism was required to generate embeddings for both queries and documents. For the prototype, a text embedding service was set up using the FastAPI framework [65]. Its API takes text as input and returns the corresponding SBERT embedding.
- (4) Finally, a mechanism was required to replace CKAN's built-in BM25 search with the custom search approach based on the dense retriever. For that purpose, a CKAN extension [66] was developed.
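Conceptually, the scoring step ranks documents by comparing the query embedding with each stored document embedding. The sketch below assumes cosine similarity as the scoring function and uses tiny hand-made vectors in place of real SBERT embeddings; it illustrates the idea only and is not the SOLR Vector Scoring Plugin's implementation or API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_embedding(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Score every stored document against the query and return the top_k hits."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical 3-dimensional "embeddings" standing in for SBERT vectors.
    index = {
        "aircraft-campaign": [0.9, 0.1, 0.0],
        "ghg-inventory": [0.1, 0.9, 0.2],
        "model-output": [0.0, 0.2, 0.9],
    }
    query = [1.0, 0.2, 0.0]  # e.g., a toy embedding of "aircraft measurement"
    print(rank_by_embedding(query, index, top_k=2))
```

In the prototype described above, this computation runs inside SOLR via the plugin rather than in application code; the embedding service only supplies the vectors.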
3.4. Evaluation Method
3.4.1. Test Collection
- Q1: “climate simulation”;
- Q2: “greenhouse gases”;
- Q3: “observation data”;
- Q4: “aircraft measurement”.
3.4.2. Evaluation Metrics
4. Results and Discussion
4.1. Dense Retrieval
- For Q3 (“observation data”), the dense retrievers fine-tuned with corpora that include text passages from the scientific literature (corpus-2 and corpus-3) performed worse.
- Corpus-1b (dataset descriptions and ontology concepts) performed relatively poorly for Q4 (“aircraft measurement”).
4.2. Spatial Re-Ranking
5. Future Works
5.1. Extension and Improvement of the IR Approach
- (1) Scale and resolution: The spatial scale of the query context and that of the document context need to be aligned. For instance, if data for “Europe” are queried, metadata records at the continental scale and with a corresponding resolution should be ranked highest.
- (2) Spatial relations: Some queries might contain relative spatial descriptions based on topological (e.g., within) or directional (e.g., north of) relations. This aspect has already been addressed in the study by [76].
- (3) Temporal aspects: Geospatial data, and climate data in particular, often have a temporal dimension, such as “past” (e.g., observation data) or “future” (e.g., climate projections), a certain time period (e.g., a climate reference period or decades), or temporal relations (e.g., before). In particular, dense retrievers could be fine-tuned with keywords that indicate temporal context.
5.2. Improvement of the Evaluation
6. Summary and Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Hervey, T.; Lafia, S.; Kuhn, W. Search Facets and Ranking in Geospatial Dataset Search. In 11th International Conference on Geographic Information Science (GIScience 2021)—Part I. Leibniz International Proceedings in Informatics (LIPIcs); Schloss Dagstuhl—Leibniz-Zentrum für Informatik: Wadern, Germany, 2020; Volume 177, pp. 5:1–5:15. [Google Scholar] [CrossRef]
- Robertson, S.; Zaragoza, H. The probabilistic relevance framework: BM25 and beyond. Found. Trends Inf. Retr. 2009, 3, 333–389. [Google Scholar] [CrossRef]
- ElasticSearch. Available online: https://www.elastic.co/de/ (accessed on 29 October 2024).
- Apache SOLR. Available online: https://solr.apache.org/ (accessed on 29 October 2024).
- Thakur, N.; Reimers, N.; Rücklé, A.; Srivastava, A.; Gurevych, I. BEIR: A Heterogeneous Benchmark for Zero-Shot Evaluation of Information Retrieval Models. arXiv 2021, arXiv:2104.08663. [Google Scholar]
- Furnas, G.; Landauer, T.; Gomez, L.; Dumais, S. The Vocabulary Problem in Human-System Communication. Commun. ACM 1987, 30, 964–971. [Google Scholar] [CrossRef]
- Lehmann, J.; Athanasiou, S.; Both, A.; Garcia Rojas, A.; Giannopoulos, G.; Hladky, D.; Le Grange, J.J.; Ngonga Ngomo, A.C.; Sherif, M.A.; Stadler, C.; et al. Managing Geospatial Linked Data in the GeoKnow Project. Semant. Web Earth Space Sci. Curr. Status Future Dir. 2015, 20, 51–78. [Google Scholar] [CrossRef]
- Jiang, S.; Hagelien, T.F.; Natvig, M.; Li, J. Ontology-Based Semantic Search for Open Government Data. In Proceedings of the 13th IEEE International Conference on Semantic Computing, ICSC 2019, Newport Beach, CA, USA, 30 January–1 February 2019; pp. 7–15. [Google Scholar] [CrossRef]
- Yue, P.; Guo, X.; Zhang, M.; Jiang, L.; Zhai, X. Linked Data and SDI: The Case on Web Geoprocessing Workflows. ISPRS J. Photogramm. Remote Sens. 2015, 114, 245–257. [Google Scholar] [CrossRef]
- Geonetwork. Available online: https://geonetwork-opensource.org/ (accessed on 29 October 2024).
- CKAN. Available online: https://ckan.org/ (accessed on 29 October 2024).
- Chapman, A.; Simperl, E.; Koesten, L.; Konstantinidis, G.; Ibáñez, L.D.; Kacprzak, E.; Groth, P. Dataset Search: A Survey. VLDB J. 2020, 29, 251–272. [Google Scholar] [CrossRef]
- ISO19115; Geographic Information—Metadata. ISO: Geneva, Switzerland, 2014. Available online: https://www.iso.org/standard/53798.html (accessed on 29 October 2024).
- Dublin Core. Available online: https://www.dublincore.org/specifications/dublin-core/dces/ (accessed on 29 October 2024).
- Wagner, M.; Henzen, C.; Müller-Pfefferkorn, R. A Research Data Infrastructure Component for the Automated Metadata and Data Quality Extraction to Foster the Provision of FAIR Data in Earth System Sciences. AGILE GIScience Ser. 2021, 2, 41. [Google Scholar] [CrossRef]
- Schauppenlehner, T.; Muhar, A. Theoretical Availability Versus Practical Accessibility: The Critical Role of Metadata Management in Open Data Portals. Sustainability 2018, 10, 545. [Google Scholar] [CrossRef]
- Quarati, A. Open Government Data: Usage Trends and Metadata Quality. J. Inf. Sci. 2021, 49, 887–910. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186. [Google Scholar]
- Zhao, W.X.; Liu, J.; Ren, R.; Wen, J.-R. Dense Text Retrieval Based on Pretrained Language Models: A Survey. ACM Trans. Inf. Syst. 2023, 42, 89. [Google Scholar] [CrossRef]
- Nakamura, T.A.; Calais, P.H.; Reis, D.C.; Lemos, A.P. An Anatomy for Neural Search Engines. Inf. Sci. 2019, 480, 339–353. [Google Scholar] [CrossRef]
- Beltagy, I.; Lo, K.; Cohan, A. SciBERT: A Pretrained Language Model for Scientific Text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 3615–3620. [Google Scholar] [CrossRef]
- Li, Z.; Kim, J.; Chiang, Y.-Y.; Chen, M. SpaBERT: A Pretrained Language Model from Geographic Data for Geo-Entity Representation. arXiv 2022, arXiv:2210.12213. [Google Scholar]
- Wang, K.; Thakur, N.; Reimers, N.; Gurevych, I. GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval. arXiv 2021, arXiv:2112.07577. [Google Scholar]
- Wang, K.; Reimers, N.; Gurevych, I. TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning. arXiv 2021, arXiv:2104.06979. [Google Scholar]
- Hu, Y.; Gao, S.; Lunga, D.; Li, W.; Newsam, S.; Bhaduri, B. GeoAI at ACM SIGSPATIAL. SIGSPATIAL Spec. 2019, 11, 5–15. [Google Scholar] [CrossRef]
- Corcoran, P.; Spasić, I. Self-Supervised Representation Learning for Geographical Data—A Systematic Literature Review. ISPRS Int. J. Geo-Inf. 2023, 12, 64. [Google Scholar] [CrossRef]
- Chen, Y.; Huang, W.; Zhao, K.; Jiang, Y.; Cong, G. Self-supervised Learning for Geospatial AI: A Survey. arXiv 2024, arXiv:2408.12133. [Google Scholar]
- Lacasta, J.; Lopez-Pellicer, F.J.; Espejo-García, B.; Nogueras-Iso, J.; Zarazaga-Soria, F.J. Aggregation-based information retrieval system for geospatial data catalogs. Int. J. Geogr. Inf. Sci. 2017, 31, 1583–1605. [Google Scholar] [CrossRef]
- Lacasta, J.; Lopez-Pellicer, F.J.; Zarazaga-Soria, J.; Béjar, R.; Nogueras-Iso, J. Approaches for the Clustering of Geographic Metadata and the Automatic Detection of Quasi-Spatial Dataset Series. ISPRS Int. J. Geo-Inf. 2022, 11, 87. [Google Scholar] [CrossRef]
- Chen, Z.; Song, J.; Yang, Y. Similarity measurement of metadata of geospatial data: An artificial neural network approach. ISPRS Int. J. Geo-Inf. 2018, 7, 90. [Google Scholar] [CrossRef]
- Munir, K.; Sheraz Anjum, M. The use of ontologies for effective knowledge modelling and information retrieval. Appl. Comput. Inform. 2018, 14, 116–126. [Google Scholar] [CrossRef]
- Asim, M.N.; Wasim, M.; Khan, M.U.G.; Mahmood, N.; Mahmood, W. The Use of Ontology in Retrieval: A Study on Textual, Multilingual, and Multimedia Retrieval. IEEE Access 2019, 7, 21662–21686. [Google Scholar] [CrossRef]
- Noy, N.; Burgess, M.; Brickley, D. Google Dataset Search: Building a Search Engine for Datasets in an Open Web Ecosystem. In Proceedings of the World Wide Web Conference (WWW) 2019, San Francisco, CA, USA, 13–17 May 2019; pp. 1365–1375. [Google Scholar] [CrossRef]
- Zrhal, M.; Bucher, B.; Hamdi, F.; Van Damme, M.D. Identifying the Key Resources and Missing Elements to Build a Knowledge Graph Dedicated to Spatial Dataset Search. Procedia Comput. Sci. 2022, 207, 2911–2920. [Google Scholar] [CrossRef]
- Glocker, K.; Knurr, A.; Dieter, J.; Dominick, F.; Forche, M.; Koch, C.; Pascoe Pérez, A.; Roth, B.; Ückert, F. Optimizing a Query by Transformation and Expansion. Stud. Health Technol. Inform. 2017, 243, 197–201. [Google Scholar] [CrossRef] [PubMed]
- Mai, G.; Janowicz, K.; Prasad, S.; Shi, M.; Cai, L.; Zhu, R.; Regalia, B.; Lao, N. Semantically-Enriched Search Engine for Geoportals: A Case Study with ArcGIS Online. AGILE GIScience Ser. 2020, 1, 13. [Google Scholar] [CrossRef]
- Sun, K.; Zhu, Y.; Pan, P.; Hou, Z.; Wang, D.; Li, W.; Song, J. Geospatial Data Ontology: The Semantic Foundation of Geospatial Data Integration and Sharing. Big Earth Data 2019, 3, 269–296. [Google Scholar] [CrossRef]
- Esteva, A.; Kale, A.; Paulus, R.; Hashimoto, K.; Yin, W.; Radev, D.; Socher, R. COVID-19 Information Retrieval with Deep-Learning Based Semantic Search, Question Answering, and Abstractive Summarization. Npj Digit. Med. 2021, 4, 68. [Google Scholar] [CrossRef]
- Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019; pp. 3982–3992. [Google Scholar] [CrossRef]
- Coelho, J.; Magalhães, J.; Martins, B. Improving Neural Models for the Retrieval of Relevant Passages to Geographical Queries. In Proceedings of the ACM International Symposium on Advances in Geographic Information Systems, Beijing, China, 2–5 November 2021; pp. 268–277. [Google Scholar] [CrossRef]
- MS MARCO. Available online: https://microsoft.github.io/msmarco/ (accessed on 29 October 2024).
- Gao, Y.; Xiong, Y.; Wang, S.; Wang, H. GeoBERT: Pre-Training Geospatial Representation Learning on Point-of-Interest. Appl. Sci. 2022, 12, 12942. [Google Scholar] [CrossRef]
- Mai, G.; Janowicz, K.; Hu, Y.; Gao, S.; Yan, B.; Zhu, R.; Cai, L.; Lao, N. A Review of Location Encoding for GeoAI: Methods and Applications. Int. J. Geogr. Inf. Sci. 2022, 36, 639–673. [Google Scholar] [CrossRef]
- Syed, M.A.; Arsevska, E.; Roche, M.; Teisseire, M. GeospatRE: Extraction and Geocoding of Spatial Relation Entities in Textual Documents. Cartogr. Geogr. Inf. Sci. 2023, 1–16. [Google Scholar] [CrossRef]
- EEA Geospatial Data Catalogue. Available online: https://sdi.eea.europa.eu/catalogue/srv/eng/catalog.search#/home (accessed on 29 October 2024).
- United Nations FAO Map Catalogue. Available online: https://data.apps.fao.org/map/catalog/srv/ger/catalog.search#/home (accessed on 29 October 2024).
- Copernicus Data Store. Available online: https://cds.climate.copernicus.eu/#!/home (accessed on 29 October 2024).
- data.europa.eu. Available online: https://data.europa.eu/en (accessed on 29 October 2024).
- GEMET. Available online: https://www.eionet.europa.eu/gemet/en/themes/ (accessed on 29 October 2024).
- GEMET Theme Climate. Available online: http://www.eionet.europa.eu/gemet/theme/7 (accessed on 29 October 2024).
- GEMET Theme Natural Dynamics. Available online: http://www.eionet.europa.eu/gemet/theme/8 (accessed on 29 October 2024).
- GEMET Atmosphere (Air, Climate). Available online: http://www.eionet.europa.eu/gemet/group/618 (accessed on 29 October 2024).
- Copernicus Open Access Journals. Available online: https://publications.copernicus.org/open-access_journals/journals_by_subject.html (accessed on 29 October 2024).
- Kotamarthi, R.; Hayhoe, K.; Mearns, L.; Wuebbles, D.; Jacobs, J.; Jurado, J. Downscaling Techniques for High-Resolution Climate Projections: From Global Change to Local Impacts; Cambridge University Press: Cambridge, UK; New York, NY, USA, 2021. [Google Scholar] [CrossRef]
- Spiridonov, V.; Curic, M. Fundamentals of Meteorology; Springer: Cham, Switzerland, 2020; pp. 1–437. [Google Scholar] [CrossRef]
- Nogueira, R.; Yang, W.; Lin, J.; Cho, K. Document Expansion by Query Prediction. arXiv 2019, arXiv:1904.08375. [Google Scholar]
- Cross-Encoder ms-marco-MiniLM-L-6-v2. Available online: https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2 (accessed on 29 October 2024).
- Hofstätter, S.; Althammer, S.; Schröder, M.; Sertkan, M.; Hanbury, A. Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation. arXiv 2020, arXiv:2010.02666. [Google Scholar]
- Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv 2019, arXiv:1910.01108. [Google Scholar]
- KeyBERT by Maarten Grootendorst. Available online: https://github.com/MaartenGr/KeyBERT (accessed on 29 October 2024).
- BERT NER Model (dslim/bert-base-NER-uncased). Available online: https://huggingface.co/dslim/bert-base-NER-uncased (accessed on 29 October 2024).
- Photon Geocoding API by Komoot. Available online: https://photon.komoot.io/ (accessed on 29 October 2024).
- Degbelo, A.; Teka, B.B. Spatial Search Strategies for Open Government Data: A Systematic Comparison. In Proceedings of the 13th Workshop on Geographic Information Retrieval, Lyon, France, 28–29 November 2019. [Google Scholar] [CrossRef]
- SOLR Vector Scoring. Available online: https://github.com/saaay71/solr-vector-scoring (accessed on 29 October 2024).
- FastAPI. Available online: https://fastapi.tiangolo.com/ (accessed on 29 October 2024).
- CKAN Solr VectorStore Extension. Available online: https://github.com/simeonwetzel/ckanext-solr-vectorstore (accessed on 29 October 2024).
- NIST COVID-19 Track. Available online: https://ir.nist.gov/covidSubmit/index.html (accessed on 29 October 2024).
- BioASQ. Available online: http://bioasq.org/ (accessed on 29 October 2024).
- WDC Climate Data Center. Available online: https://www.wdc-climate.de/ui (accessed on 29 October 2024).
- Derczynski, L. Complementarity, F-score, and NLP evaluation. In Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23–28 May 2016; pp. 261–266, ISBN 9782951740891. [Google Scholar]
- Zhu, M. Recall, Precision and Average Precision; Department of Statistics and Actuarial Science, University of Waterloo: Waterloo, ON, Canada, 2004; pp. 1–11. [Google Scholar]
- Robertson, S. A new interpretation of average precision. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Singapore, 20–24 July 2008; pp. 689–690. [Google Scholar] [CrossRef]
- Zumwald, M.; Knüsel, B.; Baumberger, C.; Hirsch Hadorn, G.; Bresch, D.N.; Knutti, R. Understanding and assessing uncertainty of observational climate datasets for model evaluation using ensembles. Wiley Interdiscip. Rev. Clim. Chang. 2020, 11, e654. [Google Scholar] [CrossRef]
- Henzen, C.; Mäs, S.; Bernard, L. Provenance information in geodata infrastructures. In Lecture Notes in Geoinformation and Cartography; Springer: Cham, Switzerland, 2013; pp. 133–151. [Google Scholar] [CrossRef]
- Jiang, Y.; Li, Y.; Yang, C.; Hu, F.; Armstrong, E.M.; Huang, T.; Moroni, D.; McGibbney, L.J.; Finch, C.J. Towards intelligent geospatial data discovery: A machine learning framework for search ranking. Int. J. Digit. Earth 2018, 11, 956–971. [Google Scholar] [CrossRef]
- Shin, H.; Park, J.; Yuk, D.; Lee, J. BERT-based Spatial Information Extraction. In Proceedings of the Third International Workshop On Spatial Language Understanding (SpLU 2020), Virtual, 19 November 2020; Volume 8, pp. 10–17. [Google Scholar]
- PROV-O W3C Recommendation. Available online: https://www.w3.org/TR/prov-o/ (accessed on 29 October 2024).
- Text REtrieval Conference (TREC) by NIST. Available online: https://trec.nist.gov/ (accessed on 29 October 2024).
- Sanderson, M. Test collection based evaluation of information retrieval systems. Found. Trends Inf. Retr. 2010, 4, 247–375. [Google Scholar] [CrossRef]
- Buckley, C.; Dimmick, D.; Soboroff, I.; Voorhees, E. Bias and the limits of pooling for large collections. Inf. Retr. 2007, 10, 491–508. [Google Scholar] [CrossRef]
- Wang, Y.; Wang, L.; Li, Y.; He, D.; Chen, W.; Liu, T.Y. A theoretical analysis of NDCG ranking measures. J. Mach. Learn. Res. 2013, 30, 25–54. [Google Scholar]
- CCTC. Available online: https://github.com/simeonwetzel/CCTC (accessed on 29 October 2024).
Corpus Compositions | Contents | Number of Text Passages | Average Number of Words per Passage | Number of Words |
---|---|---|---|---|
(1a) | Dataset descriptions | 10,573 | 92 | 975,659 |
(1b) | Dataset descriptions + ontology concepts | 10,760 | 96 | 1,038,174 |
(2) | The scientific literature | 22,137 | 255 | 5,645,063 |
(3) = (1b) + (2) | The scientific literature + dataset descriptions + ontology concepts | 33,314 | 202 | 6,715,384 |
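The “Average Number of Words per Passage” column is the word count divided by the passage count, rounded to the nearest whole word, and the difference in passages between (1b) and (1a) equals the 187 GEMET concept definitions described in Section 3.1.1. A quick check using the values copied from the table:

```python
# Passage and word counts copied from the corpus composition table.
CORPORA = {
    "1a": (10_573, 975_659),
    "1b": (10_760, 1_038_174),
    "2": (22_137, 5_645_063),
    "3": (33_314, 6_715_384),
}


def average_words_per_passage(passages: int, words: int) -> int:
    """Words divided by passages, rounded to the nearest whole word."""
    return round(words / passages)


averages = {name: average_words_per_passage(p, w) for name, (p, w) in CORPORA.items()}
# Corpus (1b) = (1a) + ontology concepts, so the passage difference is the concept count.
ontology_passages = CORPORA["1b"][0] - CORPORA["1a"][0]

print(averages)  # {'1a': 92, '1b': 96, '2': 255, '3': 202}
print(ontology_passages)  # 187
```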
Relevance | Retrieved | Not Retrieved |
---|---|---|
Relevant | True Positive (TP) | False Negative (FN) |
Irrelevant | False Positive (FP) | True Negative (TN) |
Rank k | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
Relevance of the item at rank k | relevant | irrelevant | irrelevant | relevant | relevant |
rel(k) | 1 | 0 | 0 | 1 | 1 |
Precision@k | 1/1 | 1/2 | 1/3 | 2/4 | 3/5 |
Precision@k ∗ rel(k) | 1 | 0 | 0 | 1/2 | 3/5 |
Recall@k | 1/3 | 1/3 | 1/3 | 2/3 | 3/3 |
AP@k | 1 | 1/2 | 1/3 | 0.375 | 0.42 |
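The worked ranking example and the confusion matrix can be reproduced with a short Python sketch; exact fractions keep the values comparable with the table. The total of three relevant items is inferred from Recall@5 = 3/3. The last row of the example corresponds to dividing the running sum of Precision@i × rel(i) by k; the widely used definition of average precision divides by the number of relevant items instead, so this sketch follows the example's numbers rather than claiming a general definition.

```python
from fractions import Fraction
from typing import Sequence

# Relevance of the items at ranks 1..5 in the worked example (1 = relevant).
RELS = [1, 0, 0, 1, 1]
TOTAL_RELEVANT = 3  # inferred from Recall@5 = 3/3


def precision_from_counts(tp: int, fp: int) -> Fraction:
    """Precision = TP / (TP + FP), from the confusion matrix."""
    return Fraction(tp, tp + fp)


def recall_from_counts(tp: int, fn: int) -> Fraction:
    """Recall = TP / (TP + FN), from the confusion matrix."""
    return Fraction(tp, tp + fn)


def precision_at_k(rels: Sequence[int], k: int) -> Fraction:
    """Share of the top k results that are relevant."""
    return Fraction(sum(rels[:k]), k)


def recall_at_k(rels: Sequence[int], k: int, total_relevant: int) -> Fraction:
    """Share of all relevant items found in the top k results."""
    return Fraction(sum(rels[:k]), total_relevant)


def ap_at_k(rels: Sequence[int], k: int) -> Fraction:
    """(1/k) * sum of Precision@i * rel(i) for i = 1..k, as in the worked example."""
    total = sum((precision_at_k(rels, i) * rels[i - 1] for i in range(1, k + 1)), Fraction(0))
    return total / k


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, precision_at_k(RELS, k), recall_at_k(RELS, k, TOTAL_RELEVANT), float(ap_at_k(RELS, k)))
    # The top 5 viewed as a confusion matrix: TP = 3, FP = 2, FN = 0.
    print(precision_from_counts(3, 2), recall_from_counts(3, 0))
```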
Retrieval Model | Q1 | Q2 | Q3 | Q4 | Mean |
---|---|---|---|---|---|
Relevant Items | 1384 | 975 | 285 | 139 | |
BM25 | 0.8726 | 0.8432 | 0.5760 | 0.9358 | 0.8069 |
DistilBERT-base | 0.4216 | 0.1732 | 0.2237 | 0.2440 | 0.2656 |
corpus-1a | 1 | 0.9670 | 0.9491 | 0.9180 | 0.9585 |
corpus-1b | 0.9986 | 0.9603 | 0.9894 | 0.7973 | 0.9364 |
corpus-2 | 1 | 0.9677 | 0.8066 | 0.9707 | 0.9362 |
corpus-3 | 1 | 0.9680 | 0.7177 | 0.8917 | 0.8943 |
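The rightmost column of the results table matches the arithmetic mean of the Q1 to Q4 scores; the table itself does not name the metric. A quick check with the values copied from the table shows that the deviations stay within rounding to four decimal places:

```python
# Per-query scores (Q1-Q4) and the rightmost column, copied from the results table.
RESULTS = {
    "BM25": ([0.8726, 0.8432, 0.5760, 0.9358], 0.8069),
    "DistilBERT-base": ([0.4216, 0.1732, 0.2237, 0.2440], 0.2656),
    "corpus-1a": ([1.0, 0.9670, 0.9491, 0.9180], 0.9585),
    "corpus-1b": ([0.9986, 0.9603, 0.9894, 0.7973], 0.9364),
    "corpus-2": ([1.0, 0.9677, 0.8066, 0.9707], 0.9362),
    "corpus-3": ([1.0, 0.9680, 0.7177, 0.8917], 0.8943),
}


def max_mean_deviation(results) -> float:
    """Largest gap between the mean of the per-query scores and the reported value."""
    return max(abs(sum(scores) / len(scores) - reported) for scores, reported in results.values())


print(f"largest deviation: {max_mean_deviation(RESULTS):.5f}")
```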
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wetzel, S.; Mäs, S. Context-Aware Search for Environmental Data Using Dense Retrieval. ISPRS Int. J. Geo-Inf. 2024, 13, 380. https://doi.org/10.3390/ijgi13110380