whuhsy/GEE-FuB

Geo-FuB: A Method for Constructing an Operator-Function Knowledge Base for Geospatial Code Generation Tasks Using Large Language Models

Overview
Welcome to the Geo-FuB repository! This project introduces Geo-FuB, a framework for building an operator-function knowledge base that enhances geospatial code generation with Large Language Models (LLMs). Geo-FuB addresses the coding hallucinations LLMs produce on geospatial tasks by grounding code generation in a domain-specific knowledge base.

Core Components
Geo-FuB consists of three key components, each contributing to the construction of the operator-function knowledge base:

1. Function Semantic Framework Construction (Geo-FuSE): uses Chain-of-Thought (CoT) prompting, TF-IDF, t-SNE, and a Gaussian Mixture Model (GMM) to discover and organize semantic features from geospatial scripts.

2. Frequent Operator Combination Statistics (Geo-FuST): uses Abstract Syntax Trees (ASTs) and the Apriori algorithm to identify and analyze frequent operator combinations in code structures.

3. Combination and Semantic Framework Mapping (Geo-FuM): combines LLMs with fuzzy matching algorithms to align operator combinations with the function semantic framework, producing the Geo-FuB knowledge base.
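The fuzzy-matching half of Geo-FuM can be sketched with Python's standard-library `difflib`. This is an illustration under stated assumptions, not the authors' implementation: the framework categories, the LLM-proposed labels, and the 0.6 similarity cutoff are all hypothetical, and `difflib.get_close_matches` stands in for whatever fuzzy matcher Geo-FuM actually uses. The idea shown is that an LLM first proposes a free-text function label for an operator combination, and fuzzy matching then snaps that label onto the closest category of the function semantic framework, or rejects it when nothing is close enough.

```python
import difflib

# Hypothetical categories from a function semantic framework (illustrative only).
FRAMEWORK_CATEGORIES = [
    "image cloud masking",
    "vegetation index calculation",
    "image collection filtering",
    "zonal statistics",
    "time series compositing",
]


def map_to_framework(llm_label, categories=FRAMEWORK_CATEGORIES, cutoff=0.6):
    """Return the category closest to an LLM-proposed label, or None if none is close.

    get_close_matches ranks candidates by difflib.SequenceMatcher similarity ratio
    and discards any candidate scoring below `cutoff`.
    """
    normalized = llm_label.strip().lower()
    matches = difflib.get_close_matches(normalized, categories, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical operator combinations paired with labels an LLM might propose for them.
proposals = [
    (("ee.ImageCollection.filterDate", "ee.ImageCollection.filterBounds"), "Image collection filter"),
    (("ee.Image.normalizedDifference",), "vegetation indices calculation"),
    (("ee.Image.reduceRegions",), "weather forecasting"),
]

for operators, label in proposals:
    print(operators, "->", map_to_framework(label))
```

The third proposal maps to None because no category reaches the cutoff; keeping such cases out of the knowledge base (or sending them back for re-labeling) avoids forcing every combination into some category.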

Repository Contents
This repository includes the following components of the Geo-FuB framework:

- Geo-FuSE: Function Semantic Framework Construction
- Geo-FuST: Frequent Operator Combination Statistics
- Geo-FuM: Combination and Semantic Framework Mapping

Each directory contains code, documentation, and examples relevant to the respective component.
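The TF-IDF step used by Geo-FuSE can be illustrated in plain Python. In this minimal sketch the three script descriptions, the regex tokenizer, and the stopword list are hypothetical, and the IDF uses the smoothed form idf = ln((1 + N) / (1 + df)) + 1 (the same form scikit-learn applies by default). It only shows how TF-IDF surfaces the terms that distinguish one script from the others; the Chain-of-Thought, t-SNE, and Gaussian Mixture Model stages of Geo-FuSE are not shown.

```python
import math
import re
from collections import Counter

# Hypothetical natural-language descriptions of three geospatial scripts.
DOCS = [
    "mask clouds in sentinel imagery and compute ndvi",
    "filter landsat imagery by date and compute ndvi composite",
    "compute zonal statistics of precipitation for each county",
]

# Hypothetical minimal stopword list.
STOPWORDS = {"a", "and", "by", "each", "for", "in", "of", "the"}


def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, and drop stopwords."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t and t not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = ln((1 + N) / (1 + df)) + 1, smoothed so no weight is zero
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weights


def top_terms(weights, k=3):
    """Return the k highest-weighted terms, breaking ties alphabetically."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


for doc, w in zip(DOCS, tf_idf(DOCS)):
    print(top_terms(w), "<-", doc)
```

A term shared by every description, such as "compute", gets the minimum IDF of 1, so it ranks below the terms specific to each script.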

Dataset
The Geo-FuB knowledge base has been constructed using 154,075 Google Earth Engine scripts, which provide a rich source of geospatial functions and operator relationships. The dataset is available within this repository for further experimentation and development.
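To show how operator combinations might be mined from such scripts, the sketch below follows the two steps named for Geo-FuST: parse each script into an abstract syntax tree, collect the operators it calls, then run Apriori over the resulting operator sets. Assumptions to note: Earth Engine Code Editor scripts are written in JavaScript, but to stay within Python's standard library the four toy scripts here use Python syntax and are parsed with the `ast` module (nothing is executed, so no Earth Engine account is needed). The naming rule (full path for `ee.*` calls, bare method name for chained calls) and the minimum support count of 2 are hypothetical choices, not Geo-FuST's actual settings.

```python
import ast
from collections import Counter
from itertools import combinations

# Hypothetical Earth Engine snippets in Python syntax; they are parsed, never run.
SCRIPTS = [
    "col = ee.ImageCollection('COPERNICUS/S2').filterDate('2020-01-01', '2020-12-31').filterBounds(roi)\n"
    "img = col.median().clip(roi)",
    "img = ee.ImageCollection('LANDSAT/LC08/C02/T1_L2').filterDate('2021-01-01', '2021-06-30').median()\n"
    "ndvi = img.normalizedDifference(['SR_B5', 'SR_B4'])",
    "col = ee.ImageCollection('MODIS/061/MOD13Q1').filterBounds(roi).filterDate('2019-01-01', '2019-12-31')\n"
    "mean = col.mean().clip(roi)",
    "stats = ee.Image('USGS/SRTMGL1_003').reduceRegions(collection=fc, reducer=ee.Reducer.mean(), scale=30)",
]


def operator_name(func):
    """Full dotted path for ee.* calls, bare method name for other attribute calls."""
    if not isinstance(func, ast.Attribute):
        return None  # plain function calls such as print(...) are ignored
    parts, node = [], func
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name) and node.id == "ee":
        return "ee." + ".".join(reversed(parts))
    return func.attr


def extract_operators(source):
    """Parse one script and return the set of operator names it calls."""
    ops = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = operator_name(node.func)
            if name:
                ops.add(name)
    return ops


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets, then prune any candidate with an infrequent subset.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(frozenset(s) in current for s in combinations(union, k - 1)):
                candidates.add(union)
        # Count support for surviving candidates.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        frequent.update(current)
        k += 1
    return frequent


frequent = apriori([extract_operators(s) for s in SCRIPTS], min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (-len(kv[0]), -kv[1], sorted(kv[0]))):
    if len(itemset) > 1:
        print(count, sorted(itemset))
```

On this toy corpus the largest frequent combination is {ee.ImageCollection, filterDate, filterBounds, clip}, a filter-then-clip pattern shared by two of the four scripts.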

Evaluation
The Geo-FuB framework was evaluated for both structural integrity and semantic accuracy:

- Overall accuracy: 88.89%
- Structural accuracy: 92.03%
- Semantic accuracy: 86.79%

These results indicate that Geo-FuB is robust and reliable for improving geospatial code generation tasks.

Applications
Geo-FuB is particularly useful for:

- Enhancing geospatial code generation with LLMs, especially by reducing coding hallucinations.
- Supporting research workflows as an empirical resource for further fine-tuning and for applying the Retrieval-Augmented Generation (RAG) paradigm.
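For the RAG application, a minimal sketch of the retrieval side might look like the following. The `KBEntry` record layout, the three entries, the Jaccard token-overlap scoring, and the prompt wording are all hypothetical; the sketch does not assume Geo-FuB's actual knowledge-base schema, a production system would more likely use embedding-based retrieval, and no LLM is called. It only shows retrieved operator combinations being placed in the prompt as grounding context.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class KBEntry:
    """Hypothetical knowledge-base record: a function label and its operator combination."""
    function: str
    operators: tuple


# Hypothetical entries for illustration only.
KB = [
    KBEntry("cloud masking for Sentinel-2 imagery", ("ee.ImageCollection", "filterDate", "map", "updateMask")),
    KBEntry("NDVI calculation from Landsat imagery", ("ee.ImageCollection", "median", "normalizedDifference")),
    KBEntry("zonal statistics over administrative regions", ("ee.Image", "reduceRegions", "ee.Reducer.mean")),
]


def tokens(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query, kb=KB, k=2):
    """Rank entries by Jaccard overlap between query tokens and function-label tokens."""
    q = tokens(query)
    return sorted(kb, key=lambda e: jaccard(q, tokens(e.function)), reverse=True)[:k]


def build_prompt(query, kb=KB, k=2):
    """Prepend the retrieved operator combinations to the task as grounding context."""
    context = "\n".join(f"- {e.function}: {', '.join(e.operators)}" for e in retrieve(query, kb, k))
    return (
        "Prefer the Earth Engine operators listed below.\n"
        f"{context}\n\nTask: {query}"
    )


print(build_prompt("Compute NDVI from Landsat imagery for 2021"))
```

Constraining the LLM to operators that actually occur in a curated knowledge base is what targets hallucinated or misused calls; the retrieval method itself can be swapped without changing that idea.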

How to Use
1. Clone the repository:
   git clone https://github.com/whuhsy/Geo-FuB.git
2. Use the provided scripts and datasets to build and test your own operator-function knowledge base.
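After cloning, a quick way to inspect the shipped `Geo-FuST.csv` is Python's built-in `csv` module. The column layout is not documented in this README, so this sketch assumes nothing beyond a header row; the UTF-8 encoding (with an optional byte-order mark) is also an assumption, and `preview_csv` is a hypothetical helper name.

```python
import csv
from itertools import islice
from pathlib import Path


def preview_csv(path, n_rows=5, encoding="utf-8-sig"):
    """Return the header row and up to n_rows data rows of a CSV file.

    No column names are assumed. The "utf-8-sig" codec reads plain UTF-8
    and also strips a leading byte-order mark if one is present.
    """
    with Path(path).open(newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        header = next(reader, [])
        rows = list(islice(reader, n_rows))
    return header, rows
```

Run from the repository root, `header, rows = preview_csv("Geo-FuST.csv")` shows the available columns before any further processing. The `.xlsx` files would need a spreadsheet reader such as `openpyxl` or pandas instead.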

Citation
If you use Geo-FuB in your research, please cite our paper:

@article{your_paper,
  title={Geo-FuB: A Method for Constructing an Operator-Function Knowledge Base for Geospatial Code Generation Tasks Using Large Language Models},
  author={Shuyang Hou and Anqi Zhao and Jianyuan Liang and Zhangxiao Shen and Huayi Wu},
  journal={Your Journal Name},
  year={2024},
  volume={},
  pages={}
}

Contact
For questions or collaborations, please contact the corresponding author:

Huayi Wu
Email: wuhuayi@whu.edu.cn

Folders and files
- Geo-FuM.xlsx
- Geo-FuSE.xlsx
- Geo-FuST.csv
- README.md
- relation_with_function.csv
- tf-idf_output
