8000 GitHub - whuhsy/GEE-FuB: A Method for Constructing an Operator-Function Knowledge Base for Geospatial Code Generation Tasks Using Large Language Models
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content
/ GEE-FuB Public

A Method for Constructing an Operator-Function Knowledge Base for Geospatial Code Generation Tasks Using Large Language Models

Notifications You must be signed in to change notification settings

whuhsy/GEE-FuB

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Geo-FuB: A Method for Constructing an Operator-Function Knowledge Base for Geospatial Code Generation Tasks Using Large Language Models

Overview Welcome to the Geo-FuB repository! This project introduces Geo-FuB, a comprehensive framework designed to build an operator-function knowledge base to enhance geospatial code generation using Large Language Models (LLMs). The goal of Geo-FuB is to tackle the challenge of coding hallucinations in LLMs when applied to geospatial tasks by leveraging a domain-specific knowledge base.

Core Components Geo-FuB consists of three key components, each contributing to the construction of the operator-function knowledge base:

Function Semantic Framework Construction (Geo-FuSE):

Utilizes techniques like Chain-of-Thought (CoT), TF-IDF, t-SNE, and Gaussian Mixture Model. Discovers and organizes semantic features from geospatial scripts. Frequent Operator Combination Statistics (Geo-FuST):

Employs Abstract Syntax Trees (ASTs) and the APRIORI algorithm. Identifies and analyzes frequent operator combinations in code structures. Combination and Semantic Framework Mapping (Geo-FuM):

Combines LLMs with fuzzy matching algorithms. Aligns operator combinations with the function semantic framework to construct the Geo-FuB knowledge base.

Repository Contents This repository includes the following components of the Geo-FuB framework:

Geo-FuSE: Function Semantic Framework Construction Geo-FuST: Frequent Operator Combination Statistics Geo-FuM: Combination and Semantic Framework Mapping Each directory contains code, documentation, and examples relevant to the respective component.

Dataset The Geo-FuB knowledge base has been constructed using 154,075 Google Earth Engine scripts, which provide a rich source of geospatial functions and operator relationships. The dataset is available within this repository for further experimentation and development.

Evaluation The Geo-FuB framework has undergone rigorous evaluation to ensure both structural integrity and semantic accuracy:

Overall Accuracy: 88.89% Structural Accuracy: 92.03% Semantic Accuracy: 86.79% These metrics underscore the robustness and reliability of Geo-FuB in improving geospatial code generation tasks.

Applications Geo-FuB is particularly useful for:

Enhancing geospatial code generation with LLMs, particularly in avoiding coding hallucinations. Optimizing research workflows by providing an empirical resource for further fine-tuning and applying the Retrieval-Augmented Generation (RAG) paradigm.

How to Use Clone the repository: git clone https://github.com/whuhsy/Geo-FuB.git

Use the provided scripts and datasets to build and test your own operator-function knowledge base.

Citation If you use Geo-FuB in your research, please cite our paper: @article{your_paper, title={Geo-FuB: A Method for Constructing an Operator-Function Knowledge Base for Geospatial Code Generation Tasks Using Large Language Models}, author={Shuyang Hou, Anqi Zhao, Jianyuan Liang, Zhangxiao Shen, Huayi Wu}, journal={Your Journal Name}, year={2024}, volume={}, pages={} }

Contact For questions or collaborations, please contact the corresponding author:

Huayi Wu Email: wuhuayi@whu.edu.cn

About

A Method for Constructing an Operator-Function Knowledge Base for Geospatial Code Generation Tasks Using Large Language Models

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published
0