Retrieval Augmented Generation Text-to-SQL Application To Analyze US Government Contract Data

This Python application uses Retrieval Augmented Generation (RAG) to let you ask questions of the data in plain English. It uses OpenAI's GPT-4o mini model to convert the question (the prompt) into SQL, runs that SQL against the DuckDB database that stores the data, and returns the result along with the SQL statement that produced it.
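The question-to-SQL-to-result flow described above can be sketched as follows. This is a minimal illustration, not the application's actual code: `describe_schema`, `build_prompt`, `answer_question`, and `fake_llm` are hypothetical names, the `contracts` table and its rows are made up, the model call is replaced by a stub that returns a canned query, and the standard-library `sqlite3` module stands in for DuckDB so the sketch runs without extra dependencies.

```python
import sqlite3


def describe_schema(conn, table):
    """Summarise the table's columns so the model can be grounded on them."""
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    col_text = ", ".join(f"{name} {ctype}" for _, name, ctype, *_ in cols)
    return f"TABLE {table} ({col_text})"


def build_prompt(question, schema):
    """Combine the retrieved schema with the user's plain-English question."""
    return (
        "Translate the question into one SQL SELECT statement.\n"
        f"Schema: {schema}\n"
        f"Question: {question}\nSQL:"
    )


def answer_question(conn, table, question, llm):
    """Return (sql, rows): the generated SQL and the rows it produces."""
    prompt = build_prompt(question, describe_schema(conn, table))
    sql = llm(prompt).strip().rstrip(";")
    # Assumption: only read-only queries should ever reach the database.
    if not sql.lower().startswith("select"):
        raise ValueError(f"refusing to run non-SELECT statement: {sql!r}")
    rows = conn.execute(sql).fetchall()
    return sql, rows


def fake_llm(prompt):
    """Stub standing in for the real model call; always returns one query."""
    return "SELECT agency, SUM(amount) FROM contracts GROUP BY agency ORDER BY agency;"


# Made-up demo data (hypothetical table and values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contracts (agency TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?)",
    [("DOD", 100.0), ("NASA", 40.0), ("DOD", 60.0)],
)

sql, rows = answer_question(conn, "contracts", "Total award amount per agency?", fake_llm)
print(sql)
print(rows)
```

Returning the SQL next to the rows is what lets a reader audit each answer; in a real deployment the stub would be replaced by a call to the model.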

Because each answer is returned together with the SQL that produced it, correctness is easy to check. That makes this a practical use of Large Language Models (LLMs) in data science: business users who are unfamiliar with SQL can ask business questions in a Gradio application and get answers from the data in seconds.
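One way to test an answer is to compare the rows returned by the generated SQL with those of a hand-written reference query. The sketch below is an assumption about how such a check could look, not part of this repository: `same_result` is a hypothetical helper, the table and queries are made up, and `sqlite3` again stands in for DuckDB.

```python
import sqlite3
from collections import Counter


def same_result(conn, generated_sql, reference_sql):
    """True if both queries return the same rows, ignoring row order."""
    got = conn.execute(generated_sql).fetchall()
    expected = conn.execute(reference_sql).fetchall()
    # Counter compares the rows as multisets, so duplicates still count.
    return Counter(got) == Counter(expected)


# Made-up demo data (hypothetical table and values).
check_conn = sqlite3.connect(":memory:")
check_conn.execute("CREATE TABLE contracts (agency TEXT, amount REAL)")
check_conn.executemany(
    "INSERT INTO contracts VALUES (?, ?)",
    [("DOD", 100.0), ("NASA", 40.0), ("DOD", 60.0)],
)

reference = "SELECT SUM(CASE WHEN amount > 50 THEN 1 ELSE 0 END) FROM contracts"
good = same_result(check_conn, "SELECT COUNT(*) FROM contracts WHERE amount > 50", reference)
bad = same_result(check_conn, "SELECT COUNT(*) FROM contracts", reference)
print(good, bad)
```

A small set of question and reference-query pairs checked this way can serve as a regression suite whenever the prompt or model changes.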

Demo

https://www.loom.com/share/f292263472ae4e9cbfa813655bc7c654?sid=c3a5bf89-f80f-4d69-bae0-79beee641cbe

Customize this Application with your own Data

If you found the app useful, please make sure to give us a star!

Clone this Repository

git clone https://github.com/LNshuti/usgov-contracts-rag.git

Set up your Environment

conda env create --file=environment.yaml

Activate your Environment

conda activate gov-data

Install Dependencies

pip install -r requirements.txt

Add your data in the data folder

cd data

cp <your_data> .

Update the paths in the connect_db.py file to load your Excel/CSV data into the database

CSV_FILE_PATH = 'data/your_data.csv'
DB_FILE_PATH = 'gov-contracts.db'
TABLE_NAME = 'your_table_name'

Run the connect_db.py python helper file to load your data into the database

python connect_db.py
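The contents of connect_db.py are not shown here. As a rough sketch of what a loader of this kind does, the example below reads a CSV file and writes it to a table in a database file. Everything in it is an assumption for illustration: `load_csv` is a hypothetical function, the sample CSV is made up, every column is stored as text, and the standard-library `csv` and `sqlite3` modules stand in for DuckDB (which can usually infer column types and load a CSV in a single `CREATE TABLE ... AS SELECT` over `read_csv_auto`).

```python
import csv
import os
import sqlite3
import tempfile


def load_csv(csv_path, db_path, table_name):
    """Load a CSV file into a fresh table and return the number of rows stored."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Skip rows whose field count does not match the header.
        rows = [row for row in reader if len(row) == len(header)]

    cols = ", ".join(f'"{name}" TEXT' for name in header)
    marks = ", ".join("?" for _ in header)

    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute(f'DROP TABLE IF EXISTS "{table_name}"')
        conn.execute(f'CREATE TABLE "{table_name}" ({cols})')
        conn.executemany(f'INSERT INTO "{table_name}" VALUES ({marks})', rows)
    count = conn.execute(f'SELECT COUNT(*) FROM "{table_name}"').fetchone()[0]
    conn.close()
    return count


# Demo with a throwaway CSV file (made-up contents).
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "your_data.csv")
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        f.write("agency,amount\nDOD,100\nNASA,40\n")
    loaded = load_csv(csv_path, os.path.join(tmp, "demo.db"), "your_table_name")

print(loaded)
```

Dropping and recreating the table makes the load repeatable, so rerunning it after updating the CSV does not duplicate rows.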

Examine the data with Datasette

datasette serve gov-contracts.db

Run the app.py python file to start the Gradio Application

python app/app.py

Enhancement: XGBoost prediction of government contract awards with confidence intervals

In addition to the previous work exploring the dataset, I've developed an application that uses the same dataset to predict award amounts from the other features. The app is built in Python, has a Gradio interface, and is hosted on Hugging Face. After feature engineering, the dataset is used as input to a gradient boosted tree model. The user selects a combination of features, and the app returns the predicted award amount with a 95% confidence interval computed with the bootstrap method.
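The bootstrap interval mentioned above can be illustrated with a small percentile-bootstrap sketch. It is not the hosted app's code: to stay dependency-free it swaps the gradient boosted tree model for a one-feature least-squares line, and `fit_line`, `bootstrap_interval`, and the data points are hypothetical. The resampling loop is the same idea either way: refit on rows drawn with replacement, predict, and read the interval off the spread of the predictions.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares slope and intercept (stand-in for the real model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: every x identical, predict the mean
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def bootstrap_interval(xs, ys, x_new, n_boot=1000, alpha=0.05, seed=0):
    """Return (point prediction, lower, upper) using the percentile bootstrap."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        # Resample rows with replacement, refit, and predict at x_new.
        idx = [rng.randrange(n) for _ in range(n)]
        slope, intercept = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(slope * x_new + intercept)
    preds.sort()
    lower = preds[int((alpha / 2) * n_boot)]
    upper = preds[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    slope, intercept = fit_line(xs, ys)
    return slope * x_new + intercept, lower, upper


# Made-up data: one numeric feature and award amounts in thousands of dollars.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [12, 19, 33, 41, 48, 62, 69, 81]

point, lower, upper = bootstrap_interval(xs, ys, x_new=5)
print(f"prediction {point:.1f}, 95% interval [{lower:.1f}, {upper:.1f}]")
```

Note that an interval built this way reflects uncertainty in the fitted prediction, not the full spread of individual award amounts.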

application

References.

Harshad Suryawanshi. From Natural Language to SQL(Na2SQL): Extracting Insights from Databases using OPENAI GPT3.5 and LlamaIndex. https://github.com/AI-ANK/Na2SQL
Ravi Theja. Evaluate RAG with Llamaindex. https://cookbook.openai.com/examples/evaluation/evaluate_rag_with_llamaindex
Mostafa Ibrahim. A Gentle Introduction to Advanced RAG. https://wandb.ai/mostafaibrahim17/ml-articles/reports/A-Gentle-Introduction-to-Advanced-RAG--Vmlldzo2NjIyNTQw
Adam Obeng; J.C. Zhong; Charlie Gu. How we built Text-to-SQL at Pinterest. https://medium.com/pinterest-engineering/how-we-built-text-to-sql-at-pinterest-30bad30dabff
