NHANES Practicum

Details and write-up about this project are available here.

Files:

Files.xlsx
- Contains an index matching the NHANES files across survey years (sorted alphabetically), marks the features used for analysis, and maps feature names across years for consistency.
run_notebooks.py
- Script to batch-run Jupyter Notebooks
- The executed notebook output for each folder is saved in that folder's /output subfolder
/NHANES-Downloader
- Scripts to download NHANES files and convert them to CSV files
/Data Cleaning
- Notebooks that clean, rename, and recategorize NHANES data files, remove missing values, and upload the results to a local NoSQL database.
/Data Upload
- Notebooks that merge appropriate files for analysis, recategorize labels, one-hot encode features, and upload data to a local NoSQL database.
/Data Analysis
- Notebooks for exploratory data analysis, fitting random forest and XGBoost models to the data, and evaluating model performance. These models are used to identify risk factors for hospital utilization and major diseases.
/Prevalence
- Notebooks to generate data (.csv) files for prevalence plots
- R Notebooks to generate prevalence plots
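
As a rough illustration of the kind of transformation the Data Cleaning and Data Upload notebooks perform, the sketch below renames raw NHANES variables to consistent names, treats "Refused" (7) and "Don't know" (9) answers as missing, drops incomplete records, and one-hot encodes a categorical feature. It is a minimal sketch under stated assumptions, not the project's code: the rename map, the choice of coded columns and missing codes, and all function names are hypothetical, and the project's real mappings are recorded in Files.xlsx.

```python
# Hypothetical mapping from raw NHANES variable names to consistent names.
# The project's real mapping is recorded in Files.xlsx.
RENAME_MAP = {"RIAGENDR": "gender", "SMQ020": "smoked_100_cigs"}

# Assumption for illustration: in these columns, 7 = "Refused" and
# 9 = "Don't know" are treated as missing. Check each variable's codebook.
CODED_COLUMNS = {"smoked_100_cigs"}
MISSING_CODES = {7, 9}

# Hypothetical categorical column to one-hot encode, with its code -> label map.
CATEGORIES = {"gender": {1: "male", 2: "female"}}


def clean_record(record):
    """Rename columns and turn refused/don't-know codes into None."""
    renamed = {RENAME_MAP.get(key, key): value for key, value in record.items()}
    for column in CODED_COLUMNS:
        if column in renamed and renamed[column] in MISSING_CODES:
            renamed[column] = None
    return renamed


def one_hot(record):
    """Replace each categorical column with 0/1 indicator columns."""
    encoded = dict(record)
    for column, levels in CATEGORIES.items():
        value = encoded.pop(column)
        for code, label in levels.items():
            encoded[f"{column}_{label}"] = int(value == code)
    return encoded


def prepare(records):
    """Clean every record, drop incomplete ones, then one-hot encode."""
    cleaned = [clean_record(r) for r in records]
    complete = [r for r in cleaned if all(v is not None for v in r.values())]
    return [one_hot(r) for r in complete]


if __name__ == "__main__":
    rows = [
        {"SEQN": 1, "RIAGENDR": 2, "SMQ020": 1},
        {"SEQN": 2, "RIAGENDR": 1, "SMQ020": 9},  # "Don't know" -> dropped
    ]
    print(prepare(rows))
    # [{'SEQN': 1, 'smoked_100_cigs': 1, 'gender_male': 0, 'gender_female': 1}]
```

Plain dictionaries keep the example dependency-free; the same steps map directly onto DataFrame operations.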

Download files from NHANES using NHANES-Downloader scripts
- Navigate to the NHANES-Downloader folder and follow the README.
- In a terminal, go to the NHANES-Downloader folder and run (macOS):

$ ./get_data.py  
$ ./raw_to_csv.py
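
NHANES distributes its data files in SAS transport (.XPT) format. Assuming that is what the downloader fetches, the conversion step can be sketched with pandas' `read_sas(..., format="xport")`. This is not necessarily how raw_to_csv.py is implemented; the function names and the `raw`/`csv` folder names are hypothetical.

```python
from pathlib import Path


def csv_path_for(xpt_path, csv_dir):
    """Map e.g. raw/DEMO_J.XPT to csv/DEMO_J.csv."""
    return Path(csv_dir) / (Path(xpt_path).stem + ".csv")


def xpt_to_csv(xpt_path, csv_dir):
    """Convert one SAS transport (.XPT) file to CSV and return the CSV path."""
    # Imported here so csv_path_for() can be used without pandas installed.
    import pandas as pd

    out_path = csv_path_for(xpt_path, csv_dir)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    frame = pd.read_sas(str(xpt_path), format="xport")
    frame.to_csv(out_path, index=False)
    return out_path


def convert_folder(raw_dir, csv_dir):
    """Convert every .XPT file (case-insensitive suffix) in raw_dir."""
    return [
        xpt_to_csv(path, csv_dir)
        for path in sorted(Path(raw_dir).iterdir())
        if path.suffix.lower() == ".xpt"
    ]


if __name__ == "__main__":
    for written in convert_folder("raw", "csv"):
        print(f"Wrote {written}")
```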

Run Jupyter Notebooks to clean and upload data:
- Navigate to the root folder and in terminal run:

$ python run_notebooks.py ./Data\ Cleaning/*.ipynb ./Data\ Upload/*.ipynb
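
A batch runner like run_notebooks.py can be approximated with nbconvert's command-line interface. The sketch below is an assumption, not the project's actual script: it executes each notebook passed on the command line, in the order given, and writes the executed copy to an `output` subfolder next to it, matching the folder layout described above. The shell expands the `*.ipynb` globs, so the script only receives file paths. The function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_command(notebook):
    """Return an nbconvert command that executes `notebook` and saves the
    executed copy in an `output` folder beside it."""
    notebook = Path(notebook)
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--output-dir", str(notebook.parent / "output"),
        str(notebook),
    ]


def run_notebooks(paths):
    """Execute notebooks in the given order; return the paths that failed."""
    failed = []
    for path in paths:
        (Path(path).parent / "output").mkdir(exist_ok=True)
        print(f"Running {path}", flush=True)
        result = subprocess.run(build_command(path))
        if result.returncode != 0:
            failed.append(path)
    return failed


if __name__ == "__main__":
    failures = run_notebooks(sys.argv[1:])
    for path in failures:
        print(f"FAILED: {path}", file=sys.stderr)
    sys.exit(1 if failures else 0)
```

Failures are collected rather than aborting immediately, so one broken notebook does not hide problems in the rest; since later steps depend on earlier uploads, stopping at the first failure would be an equally reasonable choice.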

Run Jupyter Notebooks to analyze data (this may take about 15 minutes):
- Navigate to the root folder and in terminal run:

$ python run_notebooks.py ./Data\ Analysis/*.ipynb

Run Jupyter Notebooks to generate data for prevalence plots:
- Navigate to the root folder and in terminal run:

$ python run_notebooks.py ./Prevalence/*.ipynb
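
The prevalence notebooks' exact method is not described here. If the estimates use NHANES sample weights (an assumption), the weighted prevalence for one survey cycle is the sum of the weights of participants with the condition divided by the sum of all weights, i.e. sum(w_i * y_i) / sum(w_i) with y_i in {0, 1}. The sketch below computes this per cycle and renders the result as CSV text for a plotting script. The column names `cycle` and `diabetes` are hypothetical, and `WTMEC2YR` (the two-year MEC exam weight) is used only as an example weight column. This yields point estimates only; standard errors would also need the survey design variables.

```python
import csv
import io


def weighted_prevalence(rows, condition, weight="WTMEC2YR"):
    """Weighted share of rows whose `condition` flag equals 1.

    Rows missing the flag or the weight are skipped.
    """
    total = 0.0
    affected = 0.0
    for row in rows:
        flag, w = row.get(condition), row.get(weight)
        if flag is None or w is None:
            continue
        total += w
        if flag == 1:
            affected += w
    if total == 0:
        raise ValueError("no rows with both a flag and a weight")
    return affected / total


def prevalence_by_cycle(rows, condition, cycle="cycle", weight="WTMEC2YR"):
    """Weighted prevalence of `condition` for each survey cycle."""
    groups = {}
    for row in rows:
        groups.setdefault(row[cycle], []).append(row)
    return {
        key: weighted_prevalence(groups[key], condition, weight)
        for key in sorted(groups)
    }


def to_csv_text(prevalences):
    """Render {cycle: prevalence} as CSV text for a plotting script."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["cycle", "prevalence"])
    for key, value in prevalences.items():
        writer.writerow([key, value])
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [
        {"cycle": "2015-2016", "diabetes": 1, "WTMEC2YR": 3000.0},
        {"cycle": "2015-2016", "diabetes": 0, "WTMEC2YR": 1000.0},
        {"cycle": "2017-2018", "diabetes": 1, "WTMEC2YR": 500.0},
        {"cycle": "2017-2018", "diabetes": 0, "WTMEC2YR": 1500.0},
    ]
    print(prevalence_by_cycle(rows, "diabetes"))
    # {'2015-2016': 0.75, '2017-2018': 0.25}
```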
