the_kitchen_ETL

ETL Project - Bringing people together one recipe at a time

Project Team / Organization

Name	Title
Kevin Lam	Programmer Analyst
Rachel Chan	Programmer Analyst
Ramyata Upmaka	Programmer Analyst
Ozzie Moreno	Programmer Analyst

Project Overview

A description of what each team member will be responsible for
1. GitHub setup - Ozzie
2. Extract Data - All (Primary contributors: Webscraping:Rachel, Edamam API: Ramyata, Kevin, Walmart API: Ozzie)
3. Pandas / Data transforming - All
4. Setting up database - All
5. Read Me - All

Relevant Code in Submission Folder.

Extraction

Edamam API | Beautiful Soup | Pandas |

Methods

Webscraped Healthline's '20 Delicious High Protein foods to Eat'. Take a look!

Webscraping Process:

Created for loop to scrape the 20 high protein foods. Split into two lists due to numbering removal.
Removed numbering from the two lists.
Concatenated the list to form a combined list using ingredients = newlst+newlst2
Removed unnecessary words.

Edamam API:

Using the edamam api, pulled the top 50 recipe lists for each protein.

API Process:

We used the ingredient list to query 50 recipes for each list element. We used a for loop to pull the recipes and saved them in lists.
- For each recipe, we returned the label, yield, url, cook time, and calorie information.
Then, we saved the recipe data into a pandas dataframe.

The Recipes

Looking for the following:

Recipe Name
Cooking Time
Serving Yield
Calories
Main Ingredients
Recipe Url

Store Locator

To find the stores, we have two separate steps. The first is finding the stores within New Jersey, and the second is to find whether the stores have the items we need.

Through an api we searched for Walmart stores within New Jersey.

We first uploaded a csv that contains all of the zipcodes within the United States.
With all of the zipcodes uploaded, we only look for the zipcodes that are in New Jersey.
We then found the stores within a 1 mile radius of each zipcode.
We pushed this information into a dataframe which will be cleaned.

To find whether the store has the main ingredient, we will Walmart's query feature to find if the item is available.

Using the clean dataframe, we use the store id and the main ingredients to create the url. This search will show the Walmart website and let the user know if the store has the item.
With the url complete a new dataframe is created to have Store Name, Store ID, Store Address, Store Zipcode(Zip), Main Ingredient, and Store URL.

Transform

Pandas |

Methods

Cleaning our data:

Recipes Dataframe:

Removed words "recipes" or "recipe" from entire dataframe:
- Sample code:
food_df["Recipe Name"].replace({' recipes':''},regex=True)
Split the recipe dataframe by main ingredient using the loc property to catch spelling/formatting errors more easily:
- Sample code:
greek_yo_df=food_df.loc[food_df["Main Ingredient"]=="Greek yogurt"]
Corrected spelling:
- Sample code:
food_df["Recipe Name"].replace("Quinoa Tabouleh", "Quinoa Tabbouleh", inplace=True)
- Sample code: food_df["Recipe Name"]=food_df["Recipe Name"].replace({'chilli':'chili'},regex=True)
Removed unnecessary words:
- Sample code:
food_df["Recipe Name"].replace("Diane's Sugared Peanuts", "Sugared Peanuts", inplace=True)

Store Locator:

Store Locator and Main Ingredient Data Process:

Reformatted the address column in the original dataframe by extracting the location information from a dictionary.
Narrowed the dataframe down by state (NJ), although we included stores within a one mile radius. So, some in other states were included.
Removed any duplicated "store ids".
Determined the length of the dataframe with unique store ids.
Created a new column based off of the ingredient list and added the column to the store id dataframe.
- To do this, we looped through each ingredient for every unique store id entry.
Created a new column based off of both the store id and ingredient list in order to populate the column with a unique url list. Added this column to the store id dataframe.
Exported this store locator dataframe to a csv.

Load

Methods

Created an erd for the tables.

Created the tables within postgresql.
Updated the dataframes to ensure that they can be properly uploaded to the sql server.

Query time

Below are some sample queries that we ran through our database:

Location Query

Query for recipes that have main ingredients in stock at a store near you. Location: Bayonne, NJ.

Main Ingredient Query

Query finding recipe name and url where teh main ingredient is almonds.

Calories Query

Query finding recipe name and url for a range of calories: 200-500.

Cook Time Query

Query finding recipes for cook times under 15 minutes.

Location and Main Ingredient Query

Query finding recipes for store location (Bayonne, NJ) and main ingredient (greek yogurt).

Name		Name	Last commit message	Last commit date
Latest commit History 155 Commits
.ipynb_checkpoints		.ipynb_checkpoints
Images		Images
Submission		Submission
__pycache__		__pycache__
datasets		datasets
scripts		scripts
.DS_Store		.DS_Store
ExampleQueries.sql		ExampleQueries.sql
Load.ipynb		Load.ipynb
README.md		README.md
config.py		config.py
gitignore.txt		gitignore.txt
healthline_scraping.ipynb		healthline_scraping.ipynb
healthline_scraping_new.ipynb		healthline_scraping_new.ipynb
hf2_recipes.json		hf2_recipes.json
json_pandas_test.ipynb		json_pandas_test.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

the_kitchen_ETL

Project Team / Organization

Project Overview

Extraction

Methods

The Recipes

Store Locator

Transform

Methods

Load

Methods

Query time

Location Query

Main Ingredient Query

Calories Query

Cook Time Query

Location and Main Ingredient Query

About

Uh oh!

Releases

Packages

Contributors 4

Uh oh!

Languages

sir-omoreno/the_kitchen_ETL

Folders and files

Latest commit

History

Repository files navigation

the_kitchen_ETL

Project Team / Organization

Project Overview

Extraction

Methods

The Recipes

Store Locator

Transform

Methods

Load

Methods

Query time

Location Query

Main Ingredient Query

Calories Query

Cook Time Query

Location and Main Ingredient Query

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Uh oh!

Languages

Packages