MattHondrakis (Matthew Hondrakis) · GitHub
😎
Learning

MattHondrakis/README.md
  • 👋 Hi, I’m @MattHondrakis
  • 🧠 I’m interested in Probability/Statistics, Actuarial Science, and Data Science.
  • 🌱 I received the Data Scientist Associate Certification from DataCamp and the Google Data Analytics Certificate from Coursera.
  • πŸ† I 7A7F have a BS in Applied Mathematics from The City College of New York.
  • 📫 How to reach me: hondrakma@gmail.com

Datasets I found most interesting:

🧩 => Structured Analysis
💫 => Unstructured Chronological Analysis
💻 => Machine Learning Model

  1. NYC House Prices (DataAnalysis) 💫 💻
    • GAM, Random Forest and Linear Regression models predicting the prices of real estate properties in NYC. The property type (Condo, Apartment, etc.) is extracted from the home_details variable and plays a crucial role in the modeling process. The models are then compared against each other on key metrics such as R² and Root Mean Squared Error (RMSE).
  2. Job Placement (DataAnalysis) 🧩 💻
    • Validated the data by checking for and appropriately handling missing values and outliers. Explored trends and correlations between variables using visualizations and statistical tests. Subsequently, built two models (Random Forest and Logistic Regression) predicting whether an individual received a job offer; the best model reached an accuracy of 0.853 and an AUC of 0.932. Finally, analyzed and compared the variable importance of each predictor in both models.
  3. Coursera Case Study: Bikes (DataAnalysis) 🧩
    • Google Data Analytics case study (fictional company Cyclistic) analyzing a large dataset of more than 6 million rows in R to extract insights. The goal of the case study is to find ways to convert casual users into members. It includes a thorough exploratory data analysis and concludes with suggestions for improvement and next steps.
  4. Starbucks (First-Git) 💫 💻
    • One of the first real world datasets I ever worked on.
    • Logistic Regression model predicting whether a drink is a Frappuccino based on its sodium content (mg). The 'Frappuccino' label is extracted from the drink name using text manipulation.
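
The project write-ups above compare models using two sets of metrics. The regression models use R² and RMSE, and the classification models use accuracy and AUC. The dependency-free Python below is a minimal sketch of what those four metrics compute, implementing each one from its textbook definition. The function names, the 0.5 classification threshold, and the toy numbers are hypothetical and are not taken from the projects. The projects were mostly done in R and may compute these metrics with library functions instead.

```python
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: square root of the mean squared residual."""
    n = len(y_true)
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n) ** 0.5


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def accuracy(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> float:
    """Share of 0/1 labels matched by the thresholded probability (0.5 is an assumed cutoff)."""
    hits = sum((p >= threshold) == bool(t) for t, p in zip(y_true, y_prob))
    return hits / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """ROC AUC: probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not results from any of the projects.
    prices = [300, 450, 500, 620]
    model_a = [310, 440, 520, 600]
    model_b = [350, 400, 560, 580]
    for name, pred in [("A", model_a), ("B", model_b)]:
        print(name, f"RMSE={rmse(prices, pred):.2f}", f"R2={r_squared(prices, pred):.3f}")

    offers = [1, 0, 1, 1, 0]
    probs = [0.9, 0.4, 0.35, 0.8, 0.2]
    print(f"accuracy={accuracy(offers, probs):.3f}", f"AUC={auc(offers, probs):.3f}")
```

The AUC uses the pairwise (Mann-Whitney) formulation. That approach is quadratic in the number of cases but easy to check by hand. Library implementations usually sort the scores instead and arrive at the same value.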

Most Recent/Actively working on:

Analyses are usually done in R but are sometimes replicated in Python.

Dataset: Premier League

  • R (TidyTuesday)

Dataset: Egg Production

  • R (TidyTuesday)

Note: For the sake of practice, I tend to jump from one dataset to the next.

Featured Visuals: Click Image to View Analysis

Pinned

  1. TidyTuesday (Public)

    Diving into the weekly datasets of TidyTuesday!

  2. Python-Data-Science (Public)

    (New) Data Science in Python

    Jupyter Notebook

  3. DataAnalysis (Public)

    Miscellaneous Data Analysis

  4. First-Git (Public)

    First Git Attempt
