- 👋 Hi, I'm @MattHondrakis
- 🧠 I'm interested in Probability/Statistics, Actuarial Science, and Data Science.
- 🌱 I received a Data Scientist Associate Certification from DataCamp and a Google Data Analytics Certificate from Coursera.
- 🎓 I have a BS in Applied Mathematics from The City College of New York.
- 📫 How to reach me: hondrakma@gmail.com
🧩 => Structured Analysis
💫 => Unstructured Chronological Analysis
💻 => Machine Learning Model
- NYC House Prices (DataAnalysis) 💫 💻
  - GAM, Random Forest, and Linear Regression models predicting the prices of real estate properties in NYC. The type of property (Condo, Apartment, etc.) is extracted from the home_details variable and plays a crucial role in the modeling process. The models are then compared against each other on key metrics, such as R² and Root Mean Squared Error.
- Job Placement (DataAnalysis) 🧩 💻
  - Validated the data by checking for missing values and outliers and handling them appropriately. Explored trends and correlations between variables using visualizations and statistical tests. Subsequently, created two models (Random Forest and Logistic Regression) predicting whether an individual received a job offer; the best model reached an accuracy of 0.853 and an AUC of 0.932. Finally, analyzed the variable importance of each predictor and compared it across the two models.
- Coursera Case Study: Bikes (DataAnalysis) 🧩
  - Google Data Analytics case study (fictional company Cyclistic), analyzing a dataset of more than 6 million rows in R to extract insights. The purpose of the case study is to convert casual users into members. Thorough exploratory data analysis, with a conclusion providing suggestions for improvement and next steps.
- Starbucks (First-Git) 💫 💻
  - One of the first real-world datasets I ever worked on.
  - Logistic Regression model predicting whether a drink is a Frappuccino based on sodium (mg). The 'Frappuccino' status is extracted from the name of the drink using text manipulation.
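
As a rough illustration of the Starbucks idea above (not the code from the repository, which is written in R), here is a minimal Python sketch. The drink rows and sodium values are made up, the helper names `is_frappuccino`, `fit_logistic`, and `predict` are hypothetical, and plain gradient descent stands in for whatever fitting routine the original analysis used.

```python
import math

# Made-up example rows: (drink name, sodium in mg). Not real Starbucks data.
drinks = [
    ("Caramel Frappuccino", 220.0),
    ("Mocha Frappuccino Light", 200.0),
    ("Java Chip Frappuccino", 240.0),
    ("Brewed Coffee", 10.0),
    ("Caffe Latte", 150.0),
    ("Iced Tea", 5.0),
]


def is_frappuccino(name: str) -> int:
    """Return 1 if the drink name mentions 'frappuccino' (case-insensitive)."""
    return int("frappuccino" in name.lower())


def predict(b0: float, b1: float, x: float) -> float:
    """Probability of 'Frappuccino' under the model sigmoid(b0 + b1 * x)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit a one-predictor logistic regression by gradient descent on log loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = predict(b0, b1, x) - y  # gradient of log loss w.r.t. the logit
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Text manipulation step: derive the label from the drink name.
labels = [is_frappuccino(name) for name, _ in drinks]

# Rescale sodium to units of 100 mg so the gradient steps stay well behaved.
sodium = [mg / 100.0 for _, mg in drinks]

b0, b1 = fit_logistic(sodium, labels)
print(f"intercept={b0:.2f}, slope per 100 mg={b1:.2f}")
print(f"P(Frappuccino | 240 mg) = {predict(b0, b1, 2.4):.2f}")
```

Because the toy data is perfectly separable, the fitted coefficients keep growing with more epochs; on a real dataset a library routine with proper convergence checks would be the better choice.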
Analyses are usually done in R, but some are replicated in Python.
- R (TidyTuesday)
Note: For the sake of practice, I tend to jump from one dataset to the next.