Welcome to my Kaggle GitHub Page

I created my Kaggle account in April 2015 but really started participating in competitions in November 2015. My first competition was 'Rossmann Store Sales', which ran through mid-December 2015. My first submission used a basic classification algorithm; I then used Random Forest and an ensemble of both to gain traction in the competition.

Competitions participated

  • Rossmann Store Sales - Sales prediction competition. Competition status: Closed.
  • Walmart Recruiting: Trip Type Classification - Classification of customer trips to Walmart stores. Competition status: Closed.
  • The Winton Stock Market Challenge - Predicting intraday and end-of-day future stock returns. Competition status: Closed.
  • Airbnb New User Bookings - Where will a new guest book their first travel experience? Competition status: Closed.
  • Prudential Life Insurance Assessment - Customer risk classification on life insurance applications. Competition status: Closed.
  • Telstra Network Disruptions - Predict the severity of service disruptions on the network. Competition status: Active.

Models used

Before participating in the above competitions, I analyzed previous Kaggle competitions and user scripts. Predominantly, 'XGBoost' was used for single-model submissions across competitions. So I started with 'XGBoost' and then tried ensembles of XGBoost, Random Forest, Neural Network, classification models, and GBM.
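The ensembling mentioned above can be as simple as averaging the aligned predictions of several base models. A minimal sketch, with made-up placeholder predictions rather than actual competition output:

```python
# Hypothetical sketch of a simple ensemble: average the predictions of
# several base models (e.g. XGBoost, Random Forest, GBM) row by row.
# The prediction lists below are illustrative placeholders.

def ensemble_mean(*prediction_lists):
    """Average aligned predictions from several models, row by row."""
    return [sum(row) / len(row) for row in zip(*prediction_lists)]

xgb_preds = [0.82, 0.35, 0.67]   # placeholder model outputs
rf_preds = [0.78, 0.41, 0.59]
gbm_preds = [0.80, 0.38, 0.61]

blended = ensemble_mean(xgb_preds, rf_preds, gbm_preds)
print(blended)
```

A plain mean is only one blending choice; weighted averages or rank averaging are common alternatives when one base model is clearly stronger.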

Rossmann Store Sales

  1. Linear Model - Link
  2. XGBoost with Feature Engineering - Link
  3. XGBoost with Feature Engineering and Parameter tuning - Link

Prudential Life Insurance

  1. Recursive Partition Classification Tree - Link
  2. Random Forest - Link

Walmart Recruiting

  1. XGBoost with Feature Engineering - Link
  2. Random Forest with Feature Engineering - Link

The Winton Stock Market Challenge

Working in this competition was a bit more challenging than the others due to the file sizes and volume of data.

  • Train data set: 40000 rows x 211 features. File size: 173 MB
  • Test data set: 120000 rows x 147 features. File size: 282 MB

With the initial model, preparing the output file took hours of processing, so I had to find alternate ways to process the data faster. To do that, I ended up developing Java code.

  1. Median replacement - R Code; Java Code
  2. Median replacement 3% Adjusted - R Code
  3. Moving average on Test data - R Code; Java Code
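The two preprocessing ideas listed above (median replacement for missing values, and a moving average over the test series) can be sketched as follows. This is an illustrative Python sketch, not the actual R/Java code linked above; the window size and inputs are assumptions:

```python
# Hedged sketch of the preprocessing steps named in the list above.

def median_impute(column):
    """Replace None entries with the median of the observed values."""
    observed = sorted(v for v in column if v is not None)
    mid = len(observed) // 2
    median = (observed[mid] if len(observed) % 2
              else (observed[mid - 1] + observed[mid]) / 2)
    return [median if v is None else v for v in column]

def moving_average(series, window=3):
    """Trailing moving average; shorter windows at the start of the series."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

print(median_impute([1.0, None, 3.0, 5.0]))    # -> [1.0, 3.0, 3.0, 5.0]
print(moving_average([1, 2, 3, 4], window=2))  # -> [1.0, 1.5, 2.5, 3.5]
```

For the competition's actual data sizes (120,000 test rows x 147 features), a vectorized implementation or compiled code, as the Java approach above suggests, would be considerably faster than pure Python loops.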

Airbnb New User Bookings

  1. XGBoost with Feature Engineering - Link

Focus

Kaggle's recruiting competitions prohibit participants from posting their code in the forum and from forming groups or teams. This helps each individual exhibit their own ideas and expertise in Data Science, without those ideas being hijacked by others. It really helps me evaluate where I stand, as an individual, compared to other participants across the world.

About

This is the repository for Kaggle competitions in which I have participated
