I created my Kaggle account in April 2015 but really started participating in competitions in November 2015. My first competition was 'Rossmann Store Sales', which ran through mid-December 2015. My first submission used a basic classification algorithm; I then used Random Forest and an ensemble of both to gain traction in the competitions.
- Rossmann Store Sales - Sales prediction competition. Competition status: Closed.
- Walmart Recruiting: Trip Type Classification - Classification of customer trip types at Walmart stores. Competition status: Closed.
- The Winton Stock Market Challenge - Predicting intraday and end-of-day future stock returns. Competition status: Closed.
- Airbnb New User Bookings - Where will a new guest book their first travel experience? Competition status: Closed.
- Prudential Life Insurance Assessment - Customer risk classification on life insurance applications. Competition status: Closed.
- Telstra Network Disruptions - Predict the severity of service disruptions on the network. Competition status: Active.
Before participating in the above competitions, I analyzed previous Kaggle competitions and user scripts. Predominantly, 'XGBoost' was used for single-model submissions across these competitions. So I started with 'XGBoost' and then tried ensembles of XGBoost, Random Forest, Neural Network, classification models, and GBM.
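Below is a minimal sketch of that blending approach in R, assuming generic `train`/`test` data frames with a numeric `target` column; it is not the exact competition code, just an illustration of averaging a single XGBoost model with a Random Forest.

```r
library(xgboost)
library(randomForest)

# 'train' and 'test' are placeholder data frames; 'target' is the label column.
X_train <- as.matrix(train[, setdiff(names(train), "target")])
y_train <- train$target
X_test  <- as.matrix(test[, setdiff(names(test), "target")])

# Single XGBoost model
dtrain    <- xgb.DMatrix(data = X_train, label = y_train)
xgb_model <- xgb.train(params = list(objective = "reg:linear",
                                     eta = 0.1, max_depth = 8),
                       data = dtrain, nrounds = 300)
pred_xgb <- predict(xgb_model, X_test)

# Random Forest on the same features
rf_model <- randomForest(x = X_train, y = y_train, ntree = 500)
pred_rf  <- predict(rf_model, X_test)

# Simple average ensemble of the two models
pred_ensemble <- 0.5 * pred_xgb + 0.5 * pred_rf
```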
Rossmann Store Sales
- Linear Model - Link
- XGBoost with Feature Engineering - Link
- XGBoost with Feature Engineering and Parameter tuning - Link (see the tuning sketch after this list)
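As a hedged illustration of the parameter-tuning step, the snippet below runs a small grid search over `eta` and `max_depth` with `xgb.cv`. The grid values and the feature matrices are placeholders (reusing `X_train`/`y_train` from the earlier sketch), not the settings actually used in the competition.

```r
library(xgboost)

# X_train / y_train: feature-engineered matrix and label vector, as in the earlier sketch.
dtrain <- xgb.DMatrix(data = X_train, label = y_train)
grid   <- expand.grid(eta = c(0.05, 0.1, 0.3), max_depth = c(6, 8, 10))

# Cross-validated RMSE for each parameter combination
cv_scores <- apply(grid, 1, function(p) {
  cv <- xgb.cv(params = list(objective = "reg:linear",
                             eta = p["eta"], max_depth = p["max_depth"],
                             subsample = 0.8, colsample_bytree = 0.8),
               data = dtrain, nrounds = 300, nfold = 5,
               early_stopping_rounds = 20, verbose = 0)
  min(cv$evaluation_log$test_rmse_mean)
})

best_params <- grid[which.min(cv_scores), ]  # parameters with the lowest CV RMSE
```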
Prudential Life Insurance
Walmart Recruiting
The Winton Stock Market Challenge
Working on this competition was a bit more challenging than the others due to the file sizes and the volume of data.
- Train data set: 40000 rows x 211 features. File size: 173 MB
- Test data set: 120000 rows x 147 features. File size: 282 MB
With the initial model, preparing the output file took hours to process. So I had to find alternate ways to process the data faster and ended up developing Java code for the task (a rough R sketch of these steps follows the list below):
- Median replacement - R Code; Java Code
- Median replacement 3% Adjusted - R Code
- Moving average on Test data - R Code; Java Code
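The following is a rough R sketch of the two preprocessing steps, standing in for the Java version mentioned above. The column-name patterns (`Feature_*`, `Ret_*`), the file name, and the moving-average window are assumptions about the Winton data layout rather than the original code; `data.table` is used to keep the large files manageable.

```r
library(data.table)

test_dt <- fread("test.csv")  # placeholder path; ~120,000 rows x 147 columns

# 1. Median replacement: fill missing values in each feature column
#    with that column's median.
feature_cols <- grep("^Feature_", names(test_dt), value = TRUE)
for (col in feature_cols) {
  med <- median(test_dt[[col]], na.rm = TRUE)
  set(test_dt, i = which(is.na(test_dt[[col]])), j = col, value = med)
}

# 2. Simple trailing moving average over the intraday return columns of
#    each row (illustrative window of 5 observations).
ret_cols <- grep("^Ret_", names(test_dt), value = TRUE)
moving_avg <- function(x, window = 5) {
  stats::filter(x, rep(1 / window, window), sides = 1)
}
ret_smoothed <- t(apply(as.matrix(test_dt[, ..ret_cols]), 1, moving_avg))
```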
Airbnb New User Bookings
- XGBoost with Feature Engineering - Link
Kaggle's recruiting competitions prohibit participants from posting their code in the forums and from forming groups or teams. This helps each individual exhibit their own ideas and expertise in Data Science, rather than building on others' ideas. It really helps me evaluate where I stand, as an individual, compared to other participants across the world.