- contains the 3 raw data files
- these were all pulled directly from the Kaggle site
- contains the consolidated data
- starting point for my transforms
- contains the consolidated data
- "Qual Time" column now contains the avg time across the qual phases the driver competed in, if the race used the new qual format (3-round quals started in 2006)
- "Qual Time" may contain "DNS" or "DNF" if a driver failed to complete phase 1 of the qual for any reason
- most recent and processed version of data--please use this as a baseline for further modification
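The "Qual Time" rule described above can be sketched roughly as follows. This is only an illustration, not the actual transform script: the function names, the `M:SS.mmm` lap-time format, and the `\N` null marker are assumptions about how the raw data looks.

```python
from typing import Optional, Union


def parse_lap_time(text: str) -> Optional[float]:
    """Convert 'M:SS.mmm' (or 'SS.mmm') to seconds; return None for blanks."""
    text = text.strip()
    if not text or text == "\\N":  # assumed null markers in the raw data
        return None
    if ":" in text:
        minutes, seconds = text.split(":", 1)
        return int(minutes) * 60 + float(seconds)
    return float(text)


def qual_time(q1: str, q2: str = "", q3: str = "", status: str = "DNS") -> Union[float, str]:
    """Average the phases the driver actually ran, or return a status string.

    `status` is a hypothetical parameter: the caller decides whether a
    missing phase 1 should be recorded as "DNS" or "DNF".
    """
    if parse_lap_time(q1) is None:
        return status  # driver failed to complete phase 1
    times = [t for t in (parse_lap_time(q) for q in (q1, q2, q3)) if t is not None]
    return sum(times) / len(times)
```

For example, `qual_time("1:30.000", "1:29.000")` averages only the two phases that were run, and `qual_time("", status="DNF")` returns `"DNF"`.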
- contains prev data from F1_qualtime.csv
- additional columns are "Time in Seconds" and "Adjusted Time"
- "Time in Seconds" is an intermediate for calculations
- "Adjusted Time" is an adjusted form of "Time/Retired" in seconds for all drivers
- "Adjusted Time" contains either a time in seconds or "DNF" if the driver did not finish; for drivers who finished one or more laps down, the 1st place driver's avg lap time is added once for each lap they were behind
- "Adjusted Time" has some holes due to disqualifications--these are not yet accounted for
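A minimal sketch of the "Adjusted Time" calculation, assuming "Time/Retired" holds either the winner's full time, a gap such as `+5.478`, a lap deficit such as `+1 lap`, or a status string. The function names and the exact string formats are assumptions, not the real script. Any entry that is not a time falls back to "DNF" here, so disqualifications are not handled separately.

```python
from typing import Union


def to_seconds(text: str) -> float:
    """Convert 'H:MM:SS.mmm', 'M:SS.mmm' or 'SS.mmm' to seconds."""
    total = 0.0
    for part in text.split(":"):
        total = total * 60 + float(part)
    return total


def adjusted_time(time_retired: str, winner_seconds: float, total_laps: int) -> Union[float, str]:
    """Return a race time in seconds, or "DNF" when no time can be derived."""
    text = time_retired.strip()
    if text.startswith("+"):
        body = text[1:].strip()
        if body.lower().endswith(("lap", "laps")):
            # Lapped driver: add the winner's average lap time per lap behind.
            laps_behind = int(body.split()[0])
            return winner_seconds + laps_behind * (winner_seconds / total_laps)
        # Driver on the lead lap: add the gap to the winner.
        # A trailing "s" is tolerated in case gaps are written like "+5.478s".
        return winner_seconds + to_seconds(body.rstrip("s"))
    try:
        return to_seconds(text)  # the winner's own absolute time
    except ValueError:
        return "DNF"  # simplification: every non-time status becomes "DNF"
```

With a winner time of 6000 s over 60 laps (100 s per lap on average), `adjusted_time("+2 laps", 6000.0, 60)` gives `6200.0`.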
- transformed "F1_data.csv" into "F1_qualtime.csv"
- note: any holes you filled in manually where qual was unfinished (DNF, DNS, etc.) will be erased if you rerun this script
- transformed "F1_qualtime.csv" into "F1_racetime.csv"
- Data Loading: Reads multiple F1 datasets (results, constructors, races, circuits, drivers, status, qualifying, driver standings, pit stops) into DataFrames.
- Data Merging: Combines these datasets to include detailed race, driver, constructor, and circuit information.
- Column Renaming: Clarifies data by renaming columns (e.g., `date` to `raceDate`, `position` to `finishPos`).
- Feature Engineering: Calculates historical performance metrics (e.g., points and wins before races, recent placements, seasonal averages, circuit-specific averages, driver age at race, position changes).
- Pit Stop Data: Integrates pit stop data, calculating average number and time of pit stops.
- Final Adjustments: Filters data for races from 2011 onwards and saves the final DataFrame to `raw_data_with_pit_stops.csv`.
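To illustrate the "points before races" idea and the 2011 filter from the steps above, here is a rough, dependency-free sketch. The real script works on DataFrames; this version uses plain dicts, and the `pointsBefore` name and the choice to accumulate over the whole history (rather than per season) are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def add_points_before(rows: List[Dict]) -> List[Dict]:
    """Attach each driver's points total from strictly earlier races.

    Each row needs "driverId", "raceDate" (ISO 'YYYY-MM-DD') and "points".
    The running total is read before the current race is added, so the
    feature never leaks the result it is meant to help predict.
    """
    totals = defaultdict(float)
    enriched_rows = []
    for row in sorted(rows, key=lambda r: r["raceDate"]):
        enriched = dict(row)
        enriched["pointsBefore"] = totals[row["driverId"]]
        totals[row["driverId"]] += row["points"]
        enriched_rows.append(enriched)
    return enriched_rows


def from_2011_onwards(rows: List[Dict]) -> List[Dict]:
    """Keep only races dated 2011 or later."""
    return [r for r in rows if int(r["raceDate"][:4]) >= 2011]
```

Computing the feature before filtering means a driver's pre-2011 results still count toward the history of their 2011 races.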
- Feedforward neural network based on the raw data file
- Uses PyTorch for most of the heavy lifting
- MSE loss function
- Cross-entropy loss function
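For reference, the two loss functions named above differ as follows: MSE compares a predicted number with a target number, while cross-entropy scores a vector of class scores against the index of the correct class. The sketch below computes both in plain Python to show the arithmetic. It is not the networks' actual code, which presumably relies on PyTorch's built-in `torch.nn.MSELoss` and `torch.nn.CrossEntropyLoss`.

```python
import math
from typing import Sequence


def mse_loss(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean of squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


def cross_entropy_loss(logits: Sequence[float], target_index: int) -> float:
    """Negative log-softmax probability of the correct class.

    `logits` are raw (unnormalized) class scores. Subtracting the largest
    score before exponentiating keeps the computation numerically stable.
    """
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target_index]
```

As a rough guide, MSE suits treating the prediction target as a continuous value, while cross-entropy suits treating each possible outcome as a separate class.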
- contains both the random forest and ID3 classifiers