I have a prototype method of attributing wins/losses to the offense, pitching and fielding at the game level.
I was hoping to get some constructive feedback on the approach, even if it is “this has been tried already” or “this method cannot work” or “someone has done it better by doing x”.
The summary of the method is below. I could not attach the file, which includes the summary results, due to file size/type constraints.
Win Attribution at the game level by Offense, Pitching, and Fielding using the 2018 season
Below is an outline of the method, using the Colorado at Arizona game on March 29, 2018, as the example. Arizona won 8-2. The analysis is from the perspective of the home team.
Convert runs scored to expected win percentage based on typical results. For example, a team that scores 8 runs wins 92.6% of the time and loses 7.4% of the time.
Convert runs allowed to expected win percentage based on typical results. For example, a team that allows 2 runs wins 77.4% of the time and loses 22.6% of the time.
Use the relative percentages to allocate between offense and defense. Since the home team, Arizona, won we use the winning percentages.
Offense = 92.6 / (92.6 + 77.4) = 54.5%
Defense = 1 - Offense = 45.5%
The Arizona offense gets credit for 0.545 of a win and the defense gets credit for 0.455 of a win.
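The offense/defense split above can be sketched in a few lines of Python. This is only an illustration: the function and table names are made up, the lookup tables hold just the two percentages quoted above, and the handling of a loss (using the losing percentages) is my reading of the method rather than something spelled out for this game.

```python
# Win percentages quoted in the post; a full table would cover every run total.
WIN_PCT_BY_RUNS_SCORED = {8: 0.926}
WIN_PCT_BY_RUNS_ALLOWED = {2: 0.774}


def split_offense_defense(runs_scored, runs_allowed, team_won):
    """Return (offense_share, defense_share) of the single win or loss."""
    off = WIN_PCT_BY_RUNS_SCORED[runs_scored]
    dfn = WIN_PCT_BY_RUNS_ALLOWED[runs_allowed]
    if not team_won:
        # Assumption: for a loss, switch to the losing percentages.
        off, dfn = 1.0 - off, 1.0 - dfn
    offense_share = off / (off + dfn)
    return offense_share, 1.0 - offense_share


offense, defense = split_offense_defense(8, 2, team_won=True)
print(f"offense {offense:.3f}, defense {defense:.3f}")
# prints: offense 0.545, defense 0.455
```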
Next, allocate the defense's 0.455 wins between pitching and fielding. I used custom linear weights by category to determine how many runs to allocate to pitching and how many to fielding. The first step was to find the total runs to allocate to pitching (focusing on three-true-outcome-type events) and to fielding (essentially everything else).
Pitching Categories: Home runs, non-intentional walks, hit by pitch, strikeouts, balks, and wild pitches.
Fielding Categories: Singles, doubles, triples, intentional walks, stolen bases, caught stealing, errors, passed balls, interference, non-strikeout putouts.
The linear weights were based on season values and were NOT calculated as differences from average but as the marginal impact on runs allowed, since I needed (wanted) total runs. For example, I calculated the average impact of an incremental single by comparing runs allowed in games with 0 singles, 1 single, 2 singles, etc. I then took the average impact of each extra single, weighted by the number of games in each category. I repeated this step for each event. In this data set, the value of 1 extra single was 0.581 runs. For comparison, the run value of an extra home run was 1.660.
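For anyone who wants to check the marginal-weight idea, here is a rough Python sketch of one way to compute it. It is not the exact calculation behind the 0.581 figure: the function name is hypothetical, the input is assumed to be one (event count, runs allowed) pair per game, and weighting each step by the number of games in the two adjacent buckets is my guess at "weighted by number of games in each category". The data is made up.

```python
from collections import defaultdict


def marginal_run_value(games):
    """games: iterable of (event_count, runs_allowed) pairs, one per game.

    Returns the weighted average change in mean runs allowed per extra event.
    """
    runs_by_count = defaultdict(list)
    for count, runs in games:
        runs_by_count[count].append(runs)

    counts = sorted(runs_by_count)
    weighted_sum = 0.0
    total_weight = 0.0
    for lo, hi in zip(counts, counts[1:]):
        mean_lo = sum(runs_by_count[lo]) / len(runs_by_count[lo])
        mean_hi = sum(runs_by_count[hi]) / len(runs_by_count[hi])
        # Per-event change; divide by the gap in case a count never occurred.
        step = (mean_hi - mean_lo) / (hi - lo)
        # Assumption: weight each step by the games in both adjacent buckets.
        weight = len(runs_by_count[lo]) + len(runs_by_count[hi])
        weighted_sum += step * weight
        total_weight += weight
    if total_weight == 0:
        raise ValueError("need at least two distinct event counts")
    return weighted_sum / total_weight


# Made-up illustration data, not from the 2018 season.
toy_games = [(0, 2), (0, 4), (1, 3), (1, 5), (2, 4), (2, 6)]
print(marginal_run_value(toy_games))  # prints: 1.0
```

A different weighting scheme (for example, by the smaller of the two buckets) would give somewhat different values when bucket sizes are uneven, so the choice is worth stating alongside any results.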
From step 4 I needed to make some adjustments. With this method, most games show a difference between the actual runs allowed and the calculated (theoretical) runs allowed. I allocated 50% of that difference to pitching and 50% to fielding. I used a 50/50 split because the correlation between the theoretical pitching runs and this difference was nearly the same as the correlation between the theoretical fielding runs and this difference. As best I could judge, this difference is essentially sequencing.
From step 4, I made one other adjustment. I found a 20% correlation between the theoretical pitching runs and the theoretical fielding runs, so I re-allocated 20% of the theoretical fielding runs to pitching runs. I ran the adjustment in this direction since it seemed much more likely that the pitchers impacted the fielding categories than the fielders impacted the pitching categories.
Using data for the same Arizona-Colorado game, I found the theoretical pitching runs to be 3.859, based on 2 home runs allowed, 2 walks, 1 balk, and 12 strikeouts. The theoretical fielding runs allowed were 0.302, based on 7 singles allowed and 15 fielding putouts.
The theoretical runs allowed were 4.161 (3.859 + 0.302). Since the actual runs allowed were 2, I allocated (2 - 4.161) / 2, or about -1.08 runs, each to pitching and fielding. I also moved 0.20 * 0.302 = 0.060 runs from fielding to pitching. I then rounded to the nearest run to end up with:
Pitching Runs Allowed = 3
Fielding Runs Allowed = -1
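The two adjustments can be reproduced with a short Python sketch. The function and parameter names are hypothetical; the 50/50 sequencing split and the 20% shift are the values described above, exposed as parameters so other splits can be tried.

```python
def adjust_runs(pitching_runs, fielding_runs, actual_runs,
                sequencing_share=0.5, fielding_to_pitching=0.20):
    """Apply both adjustments and return unrounded (pitching, fielding) runs."""
    # Gap between actual runs allowed and the linear-weights estimate.
    gap = actual_runs - (pitching_runs + fielding_runs)
    # Portion of theoretical fielding runs re-allocated to pitching.
    shifted = fielding_to_pitching * fielding_runs
    pitching = pitching_runs + sequencing_share * gap + shifted
    fielding = fielding_runs + (1.0 - sequencing_share) * gap - shifted
    return pitching, fielding


pitching, fielding = adjust_runs(3.859, 0.302, actual_runs=2)
print(round(pitching, 3), round(fielding, 3))  # prints: 2.839 -0.839
# Note: Python's round() sends exact .5 ties to the nearest even integer.
print(round(pitching), round(fielding))        # prints: 3 -1
```

Before rounding, the two pieces always sum to the actual runs allowed (2.839 - 0.839 = 2 here), which is a handy sanity check.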
The next step was to convert the adjusted runs allowed to a winning percentage. I found the winning percentage for each level of pitching runs allowed and fielding runs allowed (which is why I rounded at the end of step 6). For 3 pitching runs allowed home teams had a winning percentage of 0.607. For -1 fielding runs allowed home teams had a winning percentage of 0.724.
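As a rough illustration of building such a lookup, the sketch below tallies home-team winning percentage by rounded runs-allowed level. The function name and input layout are assumptions, and the sample data is made up rather than taken from the 2018 season.

```python
from collections import defaultdict


def win_pct_by_level(games):
    """games: iterable of (rounded_runs_allowed, home_team_won) pairs."""
    wins = defaultdict(int)
    played = defaultdict(int)
    for level, won in games:
        played[level] += 1
        wins[level] += int(won)
    # One winning percentage per observed level, negative levels included.
    return {level: wins[level] / played[level] for level in sorted(played)}


# Made-up illustration data, not the 2018 results.
toy = [(3, True), (3, True), (3, False), (-1, True), (-1, False)]
table = win_pct_by_level(toy)
print(table)
```

Levels with very few games will give noisy percentages, so some smoothing or pooling of extreme levels may be worth considering.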
To allocate the defensive wins of 0.455 I used the relative winning percentages from step 7 AFTER I weighted the pitching runs by 3. I did this because the total adjusted runs were about 75% pitching and 25% fielding.
Pitching Wins Attributed = 0.455 * (3 * 0.607) / (3 * 0.607 + 0.724) = 0.326
Fielding Wins Attributed = 0.455 – 0.326 = 0.129
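The final split is easy to verify in Python. The sketch below uses a hypothetical function name and keeps the 3:1 pitching weight as a parameter, since that ratio is a judgment call.

```python
def split_defense(defense_wins, pitching_win_pct, fielding_win_pct,
                  pitching_weight=3.0):
    """Divide the defensive share of a win between pitching and fielding."""
    weighted_pitching = pitching_weight * pitching_win_pct
    pitching_wins = defense_wins * weighted_pitching / (
        weighted_pitching + fielding_win_pct)
    # Fielding receives whatever pitching does not.
    return pitching_wins, defense_wins - pitching_wins


pitching_wins, fielding_wins = split_defense(0.455, 0.607, 0.724)
print(f"pitching {pitching_wins:.3f}, fielding {fielding_wins:.3f}")
# prints: pitching 0.326, fielding 0.129
```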
Discussion Points
The linear weights, while mostly reasonable, are likely ‘overfitted’ since I had one season of data and nothing out of sample.
The runs scored/allowed to winning percentages, while reasonable and logical, are probably also overfitted for the same reasons as #1.
The translation from runs to winning percentage that I used could likely be replaced by a better model.
The 20% allocation of fielding to pitching makes sense and seems reasonable but is not based on anything other than logic and a correlation calculation.
Weighting pitching runs to fielding runs 3:1 (75%/25%) makes logical sense, fits the data, and yields reasonable results but is still a result of judgment.
I included IBB in fielding since I deemed it a managerial choice. I could be wrong here and it might belong in pitching.
Allocating the sequencing difference 50/50 to pitching and fielding made sense based on the similar r-values but a different split is highly likely to be more accurate.
I am somewhat uncomfortable with having negative runs allowed in some cases but could not come up with a better alternative, at least at this time.