Win Attribution to offense, pitching, and fielding at the game level (prototype method)
Posted: 19 November 2018 01:40 PM
Doubles Hitter
Total Posts:  10
Joined  2013-03-06

I have a prototype method of attributing wins/losses to the offense, pitching and fielding at the game level.

I was hoping to get some constructive feedback on the approach, even if it is “this has been tried already”, “this method cannot work”, or “someone has done it better by doing x”.

The summary of the method is below. I could not attach the file, which includes the summary results, due to file size/type constraints.

Win Attribution at the game level by Offense, Pitching, and Fielding using the 2018 season
Below is an outline of the method, using the game of Colorado at Arizona on March 29, 2018 as an example. Arizona won 8-2. The analysis is from the perspective of the home team.
1. Convert runs scored to expected win percentage based on typical results. For example, a team that scores 8 runs wins 92.6% of the time and loses 7.4% of the time.
2. Convert runs allowed to expected win percentage based on typical results. For example, a team that allows 2 runs wins 77.4% of the time and loses 22.6% of the time.
3. Use the relative percentages to allocate between offense and defense. Since the home team, Arizona, won, we use the winning percentages.
Offense = 92.6 / (92.6 + 77.4) = 54.5%
Defense = 1 - Offense = 45.5%

The Arizona offense gets credit for 0.545 of a win and the defense gets credit for 0.455 of a win.
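
A minimal Python sketch of the offense/defense split above. The function name `split_offense_defense` is made up for illustration; the inputs are the win percentages quoted for this game (0.926 for scoring 8 runs, 0.774 for allowing 2).

```python
def split_offense_defense(off_win_pct, def_win_pct):
    """Split one win between offense and defense in proportion to
    the win percentages implied by runs scored and runs allowed."""
    total = off_win_pct + def_win_pct
    offense = off_win_pct / total
    return offense, 1.0 - offense


# Arizona vs. Colorado, March 29, 2018 (home team won 8-2)
offense_share, defense_share = split_offense_defense(0.926, 0.774)
print(round(offense_share, 3), round(defense_share, 3))  # 0.545 0.455
```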

4. Next, allocate the defense's 0.455 wins between pitching and fielding. I used custom linear weights by category to determine how many runs to allocate to pitching and how many to fielding. The first step was to find the total runs to allocate to pitching (focusing on three-true-outcome type events) and fielding (essentially everything else).

Pitching Categories: Home runs, non-intentional walks, hit by pitcher, strikeouts, balks, and wild pitches.
Fielding Categories: Singles, doubles, triples, intentional walks, stolen bases, caught stealing, errors, passed balls, interference, non-strikeout putouts.

The linear weights were based on the season values and were NOT done as difference from average but as marginal impact on runs allowed, since I needed (wanted) total runs. For example, I calculated the average impact of an incremental single by comparing how many runs were allowed in games with 0 singles, 1 single, 2 singles, etc. I then took the average of the impact of each extra single, weighted by the number of games in each category. I did this step for each event. In this data set, I found the value of 1 extra single to be 0.581 runs. For comparison, the run value of an extra home run was 1.660.
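
One way the marginal-weight calculation might be sketched in Python is below. This is only an interpretation of the description: the bucket-to-bucket differencing, the choice to weight each step by the number of games in the higher bucket, the function name `marginal_run_value`, and the toy data are all assumptions, not the actual spreadsheet logic or the 2018 data.

```python
from collections import defaultdict


def marginal_run_value(games):
    """Estimate runs added per extra event.

    games: iterable of (event_count, runs_allowed) pairs, one per game.
    Buckets games by event count, takes the change in mean runs allowed
    between adjacent buckets, and averages those changes weighted by the
    number of games in the higher bucket (an assumed weighting).
    """
    buckets = defaultdict(lambda: [0, 0.0])  # count -> [n_games, total_runs]
    for count, runs in games:
        buckets[count][0] += 1
        buckets[count][1] += runs

    counts = sorted(buckets)
    if len(counts) < 2:
        raise ValueError("need games in at least two event-count buckets")

    weighted_sum = 0.0
    weight_total = 0
    for lo, hi in zip(counts, counts[1:]):
        n_lo, runs_lo = buckets[lo]
        n_hi, runs_hi = buckets[hi]
        # change in mean runs allowed per additional event
        step = (runs_hi / n_hi - runs_lo / n_lo) / (hi - lo)
        weighted_sum += n_hi * step
        weight_total += n_hi
    return weighted_sum / weight_total


# Toy data (hypothetical): (singles allowed, runs allowed) per game
toy_games = [(0, 1), (0, 3), (1, 3), (1, 3), (2, 4)]
toy_value = marginal_run_value(toy_games)
print(toy_value)  # 1.0 for this toy data
```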

5. From step 4 I needed to make some adjustments. With this method, most games will show a difference between the actual runs allowed and the calculated (theoretical) runs allowed. I allocated 50% of the difference to pitching and 50% to fielding. I used a 50/50 split because the correlation between the theoretical pitching runs and this difference was nearly the same as the correlation between the theoretical fielding runs and this difference. As best I could judge, this difference is essentially sequencing.
6. From step 4, I made one other adjustment. I found a 20% correlation between the theoretical pitching runs and the theoretical fielding runs, so I re-allocated 20% of the theoretical fielding runs to pitching runs. I ran the adjustment in this direction since it seemed much more likely that the pitchers impacted the fielding categories than that the fielders impacted the pitching categories.
Using data for the same Arizona – Colorado game, I found that the theoretical pitching runs were 3.859, based on 2 home runs allowed, 2 walks, 1 balk, and 12 strikeouts. The theoretical fielding runs allowed were 0.302, with 7 singles allowed and 15 fielding putouts.
The theoretical runs allowed were 4.161 (3.859 + 0.302). Since the actual runs allowed were 2, I allocated (2 - 4.161)/2 = -1.0805 runs each to pitching and fielding. I also re-allocated 0.20 * 0.302 = 0.060 runs from fielding to pitching. I then rounded to the nearest run to end up with:
Pitching Runs Allowed = 3
Fielding Runs Allowed = -1
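
The two adjustments can be checked with a short Python sketch. The function and parameter names (`adjust_runs`, `seq_split`, `shift`) are made up for illustration; the inputs are the figures quoted for this game.

```python
def adjust_runs(pitch_theo, field_theo, actual_runs, seq_split=0.5, shift=0.20):
    """Apply the sequencing split and the fielding-to-pitching shift.

    seq_split: share of (actual - theoretical) runs given to pitching.
    shift: share of theoretical fielding runs re-allocated to pitching.
    """
    diff = actual_runs - (pitch_theo + field_theo)  # sequencing difference
    moved = shift * field_theo                      # fielding runs moved to pitching
    pitching = pitch_theo + seq_split * diff + moved
    fielding = field_theo + (1.0 - seq_split) * diff - moved
    return pitching, fielding


pitch_runs, field_runs = adjust_runs(3.859, 0.302, 2)
print(round(pitch_runs, 3), round(field_runs, 3))  # 2.839 -0.839
print(round(pitch_runs), round(field_runs))        # 3 -1
```

Before rounding, the two pieces always sum to the actual runs allowed (2 here); after rounding to whole runs they still do in this game (3 + -1 = 2).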

7. The next step was to convert the adjusted runs allowed to a winning percentage. I found the winning percentage for each level of pitching runs allowed and fielding runs allowed (which is why I rounded at the end of step 6). For 3 pitching runs allowed, home teams had a winning percentage of 0.607. For -1 fielding runs allowed, home teams had a winning percentage of 0.724.
8. To allocate the defensive wins of 0.455, I used the relative winning percentages from step 7 AFTER weighting the pitching winning percentage by 3. I did this since the total adjusted runs were about 75% pitching and 25% fielding.

Pitching Wins Attributed = 0.455 * (3 * 0.607) / (3 * 0.607 + 0.724) = 0.326
Fielding Wins Attributed = 0.455 - 0.326 = 0.129
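
The same pitching/fielding split as a small Python sketch. The name `split_defense` and the `pitch_weight` parameter are illustrative; the 3:1 weight and the winning percentages are the ones quoted above.

```python
def split_defense(def_wins, pitch_win_pct, field_win_pct, pitch_weight=3.0):
    """Split the defensive share of a win between pitching and fielding,
    weighting the pitching winning percentage by pitch_weight."""
    weighted_pitch = pitch_weight * pitch_win_pct
    pitching = def_wins * weighted_pitch / (weighted_pitch + field_win_pct)
    return pitching, def_wins - pitching


pitch_wins, field_wins = split_defense(0.455, 0.607, 0.724)
print(round(pitch_wins, 3), round(field_wins, 3))  # 0.326 0.129
```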

Discussion Points
1. The linear weights, while mostly reasonable, are likely ‘overfitted’ since I had one season of data and nothing out of sample.
2. The runs scored/allowed to winning percentages, while reasonable and logical, are probably also overfitted for the same reasons as #1.
3. The translation from runs to winning percentage that I used could likely be replaced by a better model.
4. The 20% allocation of fielding to pitching makes sense and seems reasonable but is not based on anything other than logic and a correlation calculation.
5. Weighting pitching runs to fielding runs 3:1 (75%/25%) makes logical sense, fits the data, and yields reasonable results but is still a result of judgment.
6. I included IBB in fielding since I deemed it a managerial choice. I could be wrong here and it might belong in pitching.
7. Allocating the sequencing difference 50/50 to pitching and fielding made sense based on the similar r-values, but a different split is highly likely to be more accurate.
8. I am somewhat uncomfortable with having negative runs allowed in some cases but could not come up with a better alternative, at least at this time.

Posted: 19 November 2018 04:56 PM   [ # 1 ]
Doubles Hitter
Total Posts:  10
Joined  2013-03-06

Below are the summary results for 2018 for home wins and losses by team after summing across all home games for each team.


HOME WINS ATTRIBUTION
Home                  DEF subtotals
Team    OFF     DEF   PITCH   FIELD
ANA    21.5    20.5    14.6     5.8
ARI    19.9    20.1    15.2     4.8
ATL    22.5    20.5    15.1     5.4
BAL    15.1    12.9     9.3     3.5
BOS    30.0    27.0    20.3     6.7
CHA    15.0    15.0    11.0     4.0
CHN    26.3    24.7    18.5     6.2
CIN    19.1    17.9    13.3     4.5
CLE    24.0    25.0    19.1     6.0
COL    23.5    23.5    18.0     5.5
DET    18.5    19.5    14.5     5.0
HOU    22.5    23.5    17.8     5.6
KCA    16.1    15.9    11.9     4.0
LAN    21.1    23.9    18.2     5.7
MIA    16.3    21.7    16.3     5.4
MIL    24.7    26.3    19.5     6.8
MIN    25.3    23.7    17.7     6.0
NYA    27.8    25.2    19.3     6.0
NYN    16.0    21.0    16.0     5.0
OAK    24.1    25.9    19.3     6.6
PHI    22.9    26.1    20.3     5.8
PIT    20.2    23.8    17.9     6.0
SDN    14.4    16.6    12.8     3.8
SEA    20.6    24.4    18.3     6.1
SFN    20.4    21.6    16.4     5.2
SLN    21.7    21.3    15.9     5.4
TBA    26.1    24.9    19.1     5.8
TEX    19.7    14.3    10.7     3.6
TOR    21.2    18.8    13.4     5.4
WAS    21.8    19.2    14.3     4.9
TOT   638.3   644.7   484.2   160.5

HOME LOSSES ATTRIBUTION
Home                  DEF subtotals
Team    OFF     DEF   PITCH   FIELD
ANA    20.4    18.6    14.5     4.1
ARI    20.8    20.2    15.5     4.7
ATL    18.2    19.8    14.2     5.7
BAL    27.1    25.9    20.2     5.7
BOS    10.9    13.1     9.7     3.4
CHA    25.2    25.8    20.4     5.4
CHN    16.7    14.3    10.8     3.5
CIN    18.9    25.1    19.3     5.8
CLE    13.2    18.8    13.8     5.0
COL    12.6    21.4    16.1     5.3
DET    21.2    21.8    16.6     5.1
HOU    19.5    15.5    11.6     3.9
KCA    22.8    26.2    20.1     6.1
LAN    20.6    16.4    12.0     4.5
MIA    22.8    20.2    14.8     5.4
MIL    14.5    15.5    11.5     4.0
MIN    14.8    17.2    12.9     4.3
NYA    11.3    16.7    12.5     4.1
NYN    24.9    19.1    13.9     5.2
OAK    16.6    14.4    10.9     3.4
PHI    15.1    16.9    12.8     4.2
PIT    19.1    16.9    12.6     4.3
SDN    24.2    25.8    18.9     6.9
SEA    18.8    17.2    12.4     4.7
SFN    21.9    17.1    11.9     5.1
SLN    19.3    18.7    13.5     5.1
TBA    18.2    11.8     9.1     2.7
TEX    19.5    27.5    20.9     6.6
TOR    19.5    21.5    16.3     5.3
WAS    20.1    19.9    14.8     5.0
TOT   568.7   579.3   434.6   144.7

Posted: 21 November 2018 10:45 AM   [ # 2 ]
Administrator
Total Posts:  383
Joined  2013-01-04

Let’s take it one step at a time.  First, your allocation between offense and defense is intriguing.

The way I would do it is based on replacement level (at the seasonal level).  Roughly speaking, it would be offense from .400 win% and defense from .400 win%.  So, if both offense and defense are equal (.500 win%), they’d each get 0.100 wins per game.  But if say you have a .500 team that was all-offense, say a .600 win% offense and a .400 win% defense, then I’d give 0.200 wins per game to the offense and 0 to the defense.
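
As a rough Python sketch of the replacement-level idea just described (the function name is hypothetical, and .400 is the approximate replacement level mentioned above):

```python
def wins_above_replacement_per_game(off_win_pct, def_win_pct, replacement=0.400):
    """Credit each unit with its win% above a replacement-level win%."""
    return off_win_pct - replacement, def_win_pct - replacement


# Balanced .500 team: each unit gets about 0.100 wins per game
balanced = wins_above_replacement_per_game(0.500, 0.500)
# All-offense .500 team: offense gets about 0.200, defense gets 0
all_offense = wins_above_replacement_per_game(0.600, 0.400)
print([round(x, 3) for x in balanced], [round(x, 3) for x in all_offense])
```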

In your case, both sides will always get something, regardless of how good or bad they are in any particular game. So, you would end up, on average, with the offense getting .600 / (.600 + .400) = 60%.

I’m not really sure that’s going to work.

Posted: 24 November 2018 09:57 AM   [ # 3 ]
Doubles Hitter
Total Posts:  10
Joined  2013-03-06

Thanks.  I liked features of what I did but it didn’t really separate teams out enough - the spread of results was too tight, most likely for the reasons you pointed out.

It took me some time to get something that worked but I used run differential from ‘replacement’ as the basis for allocation. I used 1 run per game worse than average as the replacement level.

So in the Arizona-Colorado game, Arizona’s offense put up 8 runs or 4.474 above replacement. Arizona allowed 2 runs or 3.372 runs better than replacement.  I started with a 50/50 split between offense and defense then adjusted for the runs versus replacement weighted by the win impact of a run. In this data set 1 extra run was worth about .12 wins on average (leaving out the tails of the run distributions).

I multiplied .12 by (4.474 - 3.372) to get .132; .132 + .50 = .632. So the offense was attributed 63.2% of the win, and the defense was allocated 36.8%.
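
The revised allocation as a minimal Python sketch. The function name and parameter names are made up for illustration; 0.12 wins per run and the 50/50 starting point are the values quoted above.

```python
def revised_offense_share(off_runs_above_repl, def_runs_above_repl,
                          wins_per_run=0.12, base=0.5):
    """Start from a 50/50 split and shift it by the difference in
    runs versus replacement, converted to wins."""
    return base + wins_per_run * (off_runs_above_repl - def_runs_above_repl)


# Arizona vs. Colorado: offense 4.474 above replacement, defense 3.372 better
off_share = revised_offense_share(4.474, 3.372)
def_share = 1.0 - off_share
print(round(off_share, 3), round(def_share, 3))  # 0.632 0.368
```

Nothing in this formula caps the share at 1, so a large enough gap between the two units pushes one share above 100% and the other below zero.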

In the original method the highest attribution of ‘games’ to offense to any team was 53% and the lowest was 47%. With the new method, the range was from 60% to 40%. The standard deviation of “allocated offensive games” was 2.2 in the original method and 8.2 in the revised method.

A ‘problem’ I have with this revised method is that the offense (or defense) could get a negative loss if they did well but the other unit did very poorly. I know that it works mathematically but it strikes me as odd.

I am wondering - would using RE work just as well/better?

Thanks again for the feedback.
