Wednesday, November 29, 2017
So what is all the hullabaloo about WAR?
Aaron Judge. And the Yankees.
Linear_Weights
With two outs, a strikeout is just like any other batting out, in terms of its impact to scoring. The inning is over. With the bases empty, a strikeout is also just like any other batting out: we have an extra out and we still have the bases empty.
So, in these situations, the two-outs or bases-empty scenarios, a strikeout is just as costly as any other batting out. These situations happen 76% of the time. So, three-quarters of the time, a strikeout is functionally equivalent to other batting outs, in terms of its impact to scoring.
We are therefore left with the other 24% of the time to consider.
Let's first consider a runner on 1B, or runners on 1B and 2B, with less than 2 outs. This situation happens about 15-16% of the time. In this situation a strikeout is LESS costly than other batting outs, because a groundout can turn into a double play and give you two outs. The net effect is around 0.05 runs.
How about when we have a runner on 2B and less than 2 outs, the classic "move the runner over" scenario? In this situation, it is indeed more costly to strike out than to make another batting out, by just over 0.04 runs. This situation, however, happens only 4% of the time.
Bases loaded and less than 2 outs has all kinds of things happening, with both the strikeout potentially being more costly and less costly than other batting outs. Overall, the net effect is that strikeouts are indeed costlier at an impact of almost 0.08 runs. But bases loaded, less than 2 outs, happens only 1% of the time.
That leaves us with a runner on 3B and less than 2 outs, a situation where it is clearly and obviously far more costly to strike out than to make another batting out, most notably because of the potential SF. How costly? An enormous 0.26 runs: each strikeout costs about 0.26 more runs than a regular batting out. However, this happens only 3-4% of the time.
This is our tally:
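Strikeouts as costly:
Two outs, or bases empty (76% of the time): no difference
Strikeouts less costly:
Runner on 1B, or on 1B and 2B, less than 2 outs (15.5% of the time): +0.05 runs
Total net impact of 0.05 x 15.5% = +0.01 runs
Strikeouts more costly:
Runner on 2B, less than 2 outs (4% of the time): -0.04 runs
Bases loaded, less than 2 outs (1% of the time): -0.08 runs
Runner on 3B, less than 2 outs (3-4% of the time): -0.26 runs
Total net impact of the above = -0.01 runs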
(All numbers rounded for ease of illustration.)
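To see how the rounded tally comes together, here is a minimal Python sketch that weights each situation's run difference by how often it comes up. The frequencies and run differences are the rounded figures from above; 0.035 for the runner-on-3B case is an assumed midpoint of "3-4%". Positive means a strikeout costs fewer runs than another batting out would have.

```python
# Rounded figures from the tally: (frequency, runs a strikeout saves relative
# to another batting out). Negative means the K is the costlier out.
# 0.035 for the runner-on-3B case is an assumed midpoint of "3-4%".
SITUATIONS = {
    "two outs or bases empty": (0.760, 0.00),
    "runner on 1B, or 1B and 2B, <2 outs": (0.155, +0.05),
    "runner on 2B, <2 outs": (0.040, -0.04),
    "bases loaded, <2 outs": (0.010, -0.08),
    "runner on 3B, <2 outs": (0.035, -0.26),
}


def weighted_impact(situations, keep=lambda delta: True):
    """Sum of frequency * run difference over the situations selected by `keep`."""
    return sum(freq * delta for freq, delta in situations.values() if keep(delta))


less_costly = weighted_impact(SITUATIONS, lambda d: d > 0)
more_costly = weighted_impact(SITUATIONS, lambda d: d < 0)
net = weighted_impact(SITUATIONS)

print(f"less costly: {less_costly:+.4f} runs per out")
print(f"more costly: {more_costly:+.4f} runs per out")
print(f"net:         {net:+.4f} runs per out")
```

The net lands at a few thousandths of a run per out, which is the "basically cancel out" point.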
***
In other words, while strikeouts are FAR costlier with a runner on third and less than 2 outs, the sheer frequency of the situations where they are less costly (a runner on 1B and less than 2 outs) is enough to basically cancel that out.
If you are going to make an out, don't make it a K with runners on 3B and less than 2 outs. And don't make it a groundout with runners on 1B and less than 2 outs.
To the extent that you DO want to track a hitter's strikeouts, and how costly they are in relation to other batting outs, just track the number of strikeouts with a runner on third and less than 2 outs. When you do that, you will find that the league leader has only about 10 such outs. So, all of this consternation is over the ten times a batter strikes out with a runner on third and less than 2 outs.
Let's say you bet on a game of 8-ball. You and your buddy each put 50 cents down. You win the game, so you pick up his 50 cents. Your buddy obviously is left with nothing. In other words, you are up $0.50 and your buddy is down $0.50. Similarly, you have a 1-0 record, while your buddy is 0-1. The average W-L record would obviously have been .500. So, you are +0.5 wins above average.
You play 9 more matches, of which you win seven. Each time you win, you earn 50 cents. Each time you lose, you lose 50 cents. So, in these seven wins, you added another $3.50 to your 50 cents from the first game. So your 8 wins generated $4 of profit. But your two losses cost you 50 cents each time, or $1. Overall, after the 10 matches, you made $3. You now have an 8-2 record.
What record would you have needed in order to make neither a profit nor a loss? A 5-5 record. In other words, being +3 wins above average is +$3 above not having played at all (or having split the 10 games evenly). Each win ABOVE EXPECTED is worth 1 dollar. That ABOVE EXPECTED is important to remember.
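The bookkeeping is simple enough to sketch in a few lines of Python. This only illustrates the arithmetic above: with an even-money stake, profit is twice the stake times the wins above a .500 record, so at 50 cents a side each win above average is worth $1.

```python
def even_money_profit(wins, losses, stake=0.50):
    """Each win takes the other player's stake; each loss gives up yours."""
    return wins * stake - losses * stake


def wins_above_average(wins, losses):
    """Wins above a .500 record over the same number of games."""
    return wins - (wins + losses) / 2


for w, l in [(1, 0), (8, 2), (5, 5)]:
    waa = wins_above_average(w, l)
    profit = even_money_profit(w, l)
    print(f"{w}-{l}: {waa:+.1f} wins above average, profit {profit:+.2f} dollars")
```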
***
The next day, you ask your buddy if he wants to go for round 2. Your buddy, having known you a long time, and knowing that you are a pretty good pool player while he is merely passable, says: "You know, if this is going to be fair, you gotta give me some odds." So, you agree that you will put up 75 cents and he puts up 25 cents. Each time you win, you gain your friend's quarter (+$0.25), while each time he wins, it'll cost you $0.75.
You think this is fair. Your 8-2 record from the day before might be indicative of the strength of your play. If you went 8-2 today, you'd get your buddy's 25 cents 8 times (+$2.00), and you'd give up 75 cents twice (-$1.50). You would still be up +$0.50.
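Here is the same bookkeeping with the 75/25 stakes, as a hedged sketch. The break-even helper is an addition for illustration: with these stakes you need to win 75% of the time just to break even, so an 8-2 record is 0.5 wins above expected.

```python
def profit_with_odds(wins, losses, my_stake=0.75, buddy_stake=0.25):
    """You win your buddy's stake on each win and lose your own on each loss."""
    return wins * buddy_stake - losses * my_stake


def break_even_win_pct(my_stake=0.75, buddy_stake=0.25):
    """Win rate p where p * buddy_stake == (1 - p) * my_stake."""
    return my_stake / (my_stake + buddy_stake)


def wins_above_expected(wins, losses, my_stake=0.75, buddy_stake=0.25):
    """Actual wins minus the wins needed to break even over these games."""
    return wins - break_even_win_pct(my_stake, buddy_stake) * (wins + losses)


print(profit_with_odds(8, 2))      # 0.5 (dollars)
print(break_even_win_pct())        # 0.75 (win rate needed just to break even)
print(wins_above_expected(8, 2))   # 0.5 (wins above the break-even record)
```

Since the two stakes add up to $1, each win above expected is again worth exactly $1, just like on day one. The fielding example works the same way, with catch probabilities in place of the stakes.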
***
If you have an opportunity to make an out on a play that you'd expect an average outfielder to make 25% of the time, you'd earn +0.75 outs for each catch.
If you have an opportunity to make an out on a play that you'd expect an average outfielder to make 75% of the time, you'd earn +0.25 outs for each catch.
Suppose you have 4 opportunities to make a tough play, of which you catch two. And you have 6 opportunities to make an easier play, of which you catch all but 1.
For the 4 tough plays, you earn +0.75 each for the two catches (+1.50 total), and -0.25 for the two tough ones you didn't make (-0.50 total). For these 4 tough plays, you will have earned +1.00 outs.
For the 6 easier plays, you earn +0.25 for each of the five catches you made (+1.25 total), and it costs you 0.75 outs for the one you didn't make, for a total of +0.50.
Overall, you will have earned +1.00 +0.50 = +1.50 for these 10 plays (7 outs, 3 hits). This is your value, your "profit". You made +1.50 more outs than an average fielder would have, GIVEN THE SAME NUMBER AND DIFFICULTY of opportunities. Remember this number. +1.50.
***
Now, you don't have to add up every single one like this. All of this gets reduced to simply:
profit = (actual minus EV*opps)
where EV = expected value per play
In this case, you had 4 tough plays, with an EV of 25%, and 6 easier plays with an EV of 75%.
4 x 25% plus 6 x 75% all divided by 10
= 55%
In other words, in YOUR OPPORTUNITY SPACE, the expected value is to have caught 55% of the balls. Given 10 plays, you are therefore expected to have caught 5.5 outs.
And what did you actually do? You were 7-3, so you caught 7. Going back to this:
profit = (actual minus EV*opps)
We plug in our numbers
profit = (7 minus 5.5)
profit = +1.5 outs
Remember the number I mentioned? That's how partial plus/minus works.
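Both routes to +1.5 can be checked in a small Python sketch: the play-by-play credits from the first pass, and the actual-minus-expected shortcut. Summing the EV of every play is the same thing as EV times opps when EV is the average. Representing each play as an (EV, caught) pair is just an assumption for the example.

```python
# Each opportunity: (probability an average fielder makes the play, caught?)
plays = (
    [(0.25, True)] * 2 + [(0.25, False)] * 2    # 4 tough plays, 2 caught
    + [(0.75, True)] * 5 + [(0.75, False)] * 1  # 6 easier plays, 5 caught
)


def credit(ev, caught):
    """Per-play credit: 1 - EV for a catch, -EV for a miss."""
    return (1 - ev) if caught else -ev


def plus_minus_play_by_play(plays):
    return sum(credit(ev, caught) for ev, caught in plays)


def plus_minus_shortcut(plays):
    """Actual outs minus expected outs (sum of EVs = average EV * opps)."""
    actual = sum(1 for _, caught in plays if caught)
    expected = sum(ev for ev, _ in plays)
    return actual - expected


print(plus_minus_play_by_play(plays))  # 1.5
print(plus_minus_shortcut(plays))      # 1.5
```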
Just a little reference. Another thing we should automate at some point in the off-season.
http://tangotiger.com/images/uploads/linear_weights_2014_to_present.html
There are two ways to calculate these values. One way was described in Table 5 in The Book: you take the "run value" of the starting state of the event, and then you add up all the runs that actually scored following that event, to the end of the inning. The second was described in Table 7 in The Book: you take the "run value" of the starting state and of the ending state, subtract the two, and add up the runs in between. The results will be very close to each other, either way you do it. The above chart was done the first way (the Table 5 method), mostly because, given the dataset I have to work with, it was easier to do it that way.
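A minimal sketch of the two calculations for a single event, as I read the descriptions above. The run expectancy table is hypothetical: 0.48 matches the illustrative bases-empty, 0-out figure used elsewhere on this page, and 0.85 for a runner on 1B with 0 outs is made up purely for the example. The function names are also just labels for this sketch.

```python
# Hypothetical run expectancy (runs to end of inning) by (bases, outs).
# 0.48 matches the illustrative bases-empty, 0-out figure on this page;
# 0.85 is made up for the example. Real tables come from play-by-play data.
RE = {
    ("___", 0): 0.48,
    ("1__", 0): 0.85,
}


def run_value_table5_style(re, start, runs_through_end_of_inning):
    """Runs actually scored from this play through the end of the inning,
    minus the run expectancy of the starting state."""
    return runs_through_end_of_inning - re[start]


def run_value_table7_style(re, start, end, runs_on_play):
    """Run expectancy of the ending state minus that of the starting state,
    plus the runs that scored on the play itself."""
    return re[end] - re[start] + runs_on_play


# Leadoff single with the bases empty; suppose the inning goes on to score 1 run.
print(run_value_table5_style(RE, ("___", 0), 1))
print(run_value_table7_style(RE, ("___", 0), ("1__", 0), 0))
```

For this one single the two disagree (about 0.52 versus 0.37). Averaged over every single in a dataset, with the RE table built from that same dataset, they should come out close, since the ending state's RE is itself the average of the runs that followed from that state.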
A noble effort. However, the use of regression is not only unneeded, but in fact worse than the simple logical solution.
If all you want to know is the run expectancy for the rest of the inning with a particular batter at the plate, all you have to do is figure out his run impact for that particular plate appearance over and above the average expectancy.
For example, with bases empty, 0 outs, let's say run expectancy is .480 runs to end of inning. That's with an average batter. But, what if it's Mike Trout? Well, you calculate his linear weights for bases empty, 0 outs (using a table like this). Maybe for Trout, he's +.060 runs in that situation. So, you add that to .480 and you get .540. That's the run expectancy with Trout at the plate, bases empty, 0 outs.
Very straightforward, logical, and no regression.
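The whole adjustment is one addition. Here it is as a small sketch with the example numbers from above; the .480 and Trout's +.060 are illustrative values, not measured ones, and the (bases, outs) key format is an assumption.

```python
# Illustrative league run expectancy, keyed by (bases, outs).
LEAGUE_RE = {("empty", 0): 0.480}


def batter_run_expectancy(state, batter_lw_by_state, league_re=LEAGUE_RE):
    """League run expectancy for the state plus the batter's own linear-weights
    run impact in that state (0 if we have nothing for him there)."""
    return league_re[state] + batter_lw_by_state.get(state, 0.0)


trout_lw = {("empty", 0): 0.060}  # the "maybe +.060" from the text
print(round(batter_run_expectancy(("empty", 0), trout_lw), 3))  # 0.54
```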
A very good primer by Neil on what FIP is. If people USE it for more than its intended construction, that's not a FIP issue. As I noted in the comments:
I agree with the analogy of FIP to wOBA. They both:
(*) FIP ignores batted balls in field of play, and baserunner movement (SB, CS, WP, etc). wOBA ignores baserunner movement.
I have a simple method to determine True Talent wOBA at the component level. I posted it in a thread five years ago (see post 11), and it's never been referenced since, whether by me or anyone else. And it may be one of the most insightful things you come across.
For example, we all know that the run value of a HR is 1.40. But, what if instead we did this for a hitter’s HR coefficient:
PA/(PA+132) * 1.40
That becomes the new “skill” value for the HR.
Whether you regress the number of HR or you regress the coefficient for the HR, it comes out to the same thing, because we want to do this anyway:
PA/(PA+132) * 1.40 * HR
So, whether you do:
X * HR
where X = PA/(PA+132) * 1.40
Or you do:
Y * 1.40
where Y = PA/(PA+132) * HR
We still have the exact same thing.
And then go to post #12 for examples of the method in action.
See, the typical thing is to regress wOBA, but that would make each individual component regress by the same amount.
Almost everyone else will regress the amounts of each component (component-level regression), and then feed it back into wOBA or Linear Weights. And that's perfectly fine.
But if you want an incredibly sweet shortcut, follow the method I posted above: instead of regressing the amounts, you can instead regress the coefficient values!
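A quick Python check of the algebra, using 132 as the regression constant for HR as above. Presumably each component would get its own constant; only the HR one is given here, so the sketch sticks to HR. The 600 PA / 30 HR hitter is hypothetical.

```python
import math

HR_RUN_VALUE = 1.40
HR_REGRESSION_PA = 132  # constant used for HR in the text


def reliability(pa, k=HR_REGRESSION_PA):
    """PA / (PA + k): the share of the observed performance we keep."""
    return pa / (pa + k)


def hr_runs_regressing_coefficient(pa, hr):
    # Shrink the run value: PA/(PA+132) * 1.40, then multiply by HR.
    return (reliability(pa) * HR_RUN_VALUE) * hr


def hr_runs_regressing_count(pa, hr):
    # Shrink the HR count: PA/(PA+132) * HR, then multiply by 1.40.
    return (reliability(pa) * hr) * HR_RUN_VALUE


pa, hr = 600, 30  # hypothetical hitter
a = hr_runs_regressing_coefficient(pa, hr)
b = hr_runs_regressing_count(pa, hr)
print(round(a, 2), round(b, 2), math.isclose(a, b))
```

The two are the same product in a different order, which is the whole trick: the regression can live in the coefficient instead of in each count.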
Yes!
If you look at his batting-neutral numbers, the ones that treat the value of a HR and a walk and a single the same regardless of the base-out situation, Votto is +348 runs better than average according to Fangraphs (look for wRAA), and +354 runs according to Baseball Reference (look for BtRuns).
But if you walk with first base open or you hit a HR with the bases empty, the actual run impact would end up going down. However, if you take advantage of the situation, and walk when there's a runner on first, and not strike out when there's a runner on third, etc., the actual run impact would end up going up.
So, what happens when Joey Votto is batting? Well, on both sites, you can look for RE24, which looks at how Votto does in each of the 24 base out states and gives him credit for his performance relative to the base-out states. And on Fangraphs he's +395 runs and on BR.com he's at the identical +395 runs.
Instead of being at +350 runs in neutral situations, he's close to +400 runs in actual situations. So, Votto is a situationally smart hitter.
(Technical interlude: what we actually want is to compare RE24 to batting runs times boLI, the Leverage Index of the base-out state. But that's really getting into the weeds there. We can do that in the comments if you want to.)
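The simple version of the comparison is just RE24 minus the neutral batting runs. A sketch using the career totals quoted above; it deliberately leaves out the boLI refinement from the interlude.

```python
# Career totals quoted above for Joey Votto.
VOTTO = {
    "Fangraphs": {"neutral_batting_runs": 348, "re24": 395},           # wRAA
    "Baseball Reference": {"neutral_batting_runs": 354, "re24": 395},  # BtRuns
}


def situational_runs(re24, neutral_batting_runs):
    """Runs gained (or lost) from how he hit in his base-out situations, beyond
    what his context-neutral batting line is worth. Ignores boLI."""
    return re24 - neutral_batting_runs


for site, r in VOTTO.items():
    print(site, situational_runs(r["re24"], r["neutral_batting_runs"]))
```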
Looking at all 200 hitters with at least 2700 PA from 2007-2015, Votto ranks 26th as a situational hitter in MLB, putting him at the 87th percentile. Remember, this is comparing Votto situationally to Votto in neutral conditions. Number 1 is Jason Heyward. He's followed by Cargo, Giancarlo Stanton, Chase Utley (naturally), Drew Stubbs (yup, a below-average hitter who is actually above average based on the situation), Dexter Fowler, Ryan Braun, Victorino, Jimmy Rollins, and Todd Helton. On the flip side, the hitter who is the least situationally aware is Kyle Seager, followed by AJ, Delmon Young, Navarro, and Mike Aviles.
So, if you want to know how a hitter SHOULD hit, talk to the guys who are actually performing above expectation. They'll tell you how to approach a situation. That means listen to Heyward and Utley and Rollins... and Joey Votto.
This is very heavy on the math. But the payoff will be there.
There's nothing really new here for the Straight Arrow readers. This is more for those stumbling across wOBA for the first time.
As I'm reading down this list, I was thinking "I'd like to see this one". That was on the first one. And the second. And the third and fourth... all of them, without exception, are exactly what I'd like to see. I can't even say that I'd prefer to listen to one over another. They are all right up my alley. So, whoever over at SABR chose these presenters, you did a fantastic job. And of course, the presenters themselves have chosen terrific questions to answer.
I do hope that the rest of the public will get to see these presentations in some form at some point in time.
For nonpitchers: add 0.003 wins per PA
For SP: add 0.011 wins per IP
For RP: add 0.007 wins per IP
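Applied as code, these are per-unit additions: PA for nonpitchers, IP for pitchers. This sketch only computes the addition itself, since the baseline it gets added to isn't spelled out in this excerpt, and the example playing-time totals are hypothetical. IP are taken as true decimal innings.

```python
# Per-unit additions from the list above: wins per PA for nonpitchers,
# wins per IP for starters and relievers.
WINS_PER_UNIT = {
    "nonpitcher": 0.003,  # per PA
    "SP": 0.011,          # per IP
    "RP": 0.007,          # per IP
}


def playing_time_wins(role, playing_time):
    """Wins to add for a player in `role` with `playing_time` PA or IP.
    IP should be true decimal innings (200 1/3 IP -> 200.333, not 200.1)."""
    return WINS_PER_UNIT[role] * playing_time


for role, units in [("nonpitcher", 600), ("SP", 200), ("RP", 70)]:
    print(role, round(playing_time_wins(role, units), 2))
```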
Jonathan asks the question:
So, ask yourself this: if wOBA / TAv are the standard means of evaluating batters, shouldn’t the fundamental measure of pitcher value be the extent to which they limit batter wOBA / TAv? Of course it should.
Not so fast! With linear weights (or wRAA as you will find it on Fangraphs, which is Runs Above Average based on wOBA), we treat all PA the same, regardless of the base-out situation. A HR is +1.4 runs whether the bases are empty or the bases are loaded.
With RE24, we apply a different run value for a HR based on the base-out situation. A HR with the bases empty is +1.0 runs, while a HR with runners on base will be higher, and much higher with the bases loaded.
Can you make a case for one over the other? Sure, it depends on what you are after.
Now, to be logically consistent, must you do the same for pitchers? No, it's not a necessity. It depends on the reason you do it for batters. If the reason you prefer Linear Weights to RE24 for batters is that the batter is not "responsible" for the base-out situation he sees, and so, it is "unfair" in terms of the number of opportunities faced, then that's a reasonable choice. This applies especially for leadoff hitters. It also presumes that a hitter won't change his approach based on the base-out situation, which of course is ludicrous.
And it gets to the point I keep making: do we want to assign the impact of an event to a hitter simply because he happens to be involved, even if he may not "own" everything about the change in that event?
But for pitchers, it's different. If Verlander walks the bases loaded and then allows a HR, before striking out the side, that's 4 runs allowed (or +3.5 runs above average). If we followed Linear Weights, we'd give him +1 run for the 3 walks, +1.4 runs for the HR and -0.8 runs for the three K, for a total of +1.6 runs above average. Where did the other 1.9 runs go? Well, they went in how Verlander sequenced the events. He owns that and no one else.
If you include balls in play, the fielders also take their share of the credit for that, but overall, the pitcher is going to own more than 50% of the sequencing, maybe closer to 75%.
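The Verlander accounting as a small sketch. The event totals are the ones quoted (+1 for the three walks, +1.4 for the HR, -0.8 for the three K), and the 0.5-run average inning is inferred from 4 runs being +3.5 above average.

```python
# Verlander's hypothetical inning: 3 BB, a grand slam, then 3 K.
# Linear-weights totals as given in the text.
lw_by_event_group = {
    "3 walks": +1.0,
    "grand slam": +1.4,
    "3 strikeouts": -0.8,
}
AVERAGE_RUNS_PER_INNING = 0.5  # implied by "4 runs allowed (or +3.5 above average)"

runs_allowed = 4
actual_above_average = runs_allowed - AVERAGE_RUNS_PER_INNING
linear_weights_total = sum(lw_by_event_group.values())
sequencing = actual_above_average - linear_weights_total  # the runs he "owns"

print(f"actual: {actual_above_average:+.1f}, LW: {linear_weights_total:+.1f}, "
      f"sequencing: {sequencing:+.1f}")
```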
The two players have a virtually identical number of PA (a difference of only six). Baseball Reference has Beltre at +220 runs above average with the bat, while it has Chipper at +558 runs. A 338-run gap is a clear and decisive victory for Chipper. Except we see that Chipper has 85 WAR and Beltre has 84, as close as you can get them to be even. So, what happened?
First and foremost is a 250-run gap in their defense. You can't believe it could be that much for what amounts to 15 full 162-game seasons? Well, we pretty much figure that the best fielder in baseball is about +20 runs better than average per season, and Beltre is among the best in baseball. It works. Then, Beltre played in the tougher AL, which accounts for another 40 runs or so. And Chipper, according to Baseball Reference, played in an environment where the runs-to-wins conversion was 10.5, while Beltre's was 10.0.
Suddenly, a 300-run gap in offense, after accounting for defense, league, and playing environment, shrinks to a 1-win gap.
Basically, given a choice of Chipper's career or Beltre's career (should it end this season), WAR at Baseball Reference says it's a tossup.
We'll take the Jays, who are 71-56, which is 33 wins above replacement, which we can split into a share for nonpitchers (19 WAR) and a share for pitchers (14 WAR). The key is that everything has to add up. For a pitcher, all we know about is his W-L record. Remember, simple pre-teen metric. The Jays pitchers obviously have a 71-56 record, but we want to somehow get that down to 14 WAR. You can get there by simply subtracting .45 wins from every decision.
We'll look at two pitchers to see how this affects them: Drew Hutchison has 14 decisions, of which we remove 6 wins, which turns his 12-2 record into 6 WAR. RA Dickey is 8-10, which now becomes 0 WAR. You can do this for all Jays pitchers, and you see you'll get 14 WAR for them.
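A sketch of the per-decision adjustment. The 0.45 is backed out of the team totals: (71 - 14) / 127 is about 0.449. The text rounds the results, so before rounding Hutchison comes out at 5.7 and Dickey at -0.1.

```python
def pitcher_war_from_wl(wins, losses, per_decision=0.45):
    """Naive W-L-based WAR: take `per_decision` wins off every decision."""
    return wins - per_decision * (wins + losses)


# Where 0.45 comes from: the Jays' 71-56 has to come down to 14 pitcher WAR.
print(round((71 - 14) / (71 + 56), 3))

for name, w, l in [("Jays staff", 71, 56), ("Hutchison", 12, 2), ("Dickey", 8, 10)]:
    print(name, round(pitcher_war_from_wl(w, l), 1))
```

Nothing in this function knows about run support, which is exactly the "all other things equal" assumption that comes next.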
Now, BASED ON WHAT WE KNOW, which is that we only know about a pitcher's W-L record, what I did is valid. But it comes with a huge assumption: all other things equal. That is, the pitcher W-L record, as assigned by MLB, is representative of a pitcher's performance. But Hutchison has received 3 runs per game more in run support than Dickey! Clearly, the W-L record doesn't represent our pitchers very well. It ignores a key piece of context: the other half of baseball, which is the run support.
So, you need to heavily regress this metric if you don't consider run support. Yes, Hutch will still end up ahead of Dickey if all you have is the W-L. But at least, we won't give FULL WEIGHT to the W-L. We look at it with a high degree of skepticism. Maybe we end up with 3.5 WAR for Hutch and 2.5 WAR for Dickey or something, rather than 6 and 0.
The same applies for fielding, be it UZR or anything else. A player can be shown to be +30 runs above average, but, did it ignore some key piece of context? Or could it be biased in some manner? If you see one fielder at +30 and another at -20, it's almost certain there is not a 50 run difference between the two players, in terms of their performance. Chances are, the +30 fielder had good context that was not apparent or considered. And the -20 had bad context that wasn't considered.
The same applies for running and hitting and everything else. It's just that for hitting, things are A LOT more apparent. It's why we don't really care about regressing, say, the results by 5% or 10%... the margin of error is too small to bother with. But just because we don't bother with it doesn't mean it doesn't exist or apply. It's there. And the honest thing to do is to apply it to all the value metrics, some more than others.
Terrific stuff from Dan. We've talked about the stuff in the top half of his article from several years ago. And the bottom half adds another layer of discussion.
Spencer wrote an article that showed the run value of a swinging strike at around .16 runs and the called strike at .04 runs. I asked him for proof. And he provided his evidence. It's clear to me what the problem is. If it's not to you, try to think it through. My answer is below the line.
David made a few changes, including some new stuff from MGL and updates to the FIP park factors, which we previously discussed.
What I don't understand is why the current half-baked version keeps getting thrown out as if the process is finished.
As the one who had a leading hand in developing WAR, I can tell you unequivocally that it is NOT a finished process, and I have never said that it's a finished process. And given that I support TWO competing versions (Fangraphs and Baseball Reference), which sometimes don't agree in their estimates, no one can conclude at all that WAR is a finished process.
So, you are building a strawman.
At the same time, WAR *is* the best thing available. If you choose to do anything else, you are doing something inferior. It's really that simple. If you think YOUR process is BETTER than WAR, then bring it on. Bill James is bringing it on, so, we can evaluate his method.
But no one else is doing that. No one. So why the heck should I listen to anyone who might suggest that player A is better than player B, when he's given me nothing at all to evaluate his opinion on. Or, whatever is given to me is in such a tight framework that I can't evaluate it in a more holistic fashion, to see whether it's consistent and systematic.
We've all got our own personal WAR-like system. The two WAR systems out there are simply the best of them all.
Answer: current ultimate.
Why? Well, let's say that you think it's a quick and easy tool. What else would you do? You might... well, use RE24 instead of wOBA. That's still part of the WAR framework, simply a different implementation. Maybe you want WPA? Sure, go ahead. Maybe you prefer Dewan to Lichtman? Sure. But all of that is still part of the same framework. That's why WAR is the ultimate tool: it allows you to swap in/out your various components. You can even choose to have a different scale for the fielding spectrum. You can even change the replacement level to something higher or lower. And still, you'd be using WAR.
So, go ahead, and treat Fangraphs and Baseball Reference as something "quick and dirty". But I will promise you that whatever you will do will be even quicker and dirtier.
What WAR does is give you a framework, and make it very easy for everyone to have their own implementation. Don't like what you see? Well, you are being given a systematic, consistent framework on which you can build your own house. Go ahead and do it, and give us an open house to look at it.
Or, complain that somehow these free WAR homes are not good enough and... well... keep wandering the streets with nowhere to sleep. The reality is that we all have some sort of WAR home. You just maybe don't know all the rooms in your own house, and the rooms keep changing, depending on which players come to visit. The WAR homes at Fangraphs and Baseball Reference are merciless to their player guests.