Will get to the headline question in a second. First the setup.
Suppose you have two teams, the Expos and the Spiders, where the Expos are favoured to win 52% of their matchups on neutral sites.
When playing at the Expos' home site, the Expos win 56% of the time. When they play as visitors against the Spiders, they win only 48% of the time. This is a typical home site advantage in baseball.
In a winner-take-all game at the Expos' home site, the Expos have a 56% win expectancy.
In a best-of-7 series, where the Expos are home for 4 of the 7 games, the Expos have a win expectancy of 55.6%.
That's right: you may think a single game is more random, and so worse for the favourite, but that is not the case here. The Expos have a better chance of winning a one-game series at home than a best-of-7 series.
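The 55.6% can be checked by brute force. Below is a minimal Python sketch. It assumes each game is independent, with the 56% home and 48% away rates above. The function names are hypothetical and only for illustration. Who wins the series doesn't depend on whether the leftover games get played, so the sketch simply counts wins over all seven.

```python
from math import comb


def binom_pmf(n: int, p: float) -> list:
    """P(exactly k wins in n independent games), for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def series_win_prob(home_games: int, away_games: int,
                    p_home: float, p_away: float) -> float:
    """Chance the Expos win a majority of home_games + away_games.

    Assumes independent games and that every game is played; stopping
    once a team clinches does not change who wins the series.
    """
    needed = (home_games + away_games) // 2 + 1
    home = binom_pmf(home_games, p_home)
    away = binom_pmf(away_games, p_away)
    return sum(
        ph * pa
        for h, ph in enumerate(home)
        for a, pa in enumerate(away)
        if h + a >= needed
    )


one_game = series_win_prob(1, 0, 0.56, 0.48)
best_of_7 = series_win_prob(4, 3, 0.56, 0.48)
print(f"winner-take-all at home: {one_game:.1%}")   # 56.0%
print(f"best-of-7, 4 home games: {best_of_7:.1%}")  # 55.6%
```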
ANSWER TO THE QUESTION
Indeed, in a best-of-3, where all the games are at the home site of the Expos, the Expos have a win expectancy of 59.0%.
Do you know what kind of best-of-X series you need, with an even split of home site games (except for the last game, naturally), to match that? Would you believe... 27? That's right: a best-of-27 series, with 14 games at the Expos' home site and 13 games at the Spiders' home site, has the same win expectancy as a best-of-3 with all games at the Expos' home site.
Next time someone complains that a 3-game series is too short, or too random, remind them of the above fact.
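The same check works for the best-of-3 and the longer series. This sketch builds the distribution of total wins one game at a time, as a simple convolution. It again assumes independent games at 56% at home and 48% away, with the odd game at the Expos' park. The helper names are hypothetical.

```python
def win_distribution(game_probs):
    """dist[k] = probability of exactly k wins over the listed games."""
    dist = [1.0]
    for p in game_probs:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)  # lose this game
            new[k + 1] += prob * p    # win this game
        dist = new
    return dist


def series_prob(game_probs):
    """Chance of winning a majority of the listed games."""
    needed = len(game_probs) // 2 + 1
    return sum(win_distribution(game_probs)[needed:])


HOME, AWAY = 0.56, 0.48
best_of_3_home = series_prob([HOME] * 3)
print(f"best-of-3, all at home: {best_of_3_home:.1%}")  # 59.0%

# Even split of sites, with the odd game at the Expos' park.
results = {}
for n in (7, 15, 27):
    games = [HOME] * (n // 2 + 1) + [AWAY] * (n // 2)
    results[n] = series_prob(games)
    print(f"best-of-{n}, split sites: {results[n]:.1%}")
```

With these inputs, the best-of-27 comes out within a few tenths of a point of the best-of-3. That is the sense in which the two are the same.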
Tarik Skubal got 228 outs on strikeouts, facing 753 batters. Since the league average strikeout rate is 22.6%, the league average pitcher would get 170 strikeouts on 753 batters. Skubal therefore got 58 more strikeouts than the league average.
The run value of an out, strikeout or otherwise, in 2024 is roughly -.264 runs. In other words, the run potential is reduced by .264 runs for every strikeout.
Since Skubal is +58 on strikeouts, and the run value is .264 runs for each strikeout, Skubal saved about 15.4 runs on strikeouts.
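Here is the strikeout arithmetic as a small function. This is a sketch, and it assumes the .264 runs per out can be applied to each strikeout above average. With these rounded inputs it lands at about +15.3 runs rather than +15.4, presumably because the post used less-rounded rates. The function name is hypothetical.

```python
def strikeout_runs(strikeouts, batters_faced, lg_k_rate, runs_per_out):
    """Strikeouts above a league-average pitcher facing the same batters,
    and the runs those extra outs are worth (positive = runs saved)."""
    expected_k = batters_faced * lg_k_rate
    extra_k = strikeouts - expected_k
    return extra_k, extra_k * runs_per_out


extra_k, runs_saved = strikeout_runs(228, 753, 0.226, 0.264)
print(f"{extra_k:+.1f} K, {runs_saved:+.1f} runs")  # +57.8 K, +15.3 runs
```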
Here's how Skubal looks for every event, where plus is good for the pitcher, and minus is bad:
+15.4 SO
+ 7.9 BB+HBP
+10.5 HR
+ 5.0 2B+3B
+ 2.9 1B+ROE
- 2.7 Fielded Outs
The total of all that is +39 runs, which leads MLB in 2024.
Notice also that I broke the events into two sets: the top set is what are called the Three True Outcomes (TTO, or the unfieldable balls), while the bottom set is balls in park (BIP, or the fieldable balls).
Skubal is +34 in TTO and +5 in BIP. In other words, most of Skubal's value derives from Skubal himself without relying on his fielders.
The top pitcher in TTO in 2024 is Chris Sale at +40 runs, while he was -7 runs in BIP. That's right, Sale got below league average results on fieldable balls. Whether that is directly a result of Sale being a poor pitcher, or a bad job by the fielders, or a bad job by the team in setting up the fielding alignment, well, that's a different discussion. The key is to separate them in this manner, so we can focus on the one aspect, TTO, most in Sale's control, while acknowledging the other aspect, BIP.
Crochet was third at +25 in TTO and -10 in BIP. Paul Skenes was fourth, also at +25 in TTO, and +3 in BIP.
As you can see, our top 4 in TTO totalled +123 runs, while they were -9 runs in BIP. In other words: they got fantastic results when not relying on their fielders, and got below league average results when relying on their fielders.
How about pitchers who got fantastic results with their fielders? Bryce Miller looks like this:
+ 3.3 SO
+ 5.4 BB+HBP
+ 0.0 HR
+ 5.6 2B+3B
+ 9.8 1B+ROE
+ 9.3 Fielded Outs
All in all: +9 runs on TTO and +25 runs on BIP. Overall, that's +34 runs. Because he got most of his value on BIP, the understanding is that some of that value is not about the pitcher himself, but rather his fielders or his team. Officially, he carries the stat line. In reality, he's the figurehead for everything that he and his fielders do.
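Splitting the per-event run values into the two groups only requires deciding which events count as TTO. The sketch below uses the Skubal and Miller lines above. The dictionary keys mirror the labels in the lists, and the TTO set (SO, BB+HBP, HR) comes from the text. The sketch adds the unrounded components, so Miller's total comes to +33.4 rather than the +34 you get from adding the rounded +9 and +25.

```python
TTO_EVENTS = {"SO", "BB+HBP", "HR"}  # unfieldable: Three True Outcomes


def split_tto_bip(runs_by_event):
    """Split per-event run values (plus = good for pitcher) into TTO and BIP."""
    tto = sum(v for k, v in runs_by_event.items() if k in TTO_EVENTS)
    bip = sum(v for k, v in runs_by_event.items() if k not in TTO_EVENTS)
    return tto, bip


lines = {
    "Skubal": {"SO": 15.4, "BB+HBP": 7.9, "HR": 10.5,
               "2B+3B": 5.0, "1B+ROE": 2.9, "Fielded Outs": -2.7},
    "Miller": {"SO": 3.3, "BB+HBP": 5.4, "HR": 0.0,
               "2B+3B": 5.6, "1B+ROE": 9.8, "Fielded Outs": 9.3},
}
results = {name: split_tto_bip(line) for name, line in lines.items()}
for name, (tto, bip) in results.items():
    print(f"{name}: TTO {tto:+.1f}, BIP {bip:+.1f}, total {tto + bip:+.1f}")
```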
So, what does this presentation buy us? Well, if it's not apparent, the TTO is essentially the same thing as FIP. Whereas FIP is set to the ERA scale, this presentation focuses on runs. Now, you may think: well, isn't ERA in runs? Yes. But, for some reason, the FIP-naysayers focus on that scaling to ERA to claim FIP is not a stat that describes what actually happened.
And in the case of runs on TTO, it's very clearly about describing what really happened. We take a direct path from event to runs.
What also happens by focusing on runs and breaking it into components is that now everything adds up very clearly and cleanly. It opens the door to seeing how players do year by year, and so we can see how much run value a pitcher gets on his TTO and his BIP. Indeed, we can see the run value at each event (SO, BB, HR, 2B, 3B, field outs, etc).
Anyway, this is how run values, wOBA, and FIP all tie together. They are all basically talking about the same thing, but they focus on different parts of the game or use different scales.
As we know, when Corbin Burnes had his historic FIP season and won in 2021, that was a paradigm shift. From 2006-2020, the voters voted very consistently. And now, they are in a transition period. As a result, I have two Predictors: one is the Classic that works for 2006-2020, and the other is the New FIP-enhanced version that works almost as well as the Classic for 2021-2023. The Classic is probably still a smidge ahead, and it's a matter of time until the New version takes over. When that happens, I don't know. So, let's run both, with the Classic listed first, and the New in parens.
1. Skubal (1)
... way way way ahead
2A. Lugo (2)
2B. Burnes (5)
...
4. Valdez (6)
5A. Ragans (3)
5B. Blanco
5C. Gilbert (4)
5D. Miller
Anyone within ~1 point gets the "letter" designation, signifying essentially a tie, and likely needing the New version as the tie-breaker.
Beyond that, I consider anything within FOUR points as essentially tied, and that's where the tiebreaker comes in. In the above, that means Valdez is really tied with the gang listed at #5.
So, what do we learn here? Well, Skubal will win the Cy Young, easily. Number 2 will be Seth Lugo.
The uncertainty will be between Burnes, who is NO LONGER the FIP-hero, and Ragans. Burnes is ahead of Ragans by almost 6 points using the Classic Predictor, while Ragans is ahead of Burnes by almost 5 points using the New Predictor.
If we treat it as 3/4 Classic, 1/4 New, we get this as our top 6:
Skubal
Lugo
Burnes
Ragans
Valdez
Gilbert
Going 1/4 Classic, 3/4 New:
Skubal
Lugo
Ragans
Burnes
Gilbert
Valdez
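Where Burnes and Ragans land depends on the weighting. This sketch assumes the blend is a straight weighted average of the two Predictors' scores, which the post implies but doesn't spell out. It uses round stand-ins of 6 and 5 points for the "almost 6" and "almost 5" gaps above. The function name is hypothetical.

```python
def blended_gap(classic_gap, new_gap, classic_weight):
    """Weighted average of the two Predictors' point gaps (Burnes minus Ragans)."""
    return classic_weight * classic_gap + (1 - classic_weight) * new_gap


CLASSIC_GAP = 6.0   # Burnes ahead by "almost 6" in the Classic
NEW_GAP = -5.0      # Ragans ahead by "almost 5" in the New

for w in (0.75, 0.25):
    print(f"{w:.0%} Classic: Burnes {blended_gap(CLASSIC_GAP, NEW_GAP, w):+.2f}")

# Classic weight at which the two are dead even
break_even = -NEW_GAP / (CLASSIC_GAP - NEW_GAP)
print(f"break-even Classic weight: {break_even:.0%}")
```

Under those assumptions, Burnes stays ahead of Ragans as long as the Classic gets more than about 45% of the weight.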
And mixed in with all of that will be Clase, who will finish somewhere between third and seventh. To finish second, he'd have to be considered the equal of Lugo, Burnes, and Ragans. Given how little support the best relievers have received since Britton's incredible run in 2016, it'll be surprising if Clase is listed on all 30 ballots. As Britton was listed on 24/30, that's probably Clase's over/under.
***
1A. Sale (1 runaway)
1B. Wheeler (2)
3. Skenes (3)
4A. Imanaga (9)
4B. King (6)
6. Lopez (7)
7. Cease (5)
...
10. Webb (4)
As close as Wheeler made it in the end, Sale's triple crown and runaway lead in the New Predictor will make it an easy win for him. Whether it's unanimous is the only question.
Skenes will be third.
So, here's where the uncertainty happens: down the ballot. Cease/Webb are the FIP-heroes, while Imanaga/King are the Classic heroes.
Here's how it looks 3/4 Classic, 1/4 New:
Sale
Wheeler
Skenes
King
Imanaga
Lopez
Cease
Webb
Going 1/4 Classic, 3/4 New:
Sale
Wheeler
Skenes
Cease
Webb
King
Lopez
Imanaga
In order to see where we are in the paradigm shift, just look at where Cease/Webb finish relative to Imanaga/King. Imanaga/King are ahead of Cease by 5 points and Webb by 10 points with the Classic Predictor. With the New Predictor, Webb is ahead of all of them, especially Imanaga, by over 10 points. So, Webb/Imanaga especially will be the tell.
If you see a ballot that looks something like this:
Sale, Wheeler, Skenes, Webb, Imanaga
or
Sale, Wheeler, Skenes, Imanaga, Webb
Then that makes no sense, as the voter has basically decided not to decide on their view. They've basically taken the position that they have no position and are still trying to balance everything out. A vote for Webb is a vote for Cease. And a vote for Imanaga is a vote for King. Choosing one from each group is why we are still in this paradigm shift.
In the left image, the compass-like layout shows the arm angle of every pitcher, with Zack Wheeler highlighted. The average angle for all of Wheeler's pitches is 27 degrees. The arm angle is measured in two dimensions, at the moment the ball is released, relative to the shoulder at that point in time.
The high arm slot pitchers are over 60 degrees, for both RHP (oRange circles) and LHP (purpLe circles). Pure sidearmers are at 0 degrees either way.
Small note: for LHP, you will see I put in parentheses an angle following the Cartesian plane standard. So, 0 degrees for a LHP is also shown as 180 degrees, while 60 degrees for a LHP is also shown as 120 degrees. Why I did that will become clear in a moment.
MIDDLE IMAGE
In the middle chart, we see the movement of all of Wheeler's pitches. Wheeler throws his sinker with an arm angle of 24 degrees (close to the 27-degree average across all his pitches). That dark gray line represents his arm angle of 24 degrees.
Also on that chart is a dark orange line that goes to the center of his sinker movement chart: that represents the angle of movement of his sinker, which is 21 degrees.
As you can see, the angle of movement of his sinker is somewhat close to the angle of his arm. Logically, there would have to be some kind of relationship between how you throw your pitches and how a ball moves. The arm angle is just one factor. How the ball rolls off the fingers would be another, and this would be most obvious with sliders. And another one: by manipulating the orientation of the seams, you can trigger the airflow around the ball to push the ball in a certain direction more than it would otherwise move (aka Seam-Shifted Wake or SSW).
TOP RIGHT
The top right chart plots all of the release angles you see from the left chart (using the Cartesian standard) on the x-axis, along with the movement angle (that middle chart, but for the sinkers of all pitchers) on the y-axis. As you can see, there is a strong 1:1 relationship between arm angle and movement angle (for sinkers). Some of the pitchers buck the trend, like Tyler Rogers (that bottom left circle), with an arm angle of minus 65 degrees, but a movement angle of minus 84 degrees, for a 19 degree deviation.
BOTTOM RIGHT
The bottom right chart keeps the x-axis of the top chart and shows the y-axis as the movement angle minus the arm angle. In other words, how much deviation is there in the sinker movement, relative to the arm angle. Here, it becomes a bit clearer that there is additional movement arm-side.
For RHP, the average arm angle is 34 degrees, while the movement angle is 28.5 degrees, so there's an average of 5.5 degrees of deviation, an extra 5.5 degrees of drop (or sink, hence the term sinker).
For LHP, the average arm angle is 33 degrees (or 147 in Cartesian), with an extra 5.6 degrees of sink.
And this is how arm angle and movement angle relate to each other, for sinkers.
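The angle bookkeeping above fits in a few lines. In this sketch, `to_cartesian` mirrors LHP angles as in the left chart's parentheses. `deviation` is the bottom-right chart's movement angle minus arm angle, taken in the pitcher's own frame (0 = sidearm, higher = more over the top), so a negative value means extra sink. The post quotes these deviations as magnitudes (5.5, 19). Both function names are hypothetical.

```python
def to_cartesian(arm_angle, throws):
    """Arm angle in the Cartesian convention: RHP unchanged, LHP mirrored,
    so a LHP's 0 becomes 180 and 60 becomes 120."""
    return arm_angle if throws == "R" else 180 - arm_angle


def deviation(arm_angle, movement_angle):
    """Movement angle minus arm angle, both in the pitcher's own frame.
    Negative means the pitch moves below its arm slot: extra sink."""
    return movement_angle - arm_angle


print(to_cartesian(33, "L"))  # 147, the average LHP arm angle
print(deviation(24, 21))      # -3, Wheeler's sinker
print(deviation(34, 28.5))    # -5.5, the average RHP sinker
print(deviation(-65, -84))    # -19, Tyler Rogers
```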
***
I had actually wanted to start this with 4-seam fastballs, but there were a few pitchers that were way off. In looking at those pitchers, the reason became clear: they were likely throwing cutters, not 4-seam fastballs. While we investigate those pitchers, I've turned my attention to sinkers to better illustrate the concept.
***
Fans of Matt Lentzner may remember this article from 15 years ago at Hardball Times, as a precursor to his Pitching Peanut (slideshow or powerpoint).
It is very (very very) simple to figure out Runs Above Average (RAA) for a pitcher. I'll use Paul Skenes as the example.
Take the league average ERA (4.086) and subtract our pitcher's ERA (1.992). That makes Skenes 2.094 runs per 9 IP better than league average.
Since Skenes has 131 IP, we take the above number (2.094/9) and multiply by 131 to give us +30.5 runs above average.
That's it. That is Runs Above Average using ERA-only. That figure for Skenes is 4th highest in MLB, behind Sale (+34 runs), Skubal and Wheeler (+33).
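Here is the ERA-only RAA recipe as a one-line function, using the Skenes inputs above. The function name is hypothetical.

```python
def era_only_raa(league_era, pitcher_era, innings):
    """Runs above average from ERA alone: runs saved per 9 IP, scaled to IP."""
    return (league_era - pitcher_era) / 9 * innings


skenes = era_only_raa(4.086, 1.992, 131)
print(f"Skenes: {skenes:+.1f} runs")  # +30.5
```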
Now, you may be asking: what about park factors? Baseball Reference has Skenes pitching in slightly hitter-friendly parks. So, that simple league average of 4.086 is actually too simple, since it uses the same baseline for all pitchers. We know that can't possibly be right. Skenes also faces tougher competition than average. Skenes supposedly has weaker fielding support than others. When you make all these adjustments, Skenes actually ends up being +41.5 runs above average. Remember, unadjusted he was at +30.5 runs above average. So, the adjustments give him an extra +11 runs. That's right, his 1.99 ERA is actually NOT giving him enough credit.
Since Baseball Reference is terrific in how they share their data, it's really quite simple to compare the ERA-only RAA to the fully-adjusted RAA they provide.
On this chart, the x-axis is the ERA-only RAA. If you don't want anything adjusted and you just want to rely on ERA, then just look at those numbers.
The y-axis is the bonus (or deduction) you have to apply to your pitcher to account for the context they pitched in. Skenes, for example, is in the right corner, at (30, 11). That means his ERA-only RAA is +30 runs, and he has a +11 run bonus for his context. So, he's worth +41 RAA.
Some pitchers get FAR more bonus than that. Hunter Greene gets +19 runs of bonus for his context. That means his ERA is really clouded, practically Coors-like in its effect. So, he's +20 runs for his ERA-only and another +19 runs for the context, for a total of +39 RAA.
Erick Fedde is +13 for his ERA-only, and another +17 for his context, giving him +30 runs above average.
We can compare Cy Young candidates Cole Ragans (+17, +10) to Logan Gilbert (+18, -10). You see, both are very similar based on their ERA. But according to Reference, Ragans faced a tough context, while Gilbert had a pretty easy context. That's a 20 run gap between the two in terms of their context. So Ragans ends up being +27 while Gilbert is only +8. In other words, instead of Ragans being 1 run behind Gilbert, he's 19 runs ahead, all because of the 20 run difference in their context.
Now, there's no question that if you are a Mariners fan, you will disagree, and a Royals fan is quite happy. That's unfortunately how these contexts get interpreted: how does it affect MY player.
Chris Flexen is one of the worst pitchers in baseball using ERA, at -18 runs. But Reference says he also had one of the toughest pitching environments to the tune of +17 runs. So overall he ends up being practically league average at -1 runs from average.
Did Chris Bassitt have an ordinary season (-1 RAA)? Or did he have one of the easiest contexts in all of baseball (-15 runs) so that he actually had a disastrous season (-16 RAA leading to -0.1 WAR)?
By ERA, Bassitt is 17 runs better than Flexen. By the fully-adjusted Reference method, Flexen is 15 runs better than Bassitt. One had an average season, one had a disastrous season. Which pitcher had which depends on whether you fully trust ERA or fully accept the adjustments.
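The chart's two axes simply add, so the comparisons above reduce to sums and differences. This sketch uses the (ERA-only, context) pairs quoted in the text. The `gaps` helper is hypothetical.

```python
# (ERA-only RAA, context adjustment), in runs, as quoted in the text
pitchers = {
    "Skenes": (30, 11), "Greene": (20, 19), "Fedde": (13, 17),
    "Ragans": (17, 10), "Gilbert": (18, -10),
    "Flexen": (-18, 17), "Bassitt": (-1, -15),
}
adjusted = {name: era_only + context
            for name, (era_only, context) in pitchers.items()}


def gaps(a, b):
    """(ERA-only gap, fully adjusted gap) of pitcher a over pitcher b."""
    return pitchers[a][0] - pitchers[b][0], adjusted[a] - adjusted[b]


for name, total in adjusted.items():
    print(f"{name:8} {total:+d} RAA")
print(gaps("Ragans", "Gilbert"))  # (-1, 19)
print(gaps("Bassitt", "Flexen"))  # (17, -15)
```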
Reference lays it all out there for you so you can see what they are doing. You either buy it or you don't. But the transparency is something to be commended.
I looked at the Cy Young voting for 2018-23, excluding 2020. That's 10 Cy Young winners.
There were 100 names listed on those ballots (an average of 10 pitchers per Cy-season), with 70 unique pitchers. Gerrit Cole was listed all 5 times. Three-timers: Burnes, deGrom, Verlander, Gausman, Scherzer.
Let's talk about relief pitchers. There were 10 of them, with Edwin Diaz the only relief pitcher to appear in two different seasons (2018, Mariners; 2022, Mets).
The BEST showing by a relief pitcher over these 10 Cy Young seasons was Blake Treinen in 2018, who appeared on only 8 of 30 ballots.
There were 300 ballots cast in these seasons. Not a single one had a relief pitcher in first place. And given Sale/Skubal in 2024, that's going to continue.
Only 1 of the 300 ballots had a relief pitcher appearing in 2nd place (Diaz, 2022).
Only 4 of the 300 ballots had a reliever in 3rd place (Liam Hendriks with 3, and Treinen with 1).
Only 5 of the 300 ballots had a reliever in 4th place (3 Treinen, 1 Hader in 2018, 1 Yates in 2019).
Finally, 19 of the 300 ballots had the reliever in 5th place.
And this is where we are with relievers: 29 of the 1500 slots on the 300 ballots had a relief pitcher named! Five of the ten relievers who got Cy Young votes only got votes as a token 5th place.
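Here is the tally behind the 29-of-1500 figure. It assumes 5 ballot slots per voter and uses the by-place counts listed above, with zero relievers in 1st place.

```python
BALLOTS = 300          # 10 Cy Young races x 30 voters
SLOTS_PER_BALLOT = 5
reliever_by_place = {1: 0, 2: 1, 3: 4, 4: 5, 5: 19}

reliever_slots = sum(reliever_by_place.values())
total_slots = BALLOTS * SLOTS_PER_BALLOT
print(f"{reliever_slots} of {total_slots} slots ({reliever_slots / total_slots:.1%})")
# 29 of 1500 slots (1.9%)
```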
So where does this leave Clase in 2024? Well, he won't get any 1st place votes. As for 2nd thru 5th, he's competing with: Lugo, Burnes, Ragans, Valdez. Given how relievers have been treated, it would be a huge win if Clase appears on half the ballots. I suspect he'll top out with at most 5 votes for 2nd place, and at most 10 votes for 2nd+3rd. Meanwhile, one of the remaining starters will likely get at least 15, if not 20 votes for 2nd+3rd place. Clase just won't be able to compete with that.
In the end, Clase will likely finish at best 3rd, and at worst 6th place. Treinen finished in 6th place while getting 8 votes (1 3rd, 3 4th, 4 5th). Clase will likely finish better than that. The last time a reliever finished better than 4th overall was, I dunno, when Eric Gagne won? Kimbrel, Kenley, Aroldis all topped out at 4th, I believe. So, I'd look for Clase to finish 4th or 5th.
In trying to find players who have a particular spray angle tendency, and thereby really mess with fielding alignments, I tried a different approach: I will let the clubs tell me who has a particular spray angle tendency. How do I do that? Well, in 2021+22, clubs were totally allowed to place their fielders anywhere they wanted. So, I simply figured out how often batters were shifted. Then, in 2023+24, clubs were totally prevented from stacking fielders to one side. Therefore, if the spray tendency of the batters mattered, we would see it as a gap between their actual results and the x-stats, which specifically ignore the spray tendency of the batters.
I have 24,931 plate appearances from LH batters in 2023+24 who were heavily shifted (80%+ of the time) in 2021+22. Those batters had an xwOBA of .339. Remember, this is only using launch angle+speed, seasonal sprint speed (and walks and strikeouts and hit batters). It ignores the spray angle. What was their actual wOBA? .340.
How about the very opposite: LH batters who were shifted less than 20% of the time in 2021+22, how did they do in 2023+24? Actual .306, xwOBA of .308.
Here are the five groups of LHH, from least heavily shifted, to most heavily shifted, with Actual wOBA first, and xwOBA next:
.306, .308
.319, .315
.316, .320
.320, .320
.340, .339
As you can see, no pattern. This is unlike for example Sprint Speed. When we exclude seasonal Sprint Speed, and then group our players by Sprint Speed, we in fact DO see a pattern. This is called a Systematic Bias. This is why we subsequently included Sprint Speed as a parameter to counteract this bias and neutralize it.
The above, by the way, also applies to RHH: there was no pattern with them either.
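One way to make "no pattern" concrete is to take actual minus xwOBA for each LHH group and check whether that gap trends with how heavily the group was shifted. This sketch uses the five groups above. The Pearson correlation is my own choice of summary, and with only five points it is a rough gauge, not a formal test.

```python
# (actual wOBA, xwOBA) in 2023+24 for LHH, least to most shifted in 2021+22
groups = [(.306, .308), (.319, .315), (.316, .320), (.320, .320), (.340, .339)]
gaps = [round(actual - xw, 3) for actual, xw in groups]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


trend = pearson(list(range(len(gaps))), gaps)
print(gaps)  # [-0.002, 0.004, -0.004, 0.0, 0.001]
print(f"r = {trend:.2f}")
```

A systematic bias, like the one seen with Sprint Speed, would show up as gaps that climb or fall steadily from one group to the next.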
Fielding Run Value (FRV) is what you will find on Savant (and Fangraphs) and is the metric I spearheaded.
DRS (Defensive Runs Saved) is what you will find on Fangraphs and Reference, and is the metric spearheaded by John Dewan.
Now, one way to evaluate a metric is to see how well it can predict the OTHER metric in the year after. I've done this for Catcher Framing, for example, where we learned that the Steamer metric actually predicts next year's Savant Framing metric as well as the current year's Savant does. In other words, Steamer is value-added, as it is like Savant, and more.
So, this is what I did, and given the incredible layout of Fangraphs, it took me literally under 5 minutes to run the study. I exported everyone with 600+ innings and removed catchers. I turned everyone into runs per 27 outs. I correlated year T to year T+1, matching on player and position. This left me with 635 matched players.
First, how does each correlate to itself? For FRV (that's the Statcast version), it's at r=0.60. For DRS, it's r=0.50. This is a pretty good sign that FRV is better able to isolate the players (though you might argue I haven't proven that I've taken care of parks, so maybe I should look at team switchers... someone out there can pick that up).
Now, how does DRS correlate with FRV? In other words, can DRS explain FRV? That correlation is r=0.38. That's not bad. It shows that DRS and FRV see things differently enough, though DRS is still able to explain a good portion of FRV.
How about FRV explaining DRS? That correlation is r=0.40. That's also not bad. The same explanation holds, though in this case, FRV is able to explain itself to a higher degree than DRS can explain itself, all the while being able to explain DRS slightly better than DRS can explain FRV.
The knockout punch isn't there. It would have been great if FRV would have had a correlation of r=0.50 to DRS (and thereby matching the DRS correlation to itself of 0.50). That didn't happen, with an r=0.40 instead. It would have been interesting had FRV had a correlation of r=0.38 with itself as that's the knockout punch DRS would need. That obviously didn't happen, as it instead had an r=0.60.
So, there's enough here to suggest that both have value, though the value is stronger with FRV.
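The study design can be sketched as follows: match year T to year T+1 on player and position, then correlate each metric with itself and with the other. The row layout, field names, and tiny data set are made up for illustration and are not real FRV or DRS values. The rows are assumed to already be rates, for example converted with the hypothetical `per_27_outs` helper.

```python
from collections import namedtuple

# Hypothetical row layout: one fielder-season at one position.
Season = namedtuple("Season", "player pos year frv drs")


def per_27_outs(runs, innings):
    """Fielding runs scaled to a rate per 27 outs (3 outs per inning)."""
    return runs * 27 / (innings * 3)


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def year_to_year(seasons):
    """Match year T to year T+1 on (player, pos); correlate every metric pair."""
    index = {(s.player, s.pos, s.year): s for s in seasons}
    pairs = [(s, index[(s.player, s.pos, s.year + 1)])
             for s in seasons if (s.player, s.pos, s.year + 1) in index]
    result = {}
    for now in ("frv", "drs"):
        for nxt in ("frv", "drs"):
            xs = [getattr(t, now) for t, _ in pairs]
            ys = [getattr(t1, nxt) for _, t1 in pairs]
            result[f"{now.upper()}(T)->{nxt.upper()}(T+1)"] = pearson(xs, ys)
    return result, len(pairs)


# Made-up toy rates, only to show the mechanics (not real FRV/DRS values).
toy = [
    Season("A", "SS", 2023, 1.0, 3.0), Season("A", "SS", 2024, 2.0, 3.0),
    Season("B", "CF", 2023, 2.0, 1.0), Season("B", "CF", 2024, 4.0, 1.0),
    Season("C", "2B", 2023, 3.0, 2.0), Season("C", "2B", 2024, 6.0, 2.0),
    Season("D", "LF", 2023, 0.5, 0.5),  # no 2024 row, so it is dropped
]
correlations, matched = year_to_year(toy)
print(matched, correlations)
```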
INFIELDERS v OUTFIELDERS
Now, the FRV method is really an Outfield and Infield method, two separate methods. I suspect that DRS likely has two somewhat distinct methods. So, let's repeat all that, but look at infielders-only and outfielders-only.
With the Infield, DRS correlates with itself at r=0.49, while FRV is at r=0.46. Slight advantage to DRS for self-correlating better. DRS correlates with next season's FRV at r=0.30, while FRV correlates with next season's DRS at r=0.28. Overall, it certainly looks like DRS has a slight advantage. I'd probably call it 55/45 for DRS here. If you want to call it 60/40 in favor of DRS, ok, I won't argue. DRS has two things going for it: one is the DP handling, and the other is the little things, the nuances, of playing the infield (like relays, and other subjective calls).
How about the outfield? Well, get ready to get your mind blown here. FRV correlates with itself at r=0.73, while DRS self-correlates at r=0.51. I mean, this is just no contest at all.
But, it's not just that. I'll give you the knockout punch as well. FRV correlates with next season's DRS better than DRS correlates with itself next season: r=0.53 to r=0.51. Let that sink in for a bit. FRV knows nothing about DRS, knows nothing about how DRS measures things. And yet, it can predict next season's DRS better than DRS can (for outfielders).
Indeed, DRS can predict next season's FRV almost as well as it can predict itself: r=0.51 for self-correlation and r=0.49 for correlating FRV.
Why does this happen? Because the starting point of the outfielder and how much distance they have to cover is critical, and Statcast can precisely measure this.
In terms of weighting, I'd have to go at least 90/10 for FRV, if not 100/0.
It's clear that in the off-season, my time should be spent much more with infielders, and handling all those extras that I've been putting off. DRS deserves its flowers there.
Cleveland was ahead by 1 run in the bottom of the 5th, 1 out, runners on the corners. Chance of winning is .776.
Runner on 1B attempted to steal 2B.
The Rays' choices were:
1. Let the runner steal 2B uncontested, keeping the runner on 3B at bay and leaving runners on 2B+3B: chance of winning .795
2. Throw to 2B, allowing the runner on 3B to score:
CS means .795
SB means .833
As you can see, the win value of the runner on 3B scoring is exactly equal to the win value of the runner on 1B being thrown out. That makes sense, since the run value of going from 3B to home plate is about plus 0.4 runs, and the run value of the CS is about minus 0.4 runs.
Of course, that break-even ONLY holds if the CS was a guaranteed out. Otherwise, having the runner steal 2B while the runner on 3B scores would be a disaster play.
End result: the catcher should NOT have attempted to throw the runner out, and he got lucky to break even on that play.
Logan Webb is 2nd in MLB in IP with 189.2, with a not great, but good ERA of 3.46.
Paul Skenes (120 IP) is 2nd in MLB in ERA (min 80 IP), with a not great, but fantastic ERA of 2.10.
Skenes has been charged with 28 ER and Webb 73. The difference between the two is 45 ER and 69.2 IP, which is an ERA of 5.81.
Of the 141 pitchers with at least 80 IP, the bottom 10% have an average ERA of 5.83. This is illustratively, if not exactly, what we mean when we talk about Replacement Level (or Readily Available Talent Level) in WAR. In other words, it is the minimal level of performance at which you can play in MLB, providing just the minimal level of value that earns the minimum salary.
All those innings that Skenes didn't pitch, which provided no value, are essentially as if he had thrown 69.2 IP and allowed 45 ER. That is the no-value level of performance. It neither adds to, nor subtracts from, his overall value. We added 0-value to his outstanding value of a 2.10 ERA in 120 IP by adding 45 ER and 69.2 IP.
And when we do that, when we add no value of 45 ER and 69.2 IP, we end up with Logan Webb, and his 3.46 ERA in 189.2 IP.
In other words, Paul Skenes and Logan Webb generated the same amount of value, even if they got there in very different ways.
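Here is the Skenes/Webb arithmetic, including baseball's IP notation, where ".2" means two outs. `Fraction` keeps the thirds of an inning exact. The helper names are hypothetical.

```python
from fractions import Fraction


def ip_to_innings(ip: str) -> Fraction:
    """Baseball IP notation: '189.2' is 189 innings plus 2 outs, i.e. 189 2/3."""
    whole, _, outs = ip.partition(".")
    return int(whole) + Fraction(int(outs or 0), 3)


def era(earned_runs, ip: str) -> float:
    return float(9 * earned_runs / ip_to_innings(ip))


skenes = era(28, "120")
webb = era(73, "189.2")
difference = era(73 - 28, "69.2")  # the innings Webb pitched beyond Skenes
print(f"Skenes {skenes:.2f}, Webb {webb:.2f}, difference {difference:.2f}")
# Skenes 2.10, Webb 3.46, difference 5.81

# Adding the 69.2 IP / 45 ER chunk to Skenes reproduces Webb's line exactly.
assert ip_to_innings("120") + ip_to_innings("69.2") == ip_to_innings("189.2")
```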
The framework of WAR is perfect. Indeed, I've adapted the concept of WAR in baseball to create frameworks for hockey, basketball, and volleyball. Doing the same for football and futbol and cricket and any other sport is easily solvable. Since I first developed WAR about twenty years ago, and with the benefit of hindsight, there is nothing that I would change about its framework.
I would tweak its presentation, but back then, I was speaking to a small group of folks, and those are just surface-level details. Had I known it would take off the way it did, I would have had a better presentation, notably the Individualized Won-Loss Records (aka The Indis).
So, framework is perfect. Presentation can be improved. What about the implementation?
The implementation is what you see on Fangraphs and Baseball Reference: they take the framework and then actually build something with it. The framework is the design, almost a blueprint. But to actually build WAR, well, there are things not noted in the blueprints, like nails and screws and age of the wood and types of pipes and lighting fixtures. Those are implementation details.
In other words, there are a lot of choices you have to make, big and small, in order to take a blueprint and turn it into a house.
This is easiest shown by looking at the differences in the two main implementations of WAR. On Fangraphs, their pitching WAR is centered around FIP. On Reference, the starting point is Runs Allowed (RA/9). The framework of WAR doesn't insist on anything in this regard. It is a feature, not a bug, that allows two implementations to exist as they do, with the big difference in choices made.
There are smaller choices as well. Park factors are encouraged and considered in the framework of WAR. But how to actually calculate park factors? That's an implementation choice. Again, feature, not bug.
Should you consider performance with runners on base? Sure, the framework of WAR allows for it. The two main implementations, Fangraphs and Reference, are of similar mindset when it comes to batters in this regard. But nothing is stopping anyone (other than time) from making a different choice. You could use RE24 (run expectancy by the 24 base-out states). You could use WPA (win probability added). You could decide that you are not really sure about any of these, but want to give some consideration to each of them: so, you could, for example, give 10% weight to WPA and 40% weight to RE24 and 50% weight to wOBA. You can literally make any small choice you want.
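The blended option described above could look like the sketch below. It assumes all three measures have already been put on one runs scale; converting WPA from wins to runs is left out. The batter numbers and the function name are hypothetical.

```python
import math


def blended_value(components, weights):
    """Weighted blend of alternative batting measures.

    Assumes every component is already on the same scale (say, runs above
    average); converting WPA from wins to runs is not done here.
    """
    if set(components) != set(weights):
        raise ValueError("need a weight for every component")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights should sum to 1")
    return sum(weights[k] * components[k] for k in weights)


weights = {"WPA": 0.10, "RE24": 0.40, "wOBA": 0.50}
hypothetical_batter = {"WPA": 10.0, "RE24": 20.0, "wOBA": 30.0}  # made-up runs
print(blended_value(hypothetical_batter, weights))
```

Changing the weights is exactly the kind of small implementation choice the framework leaves open.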
The implication of these choices will compound when you start to focus on individual players. See, for the majority of the players, whatever choices you make, it's going to cancel out. You make a dozen different choices in your implementation of WAR, and it'll help Aaron Judge a bit in seven of them, and hurt him in five. Or it'll help Bobby Witt Jr in four of them and hurt him in eight. You can modify your choices so that it helps Witt more than Judge. And every now and then, some of your choices will overwhelmingly favour one or three players. Naturally, you aren't building your WAR metric to want to do that. But, it'll happen. Again, feature, not bug. That's because these choices are opinions. Sure, they are fact-based opinions, but still opinions.
What the WAR framework does is insist on a systematic, unbiased, consistent process, rather than an arbitrary, biased, and capricious whim. Your personal WAR, whatever it is, is the latter. The WAR framework simply forces your opinion to follow a process.
As those who follow me know, the Cy Young Predictor has worked spectacularly well. Until Corbin Burnes won it in his FIP year. I created a FIP-enhanced version as well, given that we may be in a paradigm shift.
Chris Sale (and Tarik Skubal) are running away with the predictor using the FIP-enhanced version. Skubal is ALSO running away with it with the classic predictor. So, we won't learn anything there.
However, Sale is barely holding back Wheeler with the Classic Predictor. This means that Wheeler has a chance for an upset here... as long as there are enough old-school voters whose behaviour is being captured by the Classic Predictor.
How many of the 30 voters are Classic voters? I don't know, but let's say that there are 20 Classic voters and 10 FIP-enhanced voters. This means that Wheeler is already 0-10, and he needs to perform well enough over his next 5 starts (and/or Sale needs to pitch poorly enough) that Wheeler can get 16 of the 20 Classic voters. Wheeler and Sale are going to get all 1st and 2nd place votes, regardless of mindset.
In order for Wheeler to get 16 of 20 votes, he probably has to lead with the Classic Predictor by about 5 points. Right now, Sale is ahead in the Classic scoring by 1.4 points. So over the next 5 starts (assuming they each get 5 more starts), Wheeler needs to get about 6 or 7 more points than Sale.
How doable is that?
Sale is averaging 13.5 points per 5 starts with a standard deviation of 4.9 points per 5 starts. Wheeler is at 12.25 points and 6.3, respectively. For the difference of two distributions, the standard deviation is the RSS (root sum of squares) of the two, so one standard deviation is 8 points.
With Wheeler 1.4 points behind already, and 1.25 points expected behind over the next 5 starts, he's 2.65 points behind and he needs to be about 5 points ahead, or a swing of almost 8 points.
In other words: one standard deviation, which will happen about 16% of the time.
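Here is the back-of-the-envelope estimate above, written out. This sketch assumes the two pitchers' point totals over 5 starts are independent and roughly normal, with the means and SDs quoted in the text. With those inputs it comes out near 17%, in line with the rough one-standard-deviation figure.

```python
import math

sale_mean, sale_sd = 13.5, 4.9
wheeler_mean, wheeler_sd = 12.25, 6.3
current_deficit = 1.4  # Wheeler trails in Classic points today
target_lead = 5.0      # rough lead needed to win 16 of 20 Classic voters

gap_sd = math.hypot(sale_sd, wheeler_sd)    # RSS of the two SDs, about 8
expected_drift = sale_mean - wheeler_mean   # Wheeler expected to lose 1.25 more
needed_swing = current_deficit + expected_drift + target_lead  # about 7.65
z = needed_swing / gap_sd
p_upset = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z) for a standard normal
print(f"SD {gap_sd:.2f}, swing {needed_swing:.2f}, z {z:.2f}, P {p_upset:.1%}")
```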
Of course, all this is pretty rough, and if you want to say 10% or 15% or 20% or 25%, that's fine. I can't really give you that precision.
I can tell you the current market is at 82% for Sale and 18% for Wheeler. So, it seems that the market is basically in line with the Predictor.
Fedde is at 3.9 wins above average (WAA), which is the same as the eventual NL Cy Young winner Chris Sale, and 0.1 wins below the eventual AL Cy Young winner Tarik Skubal. Hunter Greene leads at 4.3.
His WAR also follows similarly: 5.5 for Greene, 5.4 Skubal, 5.2 Fedde and Sale.
Fangraphs has Fedde at 2.9 WAR, tied for 24th, with the eventual Cy Young winners as 1-2: Sale 5.7 and Skubal 4.8.
So, what is going on here, how does Reference have Fedde squeezed in between Sale and Skubal?
For that, we have to give thanks to Sean Forman and his team for being ridiculously transparent about it all. Not only do they give you the step by step explainer for WAR, but then they present it component by component so we can understand what is going on.
The first thing to know is that Reference doesn't care about SO and BB and HBP and HR. What they principally care about is Runs Allowed. Not ERA, but RA/9.
Let's compare Fedde to Sale directly. Fedde has 1 more IP than Sale, while giving up 15 more runs (and 15 more ER for that matter). Right off the bat, we start with Fedde 15 runs behind Sale.
So how does he make up that difference? That 1 more IP gives him a 0.5 run advantage.
The first thing that jumps out here is the fielding support: Sale is being charged with 0.11 runs per 9 IP of fielding support, while Fedde is supposedly hurt by -0.41 r/9 of fielding support. That is a gap of 0.52 runs per 9 IP. And since they've each pitched the equivalent of 17 9-inning games, then 17 x .52 = 9 runs.
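Turning a per-9 rate into a season total is just rate times the number of 9-inning games. A minimal Python sketch, assuming (as the text does) roughly 17 nine-inning games for each pitcher; the helper name is hypothetical. The same arithmetic turns the 0.13 runs per game opponent-quality edge into about 2 runs.

```python
def rate_gap_to_runs(rate_a_per9, rate_b_per9, nine_inning_games):
    """Convert a gap between two runs-per-9-IP adjustments into a season run total."""
    return (rate_a_per9 - rate_b_per9) * nine_inning_games

# Reference's team-level fielding support, in runs per 9 IP (figures from the text)
sale_fielding = 0.11
fedde_fielding = -0.41
games = 17   # roughly 17 nine-inning games' worth of innings for each pitcher

gap = rate_gap_to_runs(sale_fielding, fedde_fielding, games)
opponent_edge = rate_gap_to_runs(0.13, 0.0, games)

print(f"Fielding gap: {gap:.2f} runs")             # 0.52 x 17, call it 9
print(f"Opponent quality: {opponent_edge:.2f} runs")  # 0.13 x 17, call it 2
```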
Is it possible for two pitchers to have a gap of 9 runs in fielding support? I actually track that right here:
Eovaldi and Bassitt have benefitted from 9 runs of fielding support above average when they were on the mound (that last part is key). Stroman and Spence have been hurt by 9 runs. So, comparing these pitchers specifically, we have an 18-run gap, which is huge. Therefore, a 9-run gap between two pitchers, while noteworthy, is reasonable.
The gap between Fedde and Sale however is only 4 runs... and it is FEDDE that has been getting better fielding support.
See, the difference in the two approaches is that on Savant we track the fielding support while that pitcher is on the mound. On Reference, it is a team-level adjustment. So, regardless of how the Braves fielders did with Sale on the mound, what matters is what the Braves fielders did for ALL their pitchers. Then that is apportioned out to each Braves pitcher. This is akin to a great hitting team counting as the same offensive support, even if in games pitched by one pitcher they only scored 3 runs per game and they scored 6 runs for another pitcher. When you make an overall team-level adjustment, ALL the pitchers are treated with the same run support. And that's what's happening here with the fielding support.
Indeed, Fedde has a .263 BABIP, while Sale is at .317. While not dispositive, it certainly argues in favor that Fedde has not been hurt by his fielders, while Sale has been. Which is what the Savant play-by-play evaluation supports (not to THIS extent, but to some extent).
Anyway, let's keep going.
Fedde is treated as pitching more in hitters' parks, while Sale is neutral. I won't look into it further, but let's assume this is accurate. The net impact is about 3% of runs, and so that's about 2 runs.
Fedde also faced tougher competition. Again, let's assume this is accurate. Reference shows an advantage of 0.13 runs per game, which works out to another 2 runs.
Let's add it up:
0.5 runs: IP advantage to Fedde
9 runs: fielding support to Fedde
2 runs: park support to Fedde
2 runs: opponent quality to Fedde
Add it up and it's 13.5 runs. That's close enough to the 15 runs that we've pretty much explained why Reference loves Fedde.
But, that fielding support number is what is carrying all the weight here. As I said, it should go 4 runs the other way. And once you do that, then all those components end up cancelling out down to .... 0 runs.
And we are left with Sale being 15 runs ahead of Fedde.
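Here is the same tally as a small Python sketch, using the rounded figures from the text. The "Savant fielding" scenario simply swaps the 9 runs of team-level fielding credit for the pitcher-level view, where the fielding support goes 4 runs the other way; everything else is left as Reference has it.

```python
# Run adjustments in Fedde's favour relative to Sale (rounded figures from the text)
reference_view = {
    "IP advantage": 0.5,
    "fielding support": 9.0,    # Reference's team-level fielding adjustment
    "park": 2.0,
    "opponent quality": 2.0,
}

# Pitcher-level (Savant) view: fielding support goes 4 runs the OTHER way
savant_fielding_view = {**reference_view, "fielding support": -4.0}

runs_allowed_gap = 15.0   # Fedde allowed 15 more runs than Sale

for label, components in [("Reference", reference_view),
                          ("Savant fielding", savant_fielding_view)]:
    offset = sum(components.values())
    print(f"{label}: adjustments favour Fedde by {offset:.1f} runs; "
          f"Sale still ahead by {runs_allowed_gap - offset:.1f}")
```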
In order to buy into the Reference WAR, you have to buy into two things:
1. The overall fielding evaluations at the team level are correct
2. The partitioning of these evaluations at the pitcher level is fair
Unfortunately, there is no uncertainty level in these adjustments. And so, you end up with isolated issues like Fedde v Sale every year.
As a result, single-season WAR may be 90% reliable, but you have some one-offs like these that are off-putting.
That said, things like this work themselves out over a period of years: being off by 1 or 2 wins here or there might be bothersome at the seasonal level, but it ends up not really mattering at the career level.
I should also mention that I love Reference; it is an indispensable site for both me and the industry.
I introduced Leverage Index about twenty years ago. One of the early things I did with it back then, which nobody has really followed up on, is Re-Leveraging the data. I will explain what that means, using Aaron Judge as the example.
Leverage Index (LI) is simply a measure of how much impact that particular moment has on the game, in real-time. The average moment is 1.000. The highest leveraged moment (think bottom of the 9th, bases loaded, down by a run or two) will be around 10. Naturally, you can have an LI approach 0 in a blowout.
The top ace reliever will average an LI of 2.0, basically saying that the moment they come into the game has twice the impact as a random moment in a game.
AARON JUDGE
Aaron Judge, because he plays for the Yankees, and because games seem to be decided one way or the other earlier than normal, has an LI of only 0.9. That's not a reflection of HIM, but rather his circumstances. Right away, we can see that whatever he does, on average, it will be depressed by 10%. We'll take care of that in a moment.
Aaron Judge's most crucial HR came with an LI of almost 4, which is quite high. He has three more HR with an LI of around 2. Another 13 HR with an LI above 1. Another 13 with an LI above 0.5. Then 21 more HR with an LI of under 0.5. The average LI of his HR is only 0.78. This is much lower than his average circumstance of an LI of 0.9. When folks say that Judge hits a lot of useless HR, this is what they are actually saying. How many useless HR is he hitting? I'll get back to that in a moment.
He has 31 doubles and triples. The average LI of those is 0.77, pretty much the same as his HR. This is not looking good for Judge. So far, his extra base hits are coming in substantially lower-leverage situations, even accounting for his overall low-LI to begin with.
His singles have an LI of 0.91, which is almost the same as his overall average LI. His unintentional walks and HBP are at 0.84. His outs are also at an LI of 0.91.
Ok, so we have our evidence that Judge is actually not rising to the occasion. How can we measure that?
RE-LEVERAGING
When Judge hit that high-LI HR, the one with the LI of almost 4, that in essence meant that this plate appearance will swing the outcome of the game 4X as much as a random plate appearance. In other words, it's practically as if he had a 4-PA game in one PA. And so when he hit the HR in this situation, it is essentially as if he went 4-4 with 4 HR. And that's what we'll do: we will leverage this single PA and single HR as a 4PA event, counting it as 4 HR.
Of course, when he hits a HR in a 0.01 LI circumstance, that will count as 0.01 PA and 0.01 HR.
When we apply this to all his plate appearances, we end up with 491 plate appearances (instead of his actual 561, sans IBB). In order to properly re-leverage, we will bump up all his leveraged stats by ~14% (561 / 491), so that we end up with 561 re-leveraged PA.
And when we do that, what happens? His actual 51 HR are re-leveraged as 45.4 HR. In other words, he loses 5.6 HR. And so we can say 5.6 of his HR are useless.
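The re-leveraging steps above can be sketched in a few lines of Python. This is a minimal sketch, assuming you already have per-PA leverage index values (e.g. from play-by-play data, not shown here); the function name, event labels, and toy PA list are all hypothetical. The Judge check at the end uses only the summary figures in the text, and since the 0.78 average LI is itself rounded, it comes out a hair above 45.4.

```python
from collections import defaultdict

def releverage(plate_appearances, target_pa=None):
    """Re-leverage a batter's line (hypothetical helper).

    plate_appearances: iterable of (event, leverage_index) pairs, one per PA,
    intentional walks excluded. Each PA is weighted by its LI, so a PA at
    LI 4 counts as 4 PA of that event and a PA at LI 0.01 counts as 0.01.
    The weighted totals are then scaled so they sum to target_pa
    (by default, the actual number of PA).
    """
    plate_appearances = list(plate_appearances)
    weighted = defaultdict(float)
    for event, li in plate_appearances:
        weighted[event] += li
    if target_pa is None:
        target_pa = len(plate_appearances)
    scale = target_pa / sum(weighted.values())
    return {event: total * scale for event, total in weighted.items()}

# Toy example with made-up PAs, purely for illustration
toy = [("HR", 4.0), ("HR", 0.01), ("1B", 0.99), ("out", 1.0), ("out", 1.0)]
print(releverage(toy))

# Judge check from the summary figures: 51 HR at an average LI of 0.78,
# with 491 leveraged PA scaled back up to 561
judge_hr = 51 * 0.78 * (561 / 491)
print(f"Re-leveraged HR: {judge_hr:.2f}")
```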
His 31 2B+3B become 27 when re-leveraged. So he loses 4 more extra-base hits. He gains 4 singles, loses 3 walks+HBP, and gets an extra 11 outs.
In the end, his actual wOBA of .497 ends up being re-leveraged as .467. This is a 30 point drop in wOBA, which we can easily convert to runs: divide by 1.2 and multiply by his PA of 561 to give us a loss of 14 runs.
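The wOBA-to-runs step as a tiny Python function. The 1.2 scale is the divisor used in the text; the ~10 runs per win is an assumed rule of thumb, consistent with the text's 14 runs to 1.4 wins conversion. The function name is hypothetical.

```python
def woba_gap_to_runs(woba_a, woba_b, pa, woba_scale=1.2):
    """Convert a wOBA difference to runs: divide by the wOBA scale, multiply by PA."""
    return (woba_a - woba_b) / woba_scale * pa

RUNS_PER_WIN = 10   # assumed rule of thumb

runs_lost = woba_gap_to_runs(0.497, 0.467, 561)   # Judge: actual vs re-leveraged wOBA
wins_lost = runs_lost / RUNS_PER_WIN
print(f"{runs_lost:.1f} runs, {wins_lost:.1f} wins")
```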
IMPACT
In other words, whatever context-neutral value you may have as his run production, you need to drop it by 14 runs in order to properly account for the game situation. These are 14 runs that Aaron Judge did contribute to, but that the Yankees did not benefit from. So, when you translate his performance into wins, via WAR, you can consider removing 1.4 wins from his total. It all depends on whether you think it matters if his performance impacts a game in real-time or whether the circumstances are irrelevant. If the impact matters, then remove 1.4 wins. If the circumstances are irrelevant, then keep those rose-colored glasses on, I don't want to keep you from enjoying your own reality.
I will say this: the choice usually depends on how it affects your player. Had his re-leveraged performance gained him 1.4 wins, I am sure his legion of fans would accept the premise of Re-Leveraging.
"A walk is as good as a hit" is essentially a true statement when the bases are empty. And for most of baseball history, every inning has started with the bases empty (the exception being the extra-inning placed runner, the XIPR).
In a Markov chain, the presumption is that how you entered a state is immaterial. Being in a state is the information you need in order to know what's to come. So, if you have a runner on 1B with 0 outs, does it matter HOW you got there? If it doesn't, then that's your Markov state: runner on 1B, 0 outs. If it DOES matter, then your Markov state has to include how you got there, so that your actual Markov state is 1B-or-BB-or-HBP-or-Err, plus the runner on 1B and 0 outs.
In an award-winning presentation at SABR52, Bailey Hall tackled that issue. The main overall point is that the number of runs that followed the runner-on-1B, 0-out state was essentially the same regardless of how the state was entered (0.94 and 0.93 runs following a leadoff BB and a leadoff single, respectively). But Bailey did note that there may be a pitcher-by-pitcher effect: maybe some pitchers are more affected by one or the other, and maybe even at the inning level.
Most important to all this is that the question was asked, a solution has been offered, and the presentation is beyond outstanding (with pure baseball themes wherever you look). This is what an #AspiringSaberist should do: ask the question, roll up their sleeves, and show off the work. Because others will be watching, and they will remember any good work.
I am not looking at EVERY bases empty scenario. In the bottom of the 2nd, with Yanks ahead by 3, Aaron Judge was IBB with the bases empty (!). There were two outs, so maybe it's not so bad?
Let's go to the tape!
Aaron Judge is worth about 0.13 runs above average in a random PA, which means he's worth about 0.013 wins per PA. In this particular instance, the leverage index is 0.22, so his leveraged-wins impact is .013 x .22 = .003 wins.
The win expectancy for an average batter in this situation is .810. With Judge batting, that goes up a bit by .003 to .813.
An IBB puts the win expectancy at .817.
So, it's still a bad call to IBB Judge in this situation.
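A small Python sketch of that comparison. It assumes the win expectancy figures are from the batting team's (the Yankees') side, which is why Judge batting pushes them up; the 10 runs per win is the same rule of thumb that turns 0.13 runs into 0.013 wins above. Names are hypothetical.

```python
RUNS_PER_WIN = 10   # assumed rule of thumb: 0.13 runs -> 0.013 wins

def leveraged_win_impact(runs_above_avg_per_pa, leverage_index):
    """Context-neutral run value per PA, converted to wins and scaled by LI."""
    return runs_above_avg_per_pa / RUNS_PER_WIN * leverage_index

impact = leveraged_win_impact(0.13, 0.22)     # about 0.003 wins
we_pitch_to_judge = 0.810 + impact            # Yankees' WE with Judge batting
we_after_ibb = 0.817                          # Yankees' WE after the IBB

# The IBB only helps the pitching team if it LOWERS the batting team's WE
print(f"Pitch to Judge: {we_pitch_to_judge:.3f}, IBB: {we_after_ibb:.3f}")
print("IBB helps the defense" if we_after_ibb < we_pitch_to_judge else "IBB hurts the defense")
```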
If instead of him being a .475 wOBA batter he was instead a .630 wOBA batter, then that's the breakeven point to walk him. Bonds at his best was around .540. So, no, you can't walk him here, which is why he doesn't get walked here.
Nothing bothers me more than people trying to report running speed in terms of Miles Per Hour. Actually, one thing bothers me more: trying to report it as some sort of instantaneous speed.
Let's take it one step (no pun intended) at a time. In a 100 m race that typically lasts close to 10 seconds, an Olympian will take some 40-50 steps. For the sake of ease of illustration, we'll say 100 m takes 50 steps, or 2 m per step.
You get maximum acceleration when your foot leaves the ground, while you get maximum deceleration when both feet are in the air. Usain Bolt for example would peak at 13 m / sec and bottom out at 11 m / sec, when he is in the middle of the race. That is a HUGE difference.
So, your window of measurement is critical here. If you take the instantaneous maximum speed, naturally it will be that blink-of-an-eye moment as the foot leaves the ground. That particular speed, on its own, is really irrelevant.
What you do want is a full cycle, a full step, at a minimum. And so, your measurement window will capture both the acceleration and deceleration phases.
Now, one step, 2 m in this illustration, will still give you some sort of measurement error. First, not every runner will take 2 m for one step. Some might be 2.5 m or 2.3 m. So, you are not capturing a full cycle here. It's going to be a mix-and-match of the acceleration-deceleration phase, where, depending on your start/stop, some runners will have a bit more of the accel-phase, while others will have a bit more of the decel-phase.
This is why we report running times based on the 10m split times. A 10 m window will give you 4 to 5 steps. Let's say 10 m is 4.5 steps. That gives you 4 steps in the accel-phase, 4 steps in the decel-phase, and then that leaves half-a-step which will be in the accel or the decel or in-between phase. What kind of uncertainty does this give us?
Let's go back to Bolt: with 4 full steps covering the accel-decel phases, he's running at 12 m / sec. The other half-step is either 11 m / sec at worst or 13 m / sec at best.
The weighted average therefore is 11.9 m/sec at worst and 12.1 m/sec at best. Therefore, our uncertainty range in terms of picking the "perfect 10m window" is 0.1 / 12 or 1%. That's our error range. That's probably what we can accept.
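The same worst/best calculation in Python, mirroring the step-weighted average used above (a time-based harmonic average would shift the numbers only slightly). The 12, 11 and 13 m/sec figures and the 4.5 steps per 10 m are the illustrative values from the text; the function name is hypothetical.

```python
def window_speed(full_steps, full_step_speed, partial_step, partial_speed):
    """Step-weighted average speed over a measurement window."""
    total_steps = full_steps + partial_step
    return (full_steps * full_step_speed + partial_step * partial_speed) / total_steps

# Bolt illustration: 4 full steps at 12 m/s, plus half a step at 11 (worst) or 13 (best)
worst = window_speed(4, 12.0, 0.5, 11.0)
best = window_speed(4, 12.0, 0.5, 13.0)
uncertainty = (best - 12.0) / 12.0

print(f"Worst: {worst:.2f} m/s, best: {best:.2f} m/s, uncertainty: {uncertainty:.1%}")
```

Changing the number of full steps in the window shows how quickly the uncertainty shrinks as the window grows.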
So, when you look at swimmers or runners or skaters, figure out the accel-decel phase for each step or stroke or cycle. Figure out what the speed is for each accel and decel. And figure out what uncertainty level you can accept. Once you do that, then you can figure out what window your distance and time will be measured against.
And please, report it in terms of seconds and metres (or feet or yards as your sport needs). Don't do MPH or KPH.
Having discussed Presence or Attentiveness plays, as well as Timing plays, we now turn our attention to the third kind of HR Saving play: Speed.
Setting aside the fence, Catch Probability is largely focused on how much Distance an outfielder has to run (from his starting point) and how much Time the ball is in the air for the outfielder to catch. Distance over Time is Speed. This is how we evaluate outfield defense. We intuitively understand this, even if we don't explicitly say it. That's because we don't have any easy reference points to say how many feet and how much time the play is. Until Statcast.
With Statcast, we know the Opportunity Time, and we know the Opportunity Distance. And so, we know the Opportunity Space.
The wall presents an extra challenge for us. The outfielder sees the fence as an impediment because in these particular HR saving plays, they are about to crash into a wall. This is unlike the Presence and Timing plays where the outfielder won't crash into the wall.
Even within the Speed plays involving the wall, there are two subsets: plays where the outfielder has to run-and-jump into the wall, and plays where he has to run-thru the wall. Each presents its own challenge. When it comes to tracking a ball 400 feet away, determining how high the ball is up the wall is sometimes difficult. Each foot matters much more vertically than it does horizontally. A ball measured 5 feet closer or deeper has a much smaller impact in our evaluation than a ball measured 5 feet higher or lower.
The difference between these speed plays and the other HR-saving plays is analogous to the difference between a 1-run save by a relief pitcher who comes into the game with the bases loaded and 0 outs, and a 3-run save with the bases empty.
There is probably no play more at odds between the eye test and the value conclusion than the HR saving play. And it all comes down to distinguishing among the different kinds of HR Saving plays.
The second kind of HR-saving play is based on timing. Similar to the Presence play, the outfielder has plenty of time to camp themselves under the ball. However, the wall is a bit higher, and the ball is a bit higher at the fence-clearing point. And so, the fielder will need to jump. And because they need to jump in order to get to the ball, this will be based on timing the jump just right.
How hard is this to do? I don't know. But I would think such a play likely results in an out at least 70% of the time, and maybe as high as 90 or 95% of the time. For purposes of illustration, I'll say 80%. Don't forget that not all fielders are the same height or can jump just as high. So, when timing a play, some fielders may have a larger margin of error than others.
In terms of evaluating the play, it works the same way as we do everything else: we compare to the average. We ALWAYS compare to the average. In everything. This does not mean that being 0 OAA means you have no value. This is probably the single-worst fallacy that is spewed by folks. 0 OAA just means you have AVERAGE value. And average value has... value.
Suppose you have a .500 starting pitcher, whose ERA and FIP and xERA and component ERA and whatever else you want to consider is exactly league average. Let's use the W-L record as representative of their performance. So, this average pitcher might be 14-14, and so is 0 Wins Above Average (WAA). These SP actually are in demand. Even if their WAA is 0.
See, the problem is that we've reduced the two-dimensional 14-14 down to the one dimension of 0. This is a problem in presentation. This is why WAR (wins above replacement) took hold as well as it did: it keeps it one-dimensional, but it merges quantity with quality. Such a 0 WAA pitcher would be something like a 2.5 WAR pitcher.
Getting back to our outfielder who made the HR-saving timing play: if they made 4 such plays and mistimed one play, that's 80% out rate, which in this illustration is league average and so is 0 OAA. If they made all 5, they'd be +1 OAA. If they mistimed ALL five, they'd be -4 OAA. On average, league-wide, these outfielders are 0 OAA. That 0 OAA still has some value.
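The OAA arithmetic for this illustration as a short Python sketch: outs made minus outs expected, where each play's expected out is its catch probability. Here every play gets the same illustrative 80% from the text; in practice each play would carry its own catch probability. The function name is hypothetical.

```python
import math

def outs_above_average(results, catch_probs):
    """Outs made minus outs expected (the sum of each play's catch probability)."""
    outs_made = sum(1 for caught in results if caught)
    outs_expected = math.fsum(catch_probs)   # fsum keeps 5 x 0.8 at exactly 4.0
    return outs_made - outs_expected

probs = [0.80] * 5   # five HR-saving timing plays at the illustrative 80% out rate

for label, results in [("all 5 made", [True] * 5),
                       ("4 made, 1 mistimed", [True] * 4 + [False]),
                       ("all 5 mistimed", [False] * 5)]:
    print(f"{label}: {outs_above_average(results, probs):+.1f} OAA")
```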